Friday, March 2, 2012

Finding 1s in a Machine Word

I stumbled the other day on a pretty neat paper called "Using de Bruijn Sequences to Index a 1 in a Computer Word" about a very cool algorithm for finding the index of the least 1 in a bit string. I wanted to share it here because it feels elegant to me. Note that finding the index of the first one can be repeated to find the index of all ones, and to find the number of 1s.

First, some background- there is a surprisingly difficult problem called Population Count, which the the problem of finding the number of 1s in a bit string (usually a single machine word). This is not difficult in the sense of finding an algorithm for computing it (shifting and masking the for each bit would work) but rather in finding an solution better than linear in the size of the machine word. This is sometimes provided in hardware, but not all instruction sets have it, and it may not be exposed to the programmer even if it exists (outside of, say, inline assembly in C). This may seem like a strange thing to want, but it (and the problem of finding the index of 1s) comes up in a variety of situations. One example would be if you were keeping a bit-map of blocks in memory, and you wanted to know how many bits are set in a single word (this came up at work the other day). In this case you may also want to know the first bit set, say to find the first bad block when managing flash storage. Another is in the implementation of an efficient immutable trie, which is very nice and useful data structure. In this case, the first bit set to 1 would indicate the first symbol that a branch has a child for.

The algorithm takes three steps- first isolate the first 1, than hash it to a unique key, than finally index into a lookup table for solutions. This lookup table is very small it turns out- one index per bit in the word. For a 32 bit machine, this only has to take 32 bytes, which is basically nothing and can be (and is assumed to be) pre-calculated very easily. Lets take it one step at a time.

First, isolating the first one. This algorithm assumes there is at least one 1, which is easy to check beforehand. What we want to do is set all 1s to 0s except the least significant one. This is very easy, and is given as "x & -x" for a machine word x, with "&" the bit-wise and operator and "-" twos complement inverse. Notice that the twos complement inverse inverts all bits except the first one, so anding it with the original value makes that bit the only bit still set. The example they give is 01101000, with inverse 10010111, add one to get the twos complement 10011000, and them together 01101000 & 10011000 to get 00001000. Nice.

The next step is to produce a has of the resulting number. The has must be unique, so there are no collisions given values that are powers of 2, so that each possible first bit position with a one produces a unique key to index the lookup table with. This is where de Bruijn sequences come in. These sequences are just bit strings that, for some number n (a power of 2), contain exactly one instance of all log2(n) bit strings, if you allow yourself to wrap around. For example, for n=8, the sequence 00011101 contains all 3 bit (log2(8)=3) sequences, although to get 100, for example, you need to start at the rightmost bit and wrap around. They chose a sequence that starts with 000, for reasons we will get to soon. Now, for this fixed sequence, the hash function is given as: hash(x) = (x*s) >> (n-log2(n)) where s is the chosen de Bruijn sequence.

Now the cool part of the paper- this function must necessarily produce a unique key! To see this, we need to make a couple of observations. First off, since the input number, x, if a power of 2, multiplying by x is the same as a left shift. We do this multiplication modulo 2^n, so we ignore any bits after the nth. This means that we have lopped off some of the higher order bits of our de Bruijn sequence. Next, notice that the right shift operator will shift out all but the bits necessary to index into our lookup table. This means that we have lopped off some of the lower order bits now, leaving us with a subsection of the original sequence. At first I didn't see how we would get the bit sequences where we would have wrapped around, as the shifts that we are doing are not wrap-around shifts, but that is why we specified that the sequence should start with 0s! If it starts with zeros, and the multiplication by x adds zeros to the right, then get those sequences "for free" in the sense that they are still possible even without wrapping.

The next step is pretty obvious- we have a unique index, so just look up the answer in the table, and we are done.

This algorithm may not be the best in all situations, and they do some performance tests on it and discuss extensions and limitations in the paper, but its is pretty neat, I think. I like how we can derive a hash function with no collisions for this input, and how it cleverly ensures all indices are possible.

Incidentally, I found this paper while looking for a reference on de Bruijn indices, which are also very nice, but are completely different.

No comments:

Post a Comment