Monday, April 9, 2012

Point Mutation with Huffman Encoding and the Geometric Distribution

The usual form of point mutation in a Genetic Algorithm is a pretty inefficient affair: the most straightforward approach is to check each location (usually a bit) in each individual, and mutate it with a given probability. This means generating a huge number of random values between 0 and 1. I've noticed that this causes a performance hit in my own GA experiments, and I would like to give a general way of doing mutation without losing randomness or generating so many random values. I've talked about this before, but I think this is a much better technique than my other suggestions. It is efficient, general (it doesn't rely on any particular structure except vector individuals that contain something to mutate), and reasonably easy to implement.
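For reference, here is a minimal sketch of that naive approach in Haskell (the name naiveMutate and the bit-list representation are just illustrative choices of mine): one random draw per location.

import System.Random (StdGen, mkStdGen, randomR)

-- Naive point mutation: draw one random number per location and flip
-- the bit whenever the draw falls below the mutation probability p.
naiveMutate :: Double -> StdGen -> [Bool] -> ([Bool], StdGen)
naiveMutate _ gen [] = ([], gen)
naiveMutate p gen (b : bs) =
  let (r, gen')    = randomR (0.0, 1.0) gen
      (bs', gen'') = naiveMutate p gen' bs
  in ((if r < p then not b else b) : bs', gen'')

-- For example: naiveMutate 0.01 (mkStdGen 42) (replicate 100 False)
-- makes 100 random draws for a 100-bit individual.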

The first thing to notice about mutation is that it is somewhat like a Poisson process, except in discrete time. As we walk down an individual (or the whole population, if we want to think of it as one huge vector), we will mutate some locations and leave others alone. Each location is like a time tick, and whether or not we mutate is like an event occurring during that tick. So, how many locations do we skip between each mutation? And, more to the point, can we skip them without having to do any work with them at all? We can assume that for the most part the probability of mutation is very low, so skipping over all the locations that won't mutate would be a huge gain if we could do it in less work than just checking each one.

Recall that (one definition of) a geometric distribution gives the number of failures before a single success in a Bernoulli process. With the probability of success equal to the probability of mutation, this tells us, for a given number n, the probability that we will skip n spaces before the next mutation. The cumulative distribution function at n is then the sum of the geometric distribution's probabilities for each number from 0 to n. Now we need to use this to generate a single number telling us how many spaces to skip. There may be a distribution for this too, but I don't know what it is (I would be happy to hear about it if one exists).
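Concretely, the probability of skipping exactly k locations before the next mutation is (1 - p)^k * p. A sketch of the probability mass function and the running sum just described might look like this (geomPMF and geomCDF are names I'm introducing here):

-- Probability of exactly k failures (skipped locations) before the
-- first success (mutation).
geomPMF :: Double -> Int -> Double
geomPMF p k = (1 - p) ^ k * p

-- Cumulative probability of at most k failures before the first success.
geomCDF :: Double -> Int -> Double
geomCDF p k = sum [geomPMF p i | i <- [0 .. k]]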

This is where a post on A Neighborhood of Infinity called "Lossless decompression and the generation of random samples" comes in. It describes how, given the cumulative distribution function for a finite probability distribution, we can create a structure for efficiently sampling the distribution. There may be a better way to do this, but the way I'm planning to use for now is to build a Huffman encoding for the set of elements of the distribution, where the code word for an element is a path down a binary tree, labeled with probabilities, that leads to that element.

In our case the distribution is infinite in the sense that any number of failures may occur before the first success. However, the individual or population has only so many positions, so really we only need the probabilities for 0 to n failures, where n is the number of locations being looked at. Since we are cutting off the tail of this distribution, we should sum the probabilities of all numbers from n to infinity and assign this value to the action of doing no mutations at all.
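As a sketch, the truncated distribution could be built as below, reusing geomPMF from above. No infinite sum is actually needed: the tail mass, the sum of (1 - p)^k * p for k from n to infinity, telescopes to exactly (1 - p)^n. (The name skipDistribution and the Maybe encoding of "no mutation" are my own illustrative choices.)

-- Skip distribution truncated to n locations. Just k means "skip k
-- locations, then mutate"; Nothing carries the leftover tail mass,
-- i.e. "no mutation occurs in the remaining locations".
skipDistribution :: Double -> Int -> [(Double, Maybe Int)]
skipDistribution p n =
  [ (geomPMF p k, Just k) | k <- [0 .. n - 1] ]
    ++ [ ((1 - p) ^ n, Nothing) ]  -- sum of the tail from n to infinity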

Now that we have the tools, we simply calculate the cumulative distribution function for the geometric distribution with p equal to the probability of mutation (in Haskell, given "import Data.List (inits)", this should do it: "fmap sum $ tail $ inits [((1 - p) ** x) * p | x <- [0..]]"). We build the Huffman encoding, which is very simple. Finally, we generate random bits and use them as paths down the encoding tree. The element at the leaf we reach is the number of locations to skip, which gives the next location to mutate. This requires only enough bits to walk down the tree, and a single sample may skip as much as the whole population.
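Here is one possible sketch of both steps, assuming the distribution arrives as (weight, value) pairs like those produced by skipDistribution above (buildHuffman and sample are names of mine, not from the post). One caveat: walking a Huffman tree with fair coin flips samples each leaf with probability 2^(-depth), so this gives a dyadic approximation of the true distribution rather than an exact match.

import Data.List (insertBy, sortBy)
import Data.Ord (comparing)

data Huffman a = Leaf Double a
               | Node Double (Huffman a) (Huffman a)

weight :: Huffman a -> Double
weight (Leaf w _)   = w
weight (Node w _ _) = w

-- Standard Huffman construction: repeatedly merge the two lightest
-- trees until one remains. Low-probability outcomes end up deep in
-- the tree, high-probability outcomes sit near the root.
buildHuffman :: [(Double, a)] -> Huffman a
buildHuffman = go . sortBy (comparing weight) . map (uncurry Leaf)
  where
    go []           = error "buildHuffman: empty distribution"
    go [t]          = t
    go (a : b : ts) =
      go (insertBy (comparing weight) (Node (weight a + weight b) a b) ts)

-- Use a stream of random bits as a path down the tree; return the
-- element at the leaf along with the unconsumed bits.
sample :: Huffman a -> [Bool] -> (a, [Bool])
sample (Leaf _ x)   bits       = (x, bits)
sample (Node _ l r) (b : bits) = sample (if b then r else l) bits
sample (Node _ _ _) []         = error "sample: ran out of random bits"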

It seems common for the probability of point mutation to be set so that about 1 or 2 mutations occur on each individual each generation. This means that the expectation of the distribution is around the size of the individual, so we will end up generating very few numbers. We do need enough random bits to get to the end of the tree (to construct a path to a leaf), but the tree will be balanced so that common elements have shallow paths. This means this technique should be a huge advantage over the naive implementation, possibly by 1 or 2 orders of magnitude, depending on individual size and point mutation probability.
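Putting it together, a hypothetical driver (mutateBySkips, building on the sketches above) would repeatedly sample a skip distance, flip the bit it lands on, and stop either when the sample says no mutations remain or when a skip runs past the end of the individual:

import System.Random (mkStdGen, randoms)

-- Mutate an individual by jumping straight between mutation points.
-- A Nothing sample means the remaining locations are mutation-free.
mutateBySkips :: Huffman (Maybe Int) -> [Bool] -> [Bool] -> [Bool]
mutateBySkips tree = go
  where
    go bits xs =
      case sample tree bits of
        (Nothing,   _    ) -> xs
        (Just skip, bits') ->
          case splitAt skip xs of
            (pre, y : ys) -> pre ++ not y : go bits' ys
            (pre, [])     -> pre  -- skip ran past the end

-- For example, with a 64-bit individual and p = 0.01:
--   let tree = buildHuffman (skipDistribution 0.01 64)
--   in mutateBySkips tree (randoms (mkStdGen 7)) (replicate 64 False)

One simplification in this sketch: it reuses a single tree built for the full length n even as the remaining suffix shrinks, which slightly overweights the "stop" outcome near the end of the individual; for low mutation probabilities this should hardly matter.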
