Tuesday, April 17, 2012

Efficient Roulette Wheel Selection

I ran into a pretty nice way of doing roulette wheel selection the other day. It requires O(n) setup time, and each selection takes O(log n) time, for a total of O(n + n log n) = O(n log n) for the whole selection process. This is a lot better than the completely naive way (generating the "spin" of the wheel, and then scanning through the individuals looking for the one that was "hit"), which takes O(n^2) overall, and which I've been guilty of in the past.

First off is the setup. Given a population of individuals paired with their fitnesses, stored in some linear structure like a linked list or vector, go through each individual, setting its fitness value to its own fitness plus the sum of all previous individuals' fitnesses. This prefix-summing takes time linear in the population size.

Next is doing the actual selection. This is amazingly easy: generate a value between 0 and the sum of all fitness values, then do a binary search through the population using the new (summed) fitness values. Each selection takes O(log n) time for the binary search, and we need to do n of them, so the second part of this selection technique takes O(n log n) time.
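The two steps above can be sketched in Haskell like so; this is a minimal sketch assuming non-negative fitnesses, and the names (cumulative, spinIndex) are my own, not from any library:

```haskell
import Data.Array

-- O(n) setup: prefix-sum the fitnesses into an array of running totals.
cumulative :: [Double] -> Array Int Double
cumulative fs = listArray (0, length fs - 1) (scanl1 (+) fs)

-- O(log n) per selection: binary search for the first index whose
-- running total is >= the spin (a uniform value in [0, total fitness]).
spinIndex :: Array Int Double -> Double -> Int
spinIndex cum spin = go lo hi
  where
    (lo, hi) = bounds cum
    go l h
      | l >= h          = l
      | cum ! m >= spin = go l m
      | otherwise       = go (m + 1) h
      where m = (l + h) `div` 2
```

For fitnesses [1,2,3,4] the running totals are [1,3,6,10]; a spin of 3.5 lands on index 2, since 6 is the first total at or above it.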

So there you have it: an easy and quick O(n log n) implementation of roulette wheel selection. I will probably end up using this soon enough.

Friday, April 13, 2012

Haskell Evolutionary Algorithm Library, take 2

I've been working on rewriting the program I did for my thesis: an evolutionary algorithm library in Haskell. It should be more general than just evolutionary algorithms, but I can't think of a better name involving Haskell and Machine Learning. This post is a progress report.

So far it's going great. The structure of the library is a mixture of functional lenses (from Data-Lens), pipes (from Conduits), random variables/probability distributions (from RVar), and a little monad for keeping state and doing things randomly while allowing mutable references. I'm not completely sold on the mutable references (ST monad) for this situation, but I would like to play around with using them for all operators and see if the efficiency is worth the extra complexity and the extra type variable in all my definitions.

The plan is to provide a couple of implementations of the usual genetic operators: a very general one that should work nicely with different structures and can be easily used; a very efficient one that works on fewer structures (probably vectors of unboxed types) but is hopefully insanely fast compared to what I've done before (see my previous posts); and one that's mostly for fun, where everything is done entirely in terms of pipes. The last one would probably be slow, but it would be interesting to see a GA defined not in terms of populations and individuals, but completely in terms of pipes of elements (bits, for example).

One of my major ease-of-use goals is that the user should be able to run a Genetic Algorithm simply by providing a fitness function. It should be easy to change parameters and all that, and slightly less easy to add operators or define a new algorithm, but for a quick run the fitness function should be all you need. I don't just mean a fitness function of a fixed type: I would like the generic GA to be so generic that you can give it a fitness function that expects lists of Bools and get lists of Bools back, or if you want vectors of Ints, you get vectors of Ints, and so on.

The current next step is finishing the implementation of point mutation and crossover, and then starting on some selections. I've noticed that the pipe concept (and the nice implementation from Conduits) vastly simplifies this sort of algorithm. I really believe it is the Right Thing for the job.

Tuesday, April 10, 2012

Efficient Crossover and Selection for Genetic Algorithms

In the last post I discussed an efficient implementation of point mutation for a Genetic Algorithm. This time we look at crossover and selection (and rotation, but that's for PGEP and RGEP, not a GA). These operators take linear time each, and we do no better than that here. However, we can reduce the constant factor nicely, which is important in practice.

First off, let's look at crossover, especially 1 or 2 point crossover (or both). Crossover is not a terribly expensive operator: we simply need to generate a random value between 0 and 1 (to determine whether we will cross a pair of individuals), and then a cross point between 0 and the length of the individual. The only somewhat expensive part is moving pieces of the individuals around, so let's not do that yet. Instead, we just make a list/array/vector of the ranges each individual occupies. One way to do this is to consider the population a single large vector (or a 2-dimensional vector), and each individual a list of ranges from its start to its end. After 1 point crossover an individual is a 2 element list of pairs: start to cross point, and the other side of the cross point to the end. In other words, we record which locations in the population make up each individual without actually changing the population at all. If we do 2 point crossover, or multiple crossovers in a single generation, these ranges may consist of more pairs of locations, but the idea is the same. We don't want the individuals to become too fragmented, as with completely fragmented individuals this would be less efficient than the usual implementation, but within a single generation fragmentation shouldn't be too bad.
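A hypothetical sketch of this representation (the types and names here are mine): an individual is a list of (start, end) ranges into one big population vector, and 1 point crossover only rearranges ranges, copying no genetic material at all:

```haskell
-- An individual as inclusive (start, end) ranges into the population.
type Range = (Int, Int)
type Individual = [Range]

-- Split an individual's range list after its first k locations.
splitAtPoint :: Int -> Individual -> (Individual, Individual)
splitAtPoint _ [] = ([], [])
splitAtPoint k ((s, e) : rs)
  | k <= 0    = ([], (s, e) : rs)
  | k >= len  = let (as, bs) = splitAtPoint (k - len) rs
                in ((s, e) : as, bs)
  | otherwise = ([(s, s + k - 1)], (s + k, e) : rs)
  where len = e - s + 1

-- 1 point crossover at point k: each individual keeps its own head
-- and takes the other's tail. Only the range lists change.
cross :: Int -> Individual -> Individual -> (Individual, Individual)
cross k a b = (a1 ++ b2, b1 ++ a2)
  where
    (a1, a2) = splitAtPoint k a
    (b1, b2) = splitAtPoint k b
```

Crossing [(0,9)] with [(10,19)] at point 3 gives ([(0,2),(13,19)], [(10,12),(3,9)]): each child is a 2 element list of ranges, exactly as described above, with the population untouched.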

Selection can be similarly performed without changing the population. We just record the winner of each tournament or the result of each spin of the roulette wheel (or whatever selection one desires). One way to do this is with an initially all-zero vector counting the number of occurrences of each individual in the new population: we add one to a location whenever the individual corresponding to that location is selected.
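A minimal sketch of this tally, using accumArray from Data.Array (countSelections is a name of my own invention):

```haskell
import Data.Array

-- Record selection without touching the population: given n
-- individuals and a list of selected indices (tournament winners,
-- wheel spins, ...), tally how many times each index appears in the
-- next generation.
countSelections :: Int -> [Int] -> Array Int Int
countSelections n picks = accumArray (+) 0 (0, n - 1) [(i, 1) | i <- picks]
```

For a population of 4 where the selections were [0,2,2,3], the counts come out [1,0,2,1]: individual 1 dies out, individual 2 appears twice.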

While this post is mainly about Genetic Algorithms, I want to mention that the rotation operator from PGEP and RGEP can be performed in a similar fashion. We either record the number of locations to rotate for each individual (with 0 for individuals that aren't rotated that generation) or modify the ranges we created during crossover described above.

So, we have an untouched population, a vector of ranges for each individual that describes what parts of the population they occupy, and a count saying how many times they occur in the next generation. It might be nice to be able to do the operators this way several times before touching the population (except mutation, which can be done independently however we desire), but since we need the full individual for fitness evaluation anyway, we might as well perform the crossover and selection we recorded at the end of each generation.

Now all that's left is to copy each individual from the old generation to the new one in a single pass over the population. This involves inspecting each individual and copying it, the number of times it was selected, into successive places in the new population (I'm thinking of a double-buffering style scheme where we copy back and forth between two equal-sized populations). Each individual consists of a series of ranges, so getting it into the new population means copying each range, one after the next, to successive locations. An individual [(0, 10), (103, 107), (23, 26)] would copy the values between locations 0 and 10, 103 and 107, and 23 and 26 (considering the population as one big vector) into a single contiguous run of the new population, say positions 0 through 19.
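The gather step can be sketched like this, treating the population as a flat list for simplicity (a real implementation would use an unboxed vector; copyIndividual is my own name):

```haskell
-- An individual as inclusive (start, end) ranges into the population.
type Range = (Int, Int)

-- Copy an individual's scattered ranges out of the old population
-- into one contiguous chunk, ready to be written into the new one.
copyIndividual :: [a] -> [Range] -> [a]
copyIndividual pop = concatMap slice
  where slice (s, e) = take (e - s + 1) (drop s pop)
```

Applied to the example above, copyIndividual over [(0,10), (103,107), (23,26)] yields the 20 values from those three ranges, in order, as one contiguous individual.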

The advantage here is that, while each copy is somewhat complex, we do only a small amount of work to determine what actions we need to take (recording how to do each operator instead of doing it), which amounts to sparse data about rearranging the individuals' genetic material, and then we actually copy each location only once per generation. I'm hoping to try this out when I get around to actually implementing this evolutionary algorithm library, although I will provide the simpler implementations as well.

Monday, April 9, 2012

Point Mutation with Huffman Encoding and the Geometric Distribution

The usual form of point mutation in a Genetic Algorithm is a pretty inefficient affair: the most straightforward approach is to check each location (usually a bit) in each individual, and mutate it with a given probability. This means generating a huge number of random values between 0 and 1. I've noticed that this gives a performance hit in my own GA experiments, and I would like to give a general way of doing mutation without losing randomness or generating so many random values. I've talked about this before, but I think this is a much better technique than my other suggestions. It is efficient, general (it doesn't rely on any particular structure, beyond vector individuals that contain something to mutate), and easyish.

The first thing to notice about mutation is that it is somewhat like a Poisson process, except in discrete time. As we walk down an individual (or the whole population, if we want to think of it as one huge vector), we will mutate some locations and leave others alone. Each location is like a time tick, and whether or not we mutate is like an event occurring during that tick. So, how many locations do we skip between each mutation? And, more to the point, can we skip them without having to do any work with them at all? We can assume that for the most part the probability of mutation is very low, so skipping over all the locations that won't mutate would be a huge gain if we could do it with less work than checking each one.

Recall that (one definition of) a geometric distribution gives the number of failures before a single success in a Bernoulli process. With the probability of success equal to the probability of mutation, this tells us, for a given number n, the probability that we will skip n spaces before the next mutation. The cumulative distribution function at n is then the sum of the geometric distribution's values at each number from 0 to n. Now we need to use this to generate a single number telling us how many spaces to skip. There may be a distribution for this too, but I don't know what it is (I would be happy to hear about it if one exists).
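As an aside, not from the post itself: the skip count is itself geometrically distributed, and the standard inverse-transform trick turns a single uniform draw u from (0,1] into a skip count directly, which makes a useful baseline to compare the tree-based method against:

```haskell
-- Inverse-transform sampling of a geometric distribution: with u
-- uniform on (0,1] and p the per-location mutation probability,
-- floor (log u / log (1 - p)) is the number of failures (locations
-- skipped) before the first success (mutation).
skipFrom :: Double -> Double -> Int
skipFrom p u = floor (log u / log (1 - p))
```

For example, with p = 0.5, a draw of u = 1.0 gives a skip of 0, and u = 0.4 gives a skip of 1.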

This is where a post on A Neighborhood of Infinity called "Lossless decompression and the generation of random samples" comes in. It describes how, given the cumulative distribution function of a finite probability distribution, we can create a structure for efficiently sampling the distribution. There may be a better way to do this, but the way I'm planning on using for now is to build a Huffman encoding of the elements of the distribution, where the code word for an element is a path down a binary tree labeled with probabilities that leads to that element.

In our case the distribution is infinite, in the sense that any number of failures may occur before the first success. However, the individual or population has only so many positions, so really we only need the probabilities for 0 to n, where n is the number of locations being looked at. Since we are cutting off the tail of the distribution, we should sum the probabilities of all skip counts from n to infinity and assign that total to the action of doing no mutation at all.

Now that we have the tools, we simply calculate the cumulative distribution function for the geometric distribution with p equal to the probability of mutation (this code should do it: "map sum $ tail $ inits [((1-p)**x)*p | x <- [0..]]", with inits coming from Data.List). We build the Huffman encoding, which is very simple. Finally, we generate bits and use them as paths down the encoding tree. The element at the end is the number of locations to skip, and gives the next location to mutate. This requires only enough bits to walk down the tree, and may skip as much as the whole population.
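The walk over the genome can then be sketched as follows, assuming we already have a stream of sampled skip counts (from the Huffman tree or any other sampler); mutationPoints is a name I made up for this sketch:

```haskell
-- Given a genome of n locations and a stream of skip counts (number
-- of locations skipped before each mutation), produce the list of
-- positions to mutate. Laziness means an infinite stream of skips
-- is fine: we stop as soon as we walk off the end of the genome.
mutationPoints :: Int -> [Int] -> [Int]
mutationPoints n skips = takeWhile (< n) (tail (scanl step (-1) skips))
  where step pos skip = pos + 1 + skip
```

For a genome of 10 locations and skips [0,2,4,0], the mutated positions are [0,3,8,9]; only four random draws were needed instead of ten.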

It seems common for the probability of point mutation to be set so that about 1 or 2 mutations occur on each individual each generation. This means that the expectation of the distribution is around the size of the individual, so we will end up generating very few numbers. We do need enough random bits to get to the end of the tree (to construct a path to a leaf), but the tree will be balanced so that common skip counts have shallow paths. This should be a huge advantage over the naive implementation, possibly by 1 or 2 orders of magnitude, depending on individual size and point mutation probability.