Wednesday, May 2, 2012

Genetic Algorithm and RGEP Complexity

I've been looking at efficient implementations of genetic operators lately. Obviously the actual complexity depends on the fitness function (sometimes more then anything else), the choice of genetic operators, the implementation of those operators, and the representation of the individuals. However, I'm really interested in the ones used in RGEP, so point mutation, one and two point crossover, rotation, and tournament selection.

I'm slowly converging on a set of structures and algorithms that seem to give a complexity on the order of O(g*p*log(n)) with g the number of generations, p the population size, and n the size of the individuals. This also applies to RGEP, as it is pretty much just a genetic algorithm with a Rotation operator. A genetic algorithm would normally have O(g*p*n) instead, assuming the usual genetic operators, the usual array of arrays representation, and a reasonable implementation of the genetic operators.

To get this time complexity, I plan on using a Vector (the Haskell library Data.Vector) for the population (assuming the usual multiset population) and a Sequence (the Haskell library Data.Sequence, which implements 2-3 finger trees) for the individuals. These are chosen because Vectors have efficient indexing and a nice function "backpermute" which can be used to select the individuals for a new population (given their indices from a selection operator) in O(n) time. Sequences have an amortized O(min(log(i, n-i))) split operator where i is the index to split on and n is the size of the sequence. They also have a O(min(n,m)) concatenation operation (n and m are the sizes of the two sequences). This is important as all of point mutation, crossover, and rotation are really not about copying, but about splitting and concatenating, and should really take less then linear time.

I have learned that roulette wheel selection can be done in O(n) time, sampling n times from a population where each sample can take O(1) time using either the algorithm in "Roulette-Wheel Selection via Stochastic Acceptance", or in my case maybe the one in "A Linear Algorithm for Generating Random Numbers with a Given Distribution". Even using Tournament Selection, which is my personal preference, we need only linear time in the population size to do selection.

I've talked before about point mutation by sampling from a geometric distribution, and given that we need only set up the distribution once and each sample will take O(1) time depends only on the number of mutations in the population. This is usually chosen so that each individual is mutated only a couple of times, so it occurs with about the same frequency regardless of the individuals size (and therefore shouldn't depend on that size). It is unfortunate that this is the one place where a vector would be better than a sequence (assuming it is a mutable vector), but I'm not terribly worried about this. If necessary, all the points can be either mutated in a single pass down the whole population, only changing those points that were determined to be mutation points, or by a series of applications of the function "adjust", which is where I'm getting what I'm going to call p*logn complexity because the number of mutations is determined by the size of the population and the complexity of splitting and rejoining an individual at a given point.

Given these observations, the time complexity is O(g * (mut + cross + rot + select)) where the complex of mut, cross, rot, and select are plogn, plogn, plogn, and n respectively. This makes the total complexity only O(g*p*logn).

This may not be the best we can get (I'm not sure, but I'm looking in to it) but its pretty dang good compared to my usually naive implementations.

3 comments:

  1. It IS related to computer! Exactly! You really nailed it, randomly generated spam!

    ReplyDelete
  2. Perhaps it was generated by a genetic algorithm, lol. Seriously, good series of blog posts, looking forward to more.

    ReplyDelete
  3. Do you think that it is important that this blog is related to computer?

    ReplyDelete