Its Computer Science Time: Possible Improvement on the Last Post's Algorithm

This post describes what may be an improvement of the efficient way to do the canonical Genetic Algorithm described in the last post. I'm not sure how well it would work in practice, but for a mostly converged population it could be a huge help. It can also help avoid recalculating fitness for two individual that happen to be the same. This will involve hash functions, and the downside is that it may result in some crossovers, selections, and fitness evaluations not being performed correctly when distinct sections of individuals hash to the same value.

Recall that in the last post we made use of a complete 2-tree representation of our individuals. What we will add here is one single piece of data to each internal node of the tree.

Assume that we can hash the contents of an individual. This is not much of an assumption given the usual fare for a GA- natural numbers, integers, bits, doubles, floats, etc. We calculate a hash of each location, and, moving up the tree, combine the hashes of the two leaves, fingertree style. Each genetic operator will now be slightly modified from the description in the last post by this change.

Mutation is the same, except for the unfortunate fact that we have to propagate changes to a location up the tree to the top. This makes each mutation take logarithmic time.

Crossover is where we start to see some advantages. When traversing down two individual's trees, if we reach a node where hashes agreed, we stop doing crossover. We assume that if the hashes agree than the subtrees are the same and don't require any copying. Note that there can be false positives here if two subtrees just happen to hash to the same value.

Selection is done similarly- if we are copying over a losing individual with a winning won as described in the last post, and we come to subtrees sharing the same hash, we again assume they are the same, and can stop the copying. Again, false positives can occur.

We could try to use this to avoid recalculating fitness for individuals by placing their hashes in a hash table with their fitness, although this is not a good idea in general. This is the one case where I would be very wary of collisions, especially as we may want to only save one result per location in the table to avoid having it grow too much (which is what happens in my experience if you try to hash every fitness evaluation). Again, not really recommended unless we really want to avoid doing evaluations and we can safely report the wrong fitness or gene expression sometimes (like an a certain game).

Unless I actually implement these algorithms, which I may, I think this is it for this idea. The 2-tree representation is something I've been thinking about for a long time, and I think it is really a good idea, despite the extra complexity and logarithmic access to individual locations (just use maps and folds like a regular person and it will be fine).

Its Computer Science Time

Saturday, June 30, 2012

Possible Improvement on the Last Post's Algorithm

No comments:

Post a Comment