Saturday, April 9, 2011

Visualization of Diversity in a Cellular Genetic Algorithm

A Cellular Genetic Algorithm (CGA) is a regular GA with additional structure on the population. While the normal populations in a GA are a multi-set (a set that can have repeated elements) in a CGA they are some other data structure. Often this is either a list or a torus (a grid whose edges meet). This imposes a restriction on crossover where individuals can only cross with others near them. It also affects selection, where an individual can only be replaced by individuals around them.

On of the neat things about CGAs is that they can have interesting behavior with diversity. It is possible for some sections of the structure (lets consider a grid, as that will be interesting later on in this post) to converge to different solutions then other areas. This means that in the center of some collection of cells (holding individuals) there will be very little change, as mutations will be weed out (mostly) and crossover will not affect equal individuals. On the other hand, at the edges between to converged groups, selection will cause one group to dominate other other, or a stable edge may appear. Crossover across this edge will cross the traits that make each group highly fit, and may produce very very fit individuals. This is the coolest thing about this technique. This may also occur in an Island Genetic Algorithm, incidentally.

The other thing I want to introduce before getting into the point of this post is the solution space. This really is a space, as for any two points (encoding of solutions) a distance measure may be taken. If the encoding is simple bit vectors, then the distance is the hamming distance (the number of bits that differ between the individuals). This distance does not necessarily correspond to the difference between their fitnesses. This gives another dimension to the solution space, which we may imagine as a landscape where the solutions are x, y coordinates and the z value is the fitness of those points. The landscape may be very complex.

Another measure we can make is the "diversity" of the individual. We can take the pair-wise hamming distance between the individual and the individuals "near" to it. In a multi-set population an individual is "near" all individuals in the population's topology. In a CGA this is not the case, and diversity is measured only by the individuals in cells of a certain distance away in the topology (lets say distance 1, so a cell in a grid has 8 neighbors).

The idea of this post is that we can equate the diversity measure of the individuals and an individual's fitness. This means that we are almost certain to have many different groups converging to different bit vectors. Then we can imagine the grid of fitness values of the individuals, which will tell us not their genetic material but just how much they differ from their neighbors. This is were the visualization comes in.

If we assign some range of colors to the values from 0 to the maximum diversity measure (neighbors*individual length), probably with blue or white around 0 and red around the highest diversity. Then we could make images out of the population's fitness values throughout the run of the CGA and see the interaction at the edges of the groups. Probably the groups will quickly form and one group will take over the others, but really I don't know what would happen. I do know it would look really neat.

No comments:

Post a Comment