Wednesday, October 13, 2010

Preventative vs Corrective: Ways of Dealing With Invalid Structures

In evolutionary algorithms, it is common to run into situations where an invalid solution is created through some manipulation of a valid solution, or by random generation. Creating invalid individuals is a problem because their fitness can't be directly evaluated: they must either be given the lowest possible fitness or be repaired somehow. Often the EA is constrained so that the encoding can only create valid solutions. This is what I am going to call a preventative method. There are many techniques that use preventative methods. GAs often do this naturally, though in some cases, such as list-ordering problems like the traveling salesman, there is a need to be clever with the encoding (a "tricky" encoding). In GP, the genetic operations are carefully chosen so that they can only produce valid individuals.
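One well-known example of such a "tricky" preventative encoding for ordering problems is the random-keys scheme (not named in the post, so take this as an illustration rather than the encoding meant above): each gene is a real number, and the tour is recovered by sorting the gene indices by their values, so every chromosome decodes to a valid permutation no matter how it is mutated or crossed over.

```python
import random

def random_key_individual(n):
    """A chromosome of n real-valued keys; any such chromosome is valid."""
    return [random.random() for _ in range(n)]

def decode_tour(keys):
    """Decode keys into a city permutation by sorting their indices.
    Every chromosome, however it was produced, decodes to a valid tour."""
    return sorted(range(len(keys)), key=lambda i: keys[i])

keys = [0.7, 0.1, 0.9, 0.4]
print(decode_tour(keys))  # [1, 3, 0, 2]
```

Standard crossover and mutation on the keys can never produce a repeated or missing city, which is exactly the preventative property discussed above.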

On the other hand, I know some repair operators have been used (I'm not as familiar with the GP literature as I would like to be). These are what I am going to call corrective: they take an encoding and make it encode a valid solution if it does not already.

In GEP the original method was preventative (the head-tail method), and it was claimed that this saves a great deal of the computational effort otherwise wasted in finding and fixing invalid individuals. It isn't mentioned in the original GEP paper, but the other problem with corrective methods is that they often have to make significant changes to the individual, to the point that the result is hardly the same individual and might almost as well have been created randomly.
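For readers unfamiliar with the head-tail method, the idea can be sketched roughly as follows (the operator and terminal sets here are my own illustrative choices): the head of a gene may hold operators or terminals, while the tail holds terminals only, and the tail is made long enough that any decoding is guaranteed to find the arguments it needs.

```python
import random

OPS = {'+': 2, '*': 2, 'sqrt': 1}  # example operator arity table
TERMS = ['x', 'y', '1']            # example terminal set

def random_head_tail_gene(h):
    """Head of length h: operators or terminals; tail: terminals only.
    Tail length t = h*(max_arity - 1) + 1 guarantees that every
    decoding of the gene yields a complete, valid expression tree."""
    max_arity = max(OPS.values())
    t = h * (max_arity - 1) + 1
    head = [random.choice(list(OPS) + TERMS) for _ in range(h)]
    tail = [random.choice(TERMS) for _ in range(t)]
    return head + tail

gene = random_head_tail_gene(4)
print(len(gene))  # 4 + 4*(2-1) + 1 = 9 symbols
```

Because the constraint lives entirely in how genes are generated and recombined, no validity check is ever needed afterward, which is the source of the claimed computational saving.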

The additional complexity that the preventative head-tail method adds to GEP was soon replaced (in most papers) by a corrective method: if an operation (including random initialization) would result in an invalid individual, the operation is undone. I call it corrective because the operation must actually be performed and then undone (corrected). This opens the individuals up to encoding a wider variety of trees. This is especially nice in PGEP, where it is easy to check whether an individual is valid without building a whole tree. The problem with this method is that it requires additional processing to perform an operation and check for correctness, and, even worse, it requires knowledge of the eventual expression tree to be available to the genetic operators. This is not too bad in PGEP, but it is harder in BGEP or BPGEP.
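The PGEP validity check mentioned above can be done in a single left-to-right pass over the prefix genome, without ever building the tree. A sketch, with an illustrative arity table:

```python
OPS = {'+': 2, '-': 2, '*': 2, 'sqrt': 1}  # example arity table

def prefix_is_valid(genome):
    """Single pass over a prefix (PGEP-style) genome. `needed` counts
    unfilled argument slots; the genome is valid if the count reaches
    zero before the symbols run out."""
    needed = 1
    for sym in genome:
        needed += OPS.get(sym, 0) - 1  # operators open slots, terminals fill one
        if needed == 0:
            return True  # complete expression; trailing symbols are noncoding
    return False

print(prefix_is_valid(['+', 'x', '*', 'y', 'y']))  # True
print(prefix_is_valid(['+', '*', 'x']))            # False
```

A corrective operator can then try a genetic operation, run this check, and undo the operation if the check fails, which is exactly the perform-then-undo cost described above.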

Other methods have been proposed, such as Unconstrained Gene Expression Programming (UGEP). This allows any change at any place, but requires a lot of strange and complex processing, including multiple fitness evaluations: the translation process ends up with zero or more expression trees, all of which need to be evaluated. UGEP also uses karva notation, which is particularly unfortunate considering how the expression trees are built (there is a loss of locality of subexpressions). I think it is actually a very clever idea, though maybe the expressions should have just been joined by a linking operator, as with the multigenic chromosomes in GEP. That may create many very wide trees, though, so maybe it is better the way they do it in the UGEP paper.

Naive Gene Expression Programming (NGEP) is another clever encoding, but I'm not convinced that it is worth the restrictions it imposes on the problem. It also uses karva notation, which I don't like nearly as much as prefix notation. It creates a complete binary tree (as operators in NGEP must take fewer than 3 arguments) and fills out the tree karva-style. This means that the tree is always filled out, but some parts of it go unused. It is an interesting idea, and it essentially uses the same head-tail method as GEP to impose validity, but ultimately there is little gain for such a loss in generality.
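To make the karva locality complaint concrete, here is a sketch of breadth-first (karva) decoding with an illustrative symbol set: the string fills the tree level by level, so an operator and its own arguments can end up far apart in the string.

```python
from collections import deque

ARITY = {'+': 2, '*': 2, 'Q': 1}  # Q = square root; terminals have arity 0

def decode_karva(kexpr):
    """Breadth-first (karva) decoding: the string fills the expression
    tree level by level, so sibling symbols in the tree may sit far
    apart in the string (the locality loss mentioned above)."""
    symbols = iter(kexpr)
    root = {'sym': next(symbols), 'children': []}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for _ in range(ARITY.get(node['sym'], 0)):
            child = {'sym': next(symbols), 'children': []}
            node['children'].append(child)
            queue.append(child)
    return root

def to_prefix(node):
    # flatten the decoded tree back into prefix order for display
    return [node['sym']] + [s for c in node['children'] for s in to_prefix(c)]

# '*' and its arguments y, a are separated by x in the karva string
print(to_prefix(decode_karva('+*xyab')))  # ['+', '*', 'y', 'a', 'x']
```

In prefix notation a subexpression is always one contiguous substring, which is why I find it preserves subexpression locality so much better.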

I'm going to be proposing another corrective method that I think is very nice. It allows all operations to be performed, requires no knowledge of expression trees in the operators, is simple and fast, and preserves substructures nicely. The last part is particularly interesting, as I mentioned before that some attempts at repair are very disruptive. This method only repairs when expressing an individual, and never has to edit the individual itself, allowing any genetic variation at all to exist. It does a simple single pass through the linear encoding, removing some operators and "floating" subexpressions up through the tree until the tree is complete. It has a nice biological justification in RNA editing, and it allows some amount of neutral mutation by expanding the non-encoding regions to the inside of the gene or at the end (one or the other).
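The actual mechanism is described in a later post; purely as an illustration of what a single-pass "remove operators, float subexpressions up" repair could look like, here is my own rough sketch over a prefix genome (this is a guess at the general shape, not the proposed method itself, and the operator table is illustrative):

```python
OPS = {'+': 2, '-': 2, '*': 2}  # example arity table

def repair_prefix(genome, default='x'):
    """Illustrative single pass over a prefix genome. Complete
    subexpressions are attached as they close; if the symbols run out,
    each still-open operator is dropped and its completed children
    "float" up to the enclosing operator until a complete tree remains."""
    stack = []  # open frames: [operator, unfilled slots, children]

    def attach(tree):
        # attach a completed subtree to the innermost open frame,
        # closing frames as they fill; returns the finished tree
        # once no open frames remain
        while stack:
            frame = stack[-1]
            frame[2].append(tree)
            frame[1] -= 1
            if frame[1] > 0:
                return None
            stack.pop()
            tree = (frame[0], tuple(frame[2]))
        return tree

    for sym in genome:
        if sym in OPS:
            stack.append([sym, OPS[sym], []])
        else:
            done = attach(sym)
            if done is not None:
                return done  # trailing symbols are left as noncoding
    # ran out of symbols: drop each incomplete operator and float its
    # completed children upward
    while stack:
        _op, _slots, children = stack.pop()
        for child in children:
            done = attach(child)
            if done is not None:
                return done
    return default  # the genome held no terminals at all

print(repair_prefix(['+', '*', 'x']))       # 'x'
print(repair_prefix(['-', '*', 'x', 'y']))  # ('*', ('x', 'y'))
```

Note that the repair happens only at expression time: the genome itself is never edited, so the dropped operators remain in place and can become active again under later mutation.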

I've never seen a comparison of the different validity-enforcing methods, though one probably exists for GP. I hope that this will be one of the contributions that my master's thesis makes to GEP research. In another post I will describe the actual mechanism that I am proposing.

There are other things I would like to add to GEP. My thesis so far is about making it as simple and expressive as possible, but I would love to see some of the great stuff the GP community does attempted in GEP. Some of these things are recursion, iteration, strongly typed expressions (removing the "closure" condition in GEP), and the use of lambda calculus. Since one of my main interests is in the issues surrounding functional languages, and PGEP has already given us a way to evolve functional expressions (compare PGEP to most linear GPs, or even to karva notation), it would be great to carry the torch further and make the expressions even more closely related to functional languages.
