Monday, April 18, 2011

Learning Genetic Algorithms, editing

Editing of a symbol list takes valid individuals (individuals whose genetic material encodes a tree) to themselves, so it makes no changes, and invalid individuals to an individual that encodes a tree, hopefully the "closest" such individual. We can add extra symbols, delete symbols, or change symbols, or any combination of those three. In Binary Genetic Programming, the original Genotype Phenotype Mapping system, the editing stage was very complex. In the systems that followed that line of research the editing only got more complex as they were intended to investigate the idea of developing an individual using the genetic material as a template. The editing that I will explain is must easier and is novel in the field (not that it is all that clever, but it works well). The downside is that it only works on the more algebraic expressions that GEP evolves rather then the more general ones in other Genotype Phenotype Mapping systems. On the other hand, its very nice compared to other techniques in the GEP literature, and it suggests a really cool addition to GEP systems that I hope to get into at some point involving stack operators.

Okay, so lets get to editing! Starting with the expression +1-2*3 lets see if we can create an expression that is pretty close but valid. The tree it expresses to is:


+
/ \
1 -
/ \
2 *
/ \
3 _
Where the _ symbol is an argument that is missing for *. Notice how we filled out the tree btw, depth first. This is prefix notation. A very close valid individual would be:

+
/ \
1 -
/ \
2 3
We could do this by "floating" the 3 up to the -, but it is easier to do this if we don't actually create the tree. So! Postfix notation!

Starting with an empty stack, written [], and the expression +1-2*3, lets evaluate the expression in the usual postfix way. First lets reverse the expression, 3*2-1+, and evaluate the expression symbol by symbol. Any terminal is simply pushing onto the stack, so first 3 is pushed- the expression is *2-1+ and the stack is [3]. The * will try to pop enough arguments off the stack (2) and multiply them, pushing the result on the stack. If the stack had been [3, 2] and the next symbol * the stack would become [6] or [(2*3)] assuming the top of the stack is the rightmost item in the list. Since there is not enough arguments on the stack when the stack was [3] the * is ignored. Then the expression is 2-1+ and the stack is [3]. Then we get -1+ [3, 2], then 1+ and [(2-3)] or [-1] then + and [-1, 1] then an empty expression and the stack [0].

Notice that the only thing that differs from this evaluation and the prefix one is that the * is ignored, which was the goal we set out to perform. This mechanism will always remove operators that do not have enough arguments, and it will make very few operators and make no other changes to the expression. It can also be done while evaluating the individual, so we don't even need to create the expression tree at all. This is very cool.

One of the nicest things about this notation and editing is that the subexpression in the tree (the tree starting from any node and taking all of that nodes children and childrens children, etc) are close to each other in the symbol lists. This means that symbols close to each other in the genetic material are close in the expression tree, and symbols that are part of a subexpression are never distanced by symbols of a different subtree. This is not the case in the original notation, Karva notation, which fills out its trees breadth first. The postfix notation allows us to gain this locality property as well as remove operators that will be a problem in the expression tree. This could be called postfix evaluation ignoring operators that cause a stack underflow.

We have gotten to material that is introduced in my thesis in this post. Perhaps I will post about the other parts of RGEP, and maybe about the algorithm as a whole. The editing stage is the only really novel thing besides some details of the binary encoding and the meaning of the genetic operators. Everything else in RGEP from the literature on Genetic Algorithms, Genetic Programming, Genotype Phenotype Mapping Systems, and Gene Expression Programming.

No comments:

Post a Comment