Saturday, August 28, 2010

Hierarchical Gene Expression Programming (HGEP)

I've been meaning to write this idea down for a while, because it is potentially interesting, and if nothing else it is yet another variation on GEP- of which there are already so many.
While it is nice that GEP is so flexible to allow many possible changes, many of them are either very complex, IGEP, or lost generality, NGEP. On the other hand, some remove complexity, like PGEP-which I think all GEP researchers should be using instead of GEP- or increase complexity in a very reasonable way, like BGEP. I'm hoping that HGEP is a reasonable increase in complexity, and I think of it as having potential benefits in domain specific ways, unlike PGEP which seems pretty generally to be an advantage over GEP.


So- HGEP. Or more like HPGEP, as I prefer prefix notation over karva. It is a sort of synthesis between the multi-genetic chromosomes of GEP(with a single linking function) and the single gene chromosomes of PGEP. It exploits the homomorphism between the fact that terminals are linked by operators, and that genes can be linked by operators. So why have only one linking function? Why not have each gene be a "terminal" in a second order GEP, where the operators are the same, but each terminal is an entire gene? While each HGEP individual would have an equivalent GEP individual, the GEP one would be very long and easy to disrupt. HGEP would impose the idea of hierarchy on the individual, and allow genes to combine in interesting ways, rather than guessing which linking function might be useful and imposing it over all genes.
This may introduce to much complexity to the individuals, especially because they would have variable numbers of genes. To fix the number of genes we could just have a GA style list of operators included in the front of the chromosome, and use it to link the genes. This would allow variation in linking functions without imposing so much complexity.


I will probably flesh this idea out a little more in another post. It is not so well developed at the moment.

Research Paper Nearly Turninable

I'm almost proud of how this research paper is turning out. I've had many problems with the code and I always seem to have difficulty presenting my findings, but the ideas seem good, there is some solid justification of my work, and even a sense of real extension on the work of others (BGEP in particular).
I would really really really like this to be in a language other then Java, as I've posted about before, but my lack of experience in Haskell is crippling my efforts to rewrite it. I'm starting small, trying a simple GA in Haskell first, and hopefully I will one day be able to extend it enough to include some of the variations on GEP like BGEP, PGEP, BPGEP, UGEP, IGEP, AGEP, NGEP, GEP-NN, GEP with DE, etc. Mostly though I'm interested in PGEP, as I really don't like the idea of head-tail separation and genetic operators that know if a locus is a terminal or an operator.
Well, enough rambling-back to work!

Sunday, August 8, 2010

I had an Idea!

BPGEP is a very cool method- all the (relative) simplicity of PGEP, with the low arity representation of BGEP. It is by far the most fun I've had with randomly generated bit vectors, thats for sure.
So why not apply this simply representation, along with its nice, substructure preserving prefix notation, to other evolutionary techniques? Bit vectors are hardly uncommon, and the translation and interpretation stuff from bits to an expression tree is not very hard. Maybe I could do a binary particle swarm, or maybe even mix it up and do a generation of one technique, then another, and maybe some hill climbing in between.
The fact that we can go from bits to trees in such a nice way is very interesting to me, and I bet there is some cool stuff to be done with it. Operators that preserve substructure, combinations of evolutionary methods, the simplest possible example of random mutation hill climbing. Good stuff.

Sunday, August 1, 2010

Things aren't so Simple!

Unfortunately I'm having problems getting results with BPGEP. It seems to find functions alright, but its terribly slow. I had to scale the parameters down enormously to get it to a reasonable time (It would have taken days if not weeks to do a single run with the parameters in the paper "Investigation of Constant Creation Techniques in the Context of Gene Expression Programming"). I'm okay with using different parameters because the point of the paper is to investigate BPGEP, not solve problems.
I would like to use the more complex function in the paper, the "V" shaped one, because it may be solved more easily when I add lots of constants to the terminal set, which I plan on doing.
Incidentally- why are all the fitness functions in symbolic regression papers so strange? The one in the original GEP research doesn't seem to force the fitness to be positive (unless I'm missing something) and the one in the paper mentioned above doesn't seem to make sense when the individual fitness is equal to the best found so far. The highest possible fitness would be .5, and if the residual is 0, the fitness would be 0. Its like I'm not interpreting it correctly.