Its Computer Science Time: April 2010

Monday, April 26, 2010

MGEP

So, I finally understand monads, both mathematically (though I'm not so strong on category theory) and as a language construct. Enlightenment finally came while reading the article
http://blog.sigfpe.com/2006/08/you-could-have-invented-monads-and.html

In my normal fashion, I wanted to link what I had just learned to some project or immediate use. What occurred to me was that there is that the concept may be helpful in certain GEP problems. In particular problems that do not exhibit the normal closure property. This would allow a wider range of problems to be solved by GEP. I'm yet to think of a situation in which this would be helpful, but it opens up a world of possibilities for operators as functions with a different return type and params.
Monadic Gene Expression Programming!

Friday, April 23, 2010

Inversion Only

I presented by capstone project on the application of GEP to combinatorial optimization problems today. It went well, and some people may have even understood what I was saying (understandably very few though). One professor that I like a great deal told me that it was my responsibility to present my findings in a paper so the GEP community is aware of the problems I found, the changes I made, and my conclusions about the future applications of this method.
While I would like to do this, I'm not really sure how this would work. I would have to complete the statistical analysis so I could make a strong claim, and do some more analysis of these problems.
I would certainly like to have some papers to my name, but I'm not sure anyone is really interested. It looks to me like I'm the only person thats ever taken this modification of GEP seriously. In fact, I've never heard of anyone using anything but the original and new variations of it, never any of the variations mentioned in the original book.

I guess I'll just see how this plays out.

Wednesday, April 21, 2010

Satisfiability is Hard

I'm been looking into applying PGEP to SAT problems (3-SAT in particular) but my results so far are not good at all. Most individuals end up encoding for only one variable or none to be true, and yet the problem ends up satisfying almost every single clause. The unsatisfied ones are hard to correct for, even with SAW. I think this is because the subexpressions that should be combined to make individuals that satisfy all clauses aren't being created, as the important genetic material (the stuff that is involved in fitness) is just one terminal in the beginning of the chromosome. I'm not sure how to get more complex trees and richer structures out of the population.
It is possible that the easiness of the problem is encouraging individuals that aren't complex, but I'm seeing a similar problem in what should be a much more difficult problem.
Looks like its back to the SATLIB site to find something difficult to see if I can get some real diversity of structure going.

Sunday, April 18, 2010

Evolutionary Dynamics

Ferriera has shown that the dynamics of GEP are different from that of a GA in her book, Gene Expression Programming- Mathematical Modeling by an Artificial Intelligence. The best fitness predictably is pretty consistent (as elitism is in use) but the average fitness skips around quite a lot. It is pretty clear that this is because the operators defined by Ferreira are very disruptive, so that the algorithm explores more then exploits. It can produce huge variations in a single mutation, if the structure of the tree is changed enough. This is fine in a way- elitism ensures we have at least a pretty good solution, and selection will ensure it has a good presence in the population, and most of the operators will search a wide area of the solution space around this individual.
The problem is that there are so many operators, and they are so disruptive that we end up with little more than a random search (at worst). The problems that the technique is applied to in the book are mostly fairly simple, and I wonder if the randomness of the search is just ensuring that good solutions are found because of the smallish size of the solution space. For much larger solution spaces, that randomness may result in a lot of wasted effort (exploring when we should be exploiting what we know is good).
Other operators have simplified GEP since it was created (rotation and point mutation as the only mutations, and single and double cross as the only crosses) and PGEP seems to be better able to keep a subexpressions structure even after small changes to the individual. I'm wondering if the population dynamics have changed when applying this operations and variations of GEP. If I learn anything about this in the remaining part of my capstone I'll definitely post it here.

Wednesday, April 14, 2010

Biological Justification For GEP

GEP, and particularly BGEP, have a really nice biological justification for their existence. While relating evolutionary computation to biology is not strictly necessary (it is mainly an optimization technique, not a study of evolution), inspiration from evolution is a way to legitimize a technique.
With BGEP, there are only 2 symbols used, 1 and 0. This is more like DNA, where there are 4, then GEP, where there are many. As with DNA, groups of loci determine what they will create by their order and contents. The things they create have a more complex structure (mRNA for example) just as the expression trees have a more complex structure then the linear gene that creates it. The next step is for the structure to create something else that will effect the environment and be involved in the fitness of the genetic material. In GEP this is the result of evaluating the expression tree, in biology it is the proteins created and their effects.

I am not a biologist, and my understanding of genetics is very limited, but I'm glad to see that what BGEP, and hopefully BPGEP if it every exists, are a complex evolutionary system that have separate genotype and phenotypes, as well as a progression of effects from linear to complex structures that ultimately determine fitness. This has the potential to be a very effective system, and I'm pretty excited to study it this summer for my independent research project.

Friday, April 9, 2010

Bitwise Knowledge

I believe that there is an important and poorly investigated aspect of GEP that may be allow
it to outperform GAs in particular, as well as some other methods.
The main idea is that if the GEP is used for parametric optimization (as discussed by its creator) it creates a value out of more then just bits, but rather encodes knowledge about the value in the form of a tree. While this idea is well known, what it interesting is if this is applied on a bit level.
BPGEP will be a bit vector which will create an expression tree out of groups of bits, which in turn will be evaluated into possibly a third structure which will be evaluated for fitness. The relationship with biological genes will be discussed in a future post (its pretty cool). The third structure may be another bit vector, which may be longer or shorter than the first (depending on the problem). This means that we are then encoding knowledge about a bit vector and the relations between the bits. This has been a problem for GAs traditionally, as there effectiveness (in crossover particularly) is limited when the bits that are close in the individual are not closely tied in meaning, but the truly meaningful relations are hard to determine. You can do thing like inversion, but even so, a bit is limited in its ability to relate to other bits because it can only be next to so many at once. This is interesting especially in problems with high interrelations like cryptography, where each bit intentionally affects as many others as possible. This may make GEP ideal for breaking cryptographic methods, or at least more effective than GAs.
I really hope to investigate this, perhaps as part of my thesis on BPGEP.