Saturday, February 5, 2011

Robustness and Redundancy

RGEP (Robust Gene Expression Programming) is intended to be robust in the sense that it does not make many assumptions about the problem that it solves. This means that instead of evolving a program it evolves an expression whose result (possibly another expression) can be evaluated for fitness. Instead of forcing the user to create a maximizing fitness function, fitness can be maximized or minimized. The encoding is simple bit vectors, and any operator can make any change to an individual- there are no constraints to evolution.
I've recently discovered another applicable meaning to the name robust- the fact that many bit vectors encode the same expression tree. Since RGEP has this feature, and no other GEP seems to have it (it was part of GPM that was removed for GEP by the head-tail method and later a restriction on genetic operators), it makes robust the perfect description of my technique.
My reason for adding this redundancy in encoding was to make implementation of RGEP easier, but it seems to have a significant positive effect on performance. I started noticing that RGEP's initial populations were better than PGEP's in my experiments so I performed some extra tests to isolate the first population. I found that indeed RGEP performed better than PGEP on randomly generated populations. This can only be accounted for by the redundant encoding- the bit vector representation has no way of expressing itself yet, and the genetic operators are not given the chance to act.
This increase in performance is not small either- for one test function RGEP is 57% better, and for the other it is 91.5% better!
I am trying to find some other mechanism to account for this, but as far as I can tell the redundant, unrestricted encoding is a huge gain in the fitness of random individuals.

1 comment:

  1. Arg! There was another mechanism behind it! It turns out that because the variable, x, was the first thing in the terminals list, and the length of the list is not an even power of two, it gets represented more often then other terminals. I chose to deal with underrepresentation like this, and I'm sticking to it, but I was hoping that the effect would be much smaller. Since x is such an important terminal, it is causing a huge gain in average fitness for it to be more likely. I was able to completely isolate this fact, and I'm sure that it is the cause. This is still interesting, but I must be careful about dealing with this issue- it is an unfair advantage for RGEP when pitted against PGEP.

    ReplyDelete