I have just read through the paper "Gene Expression Programming and Rule Induction for Domain Knowledge Discovery and Management" and found that PGEP's method for using the whole gene really just amounts to not allowing invalid chromosomes to exist. How exactly this happens is not laid out clearly in the paper, but I think the author just disallows operations that detect that they will create invalid individuals (or throws out individuals that are problematic).
I think we can do better than that. For one thing, having to detect and reject invalid individuals goes against the spirit of the original GEP paper, which makes a point of how no time is wasted on checking the validity of genes. I think that there are several ways to keep this nice property.
1. It would be easy to identify which operators at the end of the gene are causing problems, and these could simply be ignored in fitness evaluation.
2. In BGEP, with its binary representation, the first bit of a symbol determines whether it is an operator or a terminal. This means we could just flip a couple of bits in the operators that were causing problems, and end up with a gene that has nearly the same genetic material but is valid.
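As a rough sketch of the first idea, here is one way it might look, assuming the gene is read in prefix order and that an operator counts as "causing problems" when it cannot collect enough operands. The `ARITY` table and the function name `decode_ignoring_unfilled` are made up for illustration and are not taken from the paper. Scanning the gene from right to left with a stack makes the problem operators obvious: they are the ones that find too few finished sub-expressions waiting for them.

```python
# Hypothetical symbol table: arity of each operator.
# Any symbol not listed here is treated as a terminal.
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2}


def decode_ignoring_unfilled(gene):
    """Decode a prefix-order gene into a nested tuple expression.

    Operators that cannot be given enough operands are skipped
    instead of making the whole individual invalid.
    Returns None if the gene contains no terminal at all.
    """
    stack = []
    # Walk right to left so that operands are seen before their operator.
    for sym in reversed(gene):
        k = ARITY.get(sym, 0)
        if k == 0:
            stack.append(sym)  # terminal
        elif len(stack) >= k:
            # Popping yields the operands in left-to-right order.
            args = [stack.pop() for _ in range(k)]
            stack.append((sym, *args))
        # Otherwise: not enough operands, so the operator is ignored.
    # The top of the stack is the expression that starts furthest left.
    return stack[-1] if stack else None


print(decode_ignoring_unfilled(list("+ab")))
print(decode_ignoring_unfilled(list("+a*b")))
```

In this sketch a gene that is already a valid prefix expression decodes exactly as it would normally, so no separate validity check is needed. A gene made up only of operators has nothing to evaluate, which is an edge case this sketch handles by returning `None`; how that should be scored is left open.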
There may even be other ways to do this. I can't think of any right now, and I don't think there is much need for anything else, as these two are fairly good and clean.
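The bit-flipping idea could be sketched along these lines. Everything concrete here is an assumption on my part: symbols are fixed-width bit strings, a leading `1` marks an operator and a leading `0` a terminal, the gene is read in prefix order, and every operator takes two arguments unless a different `arity` function is passed in. The function name `repair_by_bit_flip` is hypothetical.

```python
def repair_by_bit_flip(gene, arity=lambda sym: 2):
    """Return a copy of the gene in which every operator that lacks
    operands has its first bit flipped, turning it into a terminal.

    gene:  list of bit strings, e.g. ["100", "001", "010"]
           (assumed encoding: leading '1' = operator, '0' = terminal)
    arity: function giving the number of arguments of an operator symbol
           (assumed to be 2 for every operator by default)
    """
    repaired = list(gene)
    available = 0  # finished sub-expressions to the right of position i
    for i in range(len(repaired) - 1, -1, -1):
        sym = repaired[i]
        if sym[0] == "1":
            k = arity(sym)
            if available >= k:
                # Operator consumes k sub-expressions and yields one.
                available -= k - 1
                continue
            # Not enough operands: flip the type bit, keep the other bits.
            repaired[i] = "0" + sym[1:]
        available += 1  # a terminal is a finished sub-expression
    return repaired


print(repair_by_bit_flip(["100", "001", "101", "010"]))
```

Only the leading bit of an offending symbol changes, so the rest of its bits stay in the population. Which terminal the remaining bits map to depends on the encoding, which is not specified here.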
I plan on developing a variation of GEP as a research project this summer, so this will definitely be covered, along with solutions to other problems in BGEP and GEP in general.
I have indeed come up with a mechanism similar to the first one here, but actually much nicer. I don't have a name yet, but I'm considering calling it something related to RNA editing, as that is the biological justification for its operation.