Wednesday, September 29, 2010

Symbolic Regression in Haskell

I am testing my Haskell GEP library with a simple symbolic regression problem from the paper "Investigation of Constant Creation Techniques in the Context of Gene Expression Programming". I like the idea that the individuals evaluate to something that is then mapped to a fitness, even though my first thought (the way I did it in the Java library for GEP I wrote last summer using the Functional Java) was to have the expression tree of the individual be the expression they encode. The real difference here is how, when you encounter a variable like 'x', how do you give it a value? Do you evaluate the expression tree with some extra global state (kidding!), pass down an environment (what would it looks like?), or something else? In the Java program I passed info to the evaluator telling it to evaluate the individual with some value for x. This was messy and required each use of the function to go through the whole evaluation of the expression tree.

My solution was to actually create a function of type Double->Double, which is the type of thing we are optimizing in this case (unary functions of real numbers, f(x)=something). This means we must construct some function that, given a value x, calculates the function the individual encodes, and we should do it incrementally so it can be built by a depth first evaluation of the expression tree.

I want to note that the other way to do this is to create an instance of State Double, or more generally State Env where Env is some environment necessary for the computation.

So, how do we make such a thing? We must be able to take our terminals (constants and the variable x in this case) and combine them with our operators (+, -, *, / in this case).

First the definition of an operator
data OP a = OP { arity::Int, applyOp::[a] -> a, name::String }
Note that operators know their name, and have a function that takes a list of arguments and evaluates to something.
Then of a variable
var = OP { arity=0, applyOp=const id, name="x" }
A function that turns constants into operators
constant d = OP { arity=0, applyOp=const (const d), name=show d }
and a function that turns a function Double->Double into an operator
function fn nam = OP { arity=2, applyOp=(\(f:g:xs) -> (\a -> f a `fn` g a)), name=nam }

The trick here is that a variable x is just the value of x being passed in as a parameter, therefore x=id. Constants take in the value for x, but have no use for it, therefore a constant must take something in but always return the same thing- const d where d is some constant.
Binary functions, the only type I'm defining right now, must apply themselves to the result of evaluating their arguments, and their arguments must know the value of x to evaluate, so we take the functions, and return a function that, given an x, evaluates the functions and applies the operator to the results.

This is really very simple, I am just so relieved that I can actually create such a thing easily- it was really frustrating in Java not having higher order functions (despite using the Functional Java library, which tries hard, but is nothing like a real functional language for ease of use).

Now I have to find the right construct for implementing the evolutionary algorithm. This has presented several problems, the solution to one may be a monad transformer, but I'm not sure.

Incidentally the other problem I'm having is that it seems like the validity checker is rejecting all variation (mutation, rotation, etc) so no new individuals are ever created after the randomly initialized population is created. This is not cool, and I'm not sure whats going on- the checker seems to work, the operators work by themselves (when not reversed by the checker) and yet I've never seen it create new genetic material, even through recombination.

2 comments:

  1. Wow! I'm glad I posted this- my last little lament about the validity checker made me think about why it would *never* create new individuals (mutation should occur in the non-coding region sometimes, which would leave the individual encoding the same expression tree, and therefore valid). It turns out I just mixed up the old and new population, and the old individuals were, by definition, valid, so they were always added the the new population. Now the new individuals are checked, and if valid added, and if not the corresponding old individual takes its place. This is one of the reasons I want to get rid of this mechanism altogether- so I don't have to implement it.

    ReplyDelete
  2. This comment has been removed by the author.

    ReplyDelete