Monday, December 20, 2010

Memoizing Functions

I am reading "Fun with Type Functions" by Simon Peyton Jones, Oleg Kiselyov and Chung-chieh Shan. It is an amazing paper- it is pleasant to read, interesting, and on a very cool topic. One thing they mention in passing is memoizing for referentially transparent languages, which in a way is very nice because you can be sure that no side effects could result and so you can safely memoize results. I was so stunned how awesome this is technique I literally sat there in awe for several minutes before reading on. This is an area of research I've been meaning to look at because it gets mentioned now and again, and now I have to get serious on looking this stuff up.
The idea of memoization is to turn a function into a data structure. You can often save time by saving the results of a function for future calls with the same input so you will not have recompute it. The way I was taught to do this, which was in Java since that is my university's teaching language, was to create a HashMap and, for each time a function is called, check if the parameters are in the map. If so, return the result, if not then compute the result, store it in the map, and return it. This is all well and good for simple little ACM programming competition problems (which is how I've used it) but in general it is kind of a design problem in Java, as we have no higher order functions. Python has its decorator syntax that allows you to wrap a function in a memoizing decorator, which makes the memoization transparent to the user. This is pretty cool, but what if functions can't keep some state that they modify when called? In Haskell, the situation is pretty amazing. Instead of creating the structure piece by piece as the function is called, you create the whole thing all at once, sort of. They have two functions- one that turns a function into a table storing its input, output pairs, and one that turns a table storing those pairs into a function. This means that if we go from function to table to function we get a function with the same type as the original, but that will memoize its results. With lazyness, the table can even be infinitely large, holding all possible input/output pairs for the function, and it will take a finite amount of memory and will fill itself out as its needed.
This is (to me) just stunning, and I don't think I have given the topic justice in this explanation. I may post about it more as I read about it, but for now, anyone interested should read the paper. The actual topic of the paper is type families in Haskell, which are also a very cool feature.

Sunday, December 19, 2010

Sudoku in Haskell

I'm currently writing a Sudoku solver in Haskell for fun. The board is a list (rows) of lists (squares) of lists (possible numbers for that square) and solving proceeds by eliminating possibilities as much as possible, finding the square with the fewest possibilities, and trying each one as if it was the only possibility for that square. I implemented this in Python a while ago, so I wanted to try it out in my new favorite language.
I've noticed that I get a lot of benefit out of Haskell for this type of program. The List monad is exactly what I need for nondeterministic selection of possibilities for a square, I used the derivative of the list type (aka a zipper) to move around in the board, I don't have to make a copy of the board before changing it (as there are no side effects), and laziness allows the first solution to be the one that is used without calculating the others (which will fail as there should only be one solution) without adding some way to exit the recursive solver when a solution is found. I might look into ways to do this kind of exit, such as with a continuation, but for now I don't even have to try at all- Haskell is nice enough to do this for me in a way.
Its not done yet, so back to work!

Monday, December 13, 2010

A Case for Imperative Reasoning

I've just finished watching a talk, the link is below, about operational semantics and the difficulty of specifying certain semantics in a purely declarative way. I really enjoyed the speaker, and I feel like I have a more complex view of the difference between imperative and declarative styles now. As someone who loves functional programming, it was interesting to see a reasonable case made about the challenges it faces in certain areas of specification. You have to watch a little bit into the talk to get the good parts, btw, as his opening statements about functional programming are not the interesting part.
The basic idea was that there are operational parts of an algorithm that may need to be specified for an implementation to be considered an instance of the algorithm (I'm taking a particular view of the notion of an algorithm, and this is discussed briefly in the talk) such as that a sort be in-place. I have no idea how we would state this directly in a functional language, except perhaps to build an functional description of an imperative process and some how ensure that it is carried out in a way the preserves the intended semantics.
The real reason I posted this is not just because it is interesting but because it is a rare thing to see an argument for imperative programming. I like this kind of thing because it adds a level of complexity and sophistication to the regular sort of arguments for or against one language/paradigm or another. I wonder though- he mentions that he uses intuitionistic logic in his current work, and as we all know the lambda calculus, and by extension Haskell, is an intuitionistic logic itself (through the Curry-Howard isomorphism) and programs are proofs of theorems in this logic. How this is related (if it indeed is related at all) I have no idea.

The link:

Sunday, December 12, 2010

Updates on Haskell Evolutionary Algorithms Library

My Haskell implementation of RGEP (Robust Gene Expression Programming) is going really well! Its called Heal, Haskell Evolutionary Algorithm Library, and its the latest iteration in my quest for a library to do my research with where I can change anything I want at any time to test new algorithms and genetic operators, etc.
I've decided to change everything to vectors, hoping to get some performance improvements. While I don't have any direct comparisons, I just ran a 100 individual population with individuals having 128 codons, and each codon of length 3. That is a largish dataset, and if I remember correctly my Java implementation took about 24 hours to do similar work (note that that was a pretty poor implementation on my part). My new library is about a third the size, supports PGEP, RGEP, and GAs, with an easy way to add more, and ran in 6 minutes (giving me a pretty poor answer to this symbolic regression problem I'm testing with, but thats not Haskell's fault). Wow! I also had the profiler running, and a full 87% of the time was spent garbage collecting, leaving only 40 seconds to run my code. It allocated a total of 21 Gigabytes during the entire run.
Now, I obviously must do something about this because that is an absurd amount of time in GC. The problem is I don't really know how to do deeper profiling in Haskell, and I'm having trouble with the most direct approach (running the executable with +RTS -p). So far the "shotgun" approach of adding strictness annotations is not working. Since every individual needs to be evaluated all the way every generation, laziness is only helping for some stuff, and I don't really need it for the main population, so I don't think throwing in bangs will hurt, but so far it hasn't helped at all. In fact, with only 8 Megabytes for the stack, the program stack overflows, so I gave it 15 and it runs fine.
In addition to being much faster, this library is much nicer looking (though I need to clean it up pretty badly) and I have delighted in using the various features of Haskell. I'm not even talking about the advanced features- the normal everyday ones are amazing.

On to profiling! I really want to see where all the time is being taken up. I feel like there is a lot of opportunity for improvement in time, space, and in just general style and organization.

Saturday, December 11, 2010

From #Haskell

I found these in the quotes of the week section of the Haskell new site:

data Neither a b = Left | Right
But in another sense, functional programmers are applied logicians who spend all their time proving trivial theorems in interesting ways in an inconsistent intuitionist logic.
unsafeCoerce is just a generalization of id
It's also the same thing as the Yoneda lemma. That's the thing about maths. Everything is actually the same.
programming languages can be divided into two broad classes: functional and dysfunctional
Schroedinger's cat is really in a thunk not a box
it's neat how you learn haskell because you are drawn in by the purely functional paradigm, and then you find loads more things like algebraic data types, monad abstractions, arrows and applicative, lack of objects... so that when people say "well, it's not haskell, but at least X is functional", it's just not the same at all
peyton `simon` jones
Here's the short guide to Haskell for OO programmers: Haskell isn't at all an OO language.
I'm starting to believe that learning haskell is mostly about carefully crafting small and clever functions and then finding out that they are already part of the standard library.
the first rule of fix club is "the first rule of fix club is "the first rule of fix club is...
Since NaN /= NaN, I think, we should decipher 'NaN' as 'Not a NaN'
Haskell is an even 'redder' pill than Lisp or Scheme

Haskell really is amazing. Laziness, type classes, monads, functors (and other category theoretic goodness), applicative functors, algebraic data types, pattern matching, currying, and a huge number of other things.

Thursday, December 9, 2010

Species and Functors and Types, Oh My!

I very much enjoyed the paper "Species and Functors and Types, Oh My!", which has a good talk by the author at:
There are other good papers, including "Species: making analytic functors practical for functional programming" and "AN INVITATION TO COMBINATORIAL SPECIES."
Awesome subject, great papers, good talk, good times.

Wednesday, December 8, 2010

Proof of Correctness for RGEP's Editing Mechanism

RGEP's editing stage consists of two parts- removing the non-coding region if one exists, and removing the extra operators, if any exist. I have convinced myself that the mechanism works, but I have not shown this rigorously until now.

To prove that RGEP's editing mechanism will always produce valid individuals, consider each stage in turn. The first stage removes the "non-coding" region. It is well known that expressions in polish notation can be parenthesized without ambiguity. If we add parenthesis and find symbols to the right of the right-most closing parenthesis then removing those symbols corresponds to removing the non-coding region. Note that we can account for expressions with too many operators by closing their parenthesis at the end of the expression.

In the second stage we can assume that all expressions are either complete or over-specified, as expressions with extra symbols at the end will have been removed in the first stage. The correctness of this stage is proven by cases:
Case 1- the expression is a valid prefix expression. In this case the postfix traversal of the reverse of the expression will be exactly the same as the prefix traversal of the expression, by definition.
Case 2- The individual has too many operators to be considered a valid prefix expression. In this case the postfix evaluation of the reverse of the individual will encounter either a list of operators, which would all cause a stack underflow and will therefore be ignored, leaving an empty stack as the result, or an individual with one or more terminal.
Operators before the first terminal will cause a stack underflow and will be ignored. Operators after the terminal may result in a change in the stack, but cannot empty the stack. At the end of the evaluation the top of the stack will be considered the result of the evaluation. Note that there must be only one thing on the stack in this case- if there were terminals that would end up on the stack then they would have been removed in the first editing stage.

We can also prove that RGEP's prefix-with-editing can encode all individuals that PGEP's prefix notation can, and that RGEP is a strict superset of Karva-with-head-tail-distinction. First, both PGEP and RGEP use prefix notation,
so any individual in PGEP, which is by definition valid, is therefore valid in RGEP. Assuming the correctness of RGEP's editing mechanism, RGEP has some redundancy in its encoding- some expressions will be reduced to valid prefix expressions which must be smaller then before editing (remember that all editing in RGEP deletes symbols), and must therefore be expressible as an prefix notation individual by padding it with symbols to meet the fixed length requirements.
To show that RGEP is a strict superset of Karva we first show that all Karva individuals have a prefix equivalent. A Karva individual encodes a tree, and as prefix notation can express any tree and requires no more symbols then Karva, we must be able to reorder the symbols in a valid Karva expression to get the same tree in prefix expression. To show that RGEP > Karva-with-head-tail we must only show a single case of a prefix expression that is valid in RGEP and not expressible in Karva. notation For individuals of length 7, with only the + operator and the terminal 1, the following individual is valid in RGEP- +1+1+11. The corresponding tree is too right unbalanced to express in Karva notation (by inspection) if we restrict the operators to only the first three places, as we must in this situation. Therefore RGEP > Karva in the sense that it can express more trees for symbol lists of fixed length. I'm sure this can be generalized to arbitrary lengths by constructing similar right-unbalanced trees, but the point is still valid- such a tree exists in at least one situation.

Monday, November 29, 2010

Haskell ABCs

I found some of these to be very amusing. Especially the ones for N, I, and Y.

Monday, November 22, 2010

RGEP- Robust Gene Expression Programming

My thesis is now title "Robust Gene Expression Programming- Evolving Treeless Expression Trees." I've renamed it from BPGEP because it is no longer just a union of those two techniques. My goal now is to come up with a simple and easy to implement Gene Expression system. Simplifying GP was kind of the point of GEP, but GEP is still pretty complex. GA is a simple EA and I'm striving for that level of unrestricted encoding and lack of assumptions about the problem space. I call it robust because it should make as few assumptions about the problem as it is possible for this kind of system to make.

To fulfill this goal I've chosen to keep the binary encode, with the change that sequences of bits that do not encode a symbol are thrown out in the gene expression phase, and the mutation, crossover and notation from PGEP. I've replaced roulette wheel selection with 2-tournament selection. This works by choosing 2 (or more generally k) random individuals from the population and having them compete in a tournament. The tournament consists of choosing the best individual with probability p, the second best with less probability, the third (if there is one) with a smalled probability, etc. This is basically a roulette wheel on a ranked subpopulation. I've changed the selection because there is no convincing reason to use roulette wheel (the original GEP paper chose it with very little justification), tournament selection is apparently popular in GA literature, it removes the dependence that selection in roulette wheel has on the scale of the fitness function, and it can support both minimization and maximization problems. Roulette wheel only supports maximization, and so the fitness function is often changed to turn a minimization to a maximization. This is messy and may have unintended consequences. Note that I'm keeping elitism- it seems to help a lot.
The other change is in the editing mechanism. I don't want to have to check for invalid individuals after each genetic operation, and I consider it a problem with implementation that they have to pass that kind of information to the genetic operations. I would rather keep in the spirit of GPM systems and simply do editing to ensure that individuals encode trees. To do this we can just remove the noncoding region, flip the individual, and do a postfix evaluation where operations that would cause a stack underflow are ignored. There is another way to do this by moving backwards through the list but the postfix interpretation is very nice and concise.

I like this editing mechanism because it doesn't introduce any new material to the individual. In fact, all the editing that is done- bits to symbols, removing the noncoding region, and the postfix traversal- will only remove elements, which I consider a nice consistent approach. The only problem with this is the possibility of removing all symbols and ending up with an empty individuals. I don't like this, but I am willing to say that these individuals just end up with the worst possible fitness.

Thats the main points I'm making here- I also investigate other notations and editing mechanism to explain why they are not satisfying. I would like to also investigate adding some of the features that different GP variations have, like typed programs, recursion, iteration, and lambda terms that allow it to solve problems that are otherwise difficult for GPs. I've read, and I agree, that a huge amount of power is to be had by adding nesting and program reuse in evolved programs. This would be a test of RGEPs simplicity in the sense that if it is simply it may be easy to add to. On the other hand, it may be to simple to easily add to, and I expect a lot of complexity will be added to make these kinds of modification. I doubt I will be able to do more then one, so if I could find a nice way to remove the closure property I would be pretty happy. I expect that many of the variation of GEP that already exist are easy to add to RGEP, like ephemeral constants.

Friday, November 19, 2010

Haskell DSLs

Domain Specific Languages are a big deal in Haskell, despite the lack of a macro system or some mechanism to actually modify the language itself. This is understandable- many problems can be expressed in languages, and sometimes we need a new one that model's our problem domain.

For my evolutionary algorithms framework, which I'm starting again for like the 4th time, I'm looking into ways of creating DSLs. I would like to be able to say that a GA consists of initialization, and a loop involving evaluation, selection, mutation, and crossover. Then I want to be able to say just as clearly that PGEP is initialization, and a loops involving evaluation, selection, point mutation, rotation, 1 point crossover, and 2 point crossover. And I want to be able to specify the type of selection, and make the evaluator easy to add. In addition, there are some aspects of EAs best handled by lots of state, which is a problem. Its not that I could use a state monad (I am for randomness already) but it would be complex and the type of state would change for nearly every experiment.

My current solution is probably the one I will continue with, and I want to describe the advantages and disadvantages to this and a couple other DSL strategies. The first strategy I looked into was a data definition, the kind you see all the time in Haskell blog posts.
data EAExpr p = Eval EAExpr | Select EAExpr | Mutate EAExpr | Loop Int EAExpr EAExpr | Crossover Int EAExpr | Population p
or something like this. Note that we may want to define it differently and take the fixed point of a data definition to get exactly the right structure, but since this is not the route I'm taking I'm going to skip over that. Then one would supply an evaluator for the problem you were solving. The problem is that the language needed may change depending on the problem and the specific EA. I could try and make a very generic one, but I'm not sure I would gain anything by it. The main thing with data definitions is that you can't extend them once they are set up. Or can you?

There is a way of constructing languages embedded in data definitions that can then be composed using the operators of combinatorial species, see "Datatypes a la carte." The problem there is that it is fairly complicated and I'm not sure I really need that level of complexity right now. Instead, I'm going the route of making many classes, many of which have only one function, and making specific conglomerates of theses. For example, the GAPop, a genetic algorithm population class, has no new methods, but must be a mutatable, crossoverable, selectable, and evaluatable thing. This lets be define some basic structures and make instances for them, but to leave the language ultimately open. I can add new structures and their instances any time, which is a lot of work but I think its the price I pay for lack of specific knowledge about applications, and I can add classes (functions in the DSL) easily. Then I can describe default and specific instances of algorithms, like a basic GA that can be filled in with a pop, evaluator and some parameters. I even get the property that, given the class defs involved in an algorithms description I can determine the kind of statements that might be made.

In a way this is hardly a DSL- its just a lot of classes and datatypes glued together by monad syntax. I'm going to think of it as a DSL though, and try to get it as nice as possible, so I can describe exactly what I want and not need to supply lots of unrelated information. And isn't that the point of DSLs, to be able to describe what we want the way we want to say it?

Problems I anticipate include overlapping instances and unwieldiness. I'm still researching these methods in my spare time to see if there is some problems with this approach that I'm not aware of that inspire people to try all the different strategies you see talked about.

Tuesday, November 16, 2010

An Algebra of Datatypes

In this post I'm going to give a brief description of the first couple operations in the theory of Combinatorial Species, without the category theory behind it. I hope to explain the notation and to show its relation to Haskell's Algebraic Datatypes.

There are several primitive types, which all others are build out of by combining primitive types with one of the operators. These types are not the normal types one thinks of (int, string, etc) but are rather the building blocks out of which structures are created. It is these structures that can be thought of as holding a concrete type (this is made explicit in species theory).

The first primitive type is 0. For all sized structures it is not a structure at all. It never holds data or does much of anything. Then we have 1. 1 has no place to hold data, so it is like (very much like!) nil or () in Lisp- it denotes an empty container. The rest of the natural numbers are defined similarly, with 2 being essentially booleans, 3 being taking three possible values but holding no data elements, etc.

The next primitive type is E, the type of sets. There is only one set of each size, as sets are unordered, and so "permuting" them makes no sense.

C is the cycles, which are like linked lists where the end node is linked to the front. They are unique up to rotations.

Then we have X. X is a hole in which to put data. Its kind of like a variable- we don't at the time we write it know what it will hold, but we know it "holds" something.

To combine these types we have many operators. I will only talk about three right now- +, *, and o. The "o" is meant to be composition, like the composition of functions. The sum operator means that the resulting structure can be either one thing or anything thing. For example, 1 + X is exactly the Maybe type in Haskell- it is either a placeholder (1) or Just an element (X). This is related to taking a disjoint union.

The * operator pairs things together. Where + is like the Haskell Either type (which is how we express + on values) * is like ",", the pairing function. Note that in Haskell when declaring datatypes we can have multiply constructors, if desired. Since they are a tagged union of types they correspond to the + operator on species. Then we have * as the fact that a type constructor can have multiple values- they are really tagged tuples. An example is 2*X*X, which we can interpret as a pair and a index into the pair, which distinguishes one value over the other.

Interestingly we can write, for any n, 1 + 1 + 1 ... + 1 n times. Similarly, X*X*X = X^3. Many such properties work out in this system. This means that what we end up with are types of the form c1*X^x1 + c2*X^x2 ... + cnX^xn, which are polynomials! This means that (as far as I know) Haskell's algebraic datatypes are polynomials in the theory of species!

For the concerned reader, note that this isn't really trues for many different reasons. For one, Haskell allows infinite types, I haven't talked about how to include least fixed points, and I'm not sure if some Haskell type extensions don't have neat analogs to the theory of species.

Also notice that we haven't talked about two of the primitive species! The reason is that cycles and sets are "non-regular" in that they can be modified in a certain way- we can take a renaming of one- without really changing it. They are not part of the Haskell datatypes, which are all regular.

I hope that this is at least a tiny bit helpful to someone trying to learn about differentiable datatypes (which are amazing!) or any of the other topics surrounding datastructures that use this notation. I hope to write some time about the other operators- composition, functorial composition, differentiation, pointing, and cartisian products, as well as maybe some of the categorical explanation of what is going on. I am also interested in some of the other parts of the theory, like extending the (U, +, *, 0, 1) commutative semi-ring with unity to a ring with unity (additive inverses are called virtual species) and adding multiplicative inverses in order to express certain things. There are even further extensions, such as a generalization to structures holding many types of data (all of these have kind *->*) called k-sorted species, as well as "weighting" species to express more complex properties of the structures we are describing.

Friday, November 12, 2010

Greatest and Least Fixed Points

I've recently stumbled upon a goldmine of amazingness in computer science and math. Combinatorial species, categorical treatments of computation, and many different strange things we can do in Haskell's type system and with the extensions to the type system.

One of these things is to have types that take fixed points of other types. This seemed very strange to me at first, and I've only just now understood the difference between the greatest and least fixed point of a type declaration. Thank you "A Neighborhood of Infinity" for being such an enlightening source of math and computer science knowledge!

As I understand them, polynomial types are of the form c1*X^n1 + c2*X^n2 + ... cN*X^nN, ci >= 0, ni >= 0. Since Haskell allows multiple type constructors, separated by pipes (|), the type systems admits "sum" types. A sum type is either one of the type on the left of the + or one of the type on the right. Each constructor (where the name is the tag, making the sum a disjoint (tagged) union) is isomorphic to an n tuple. This is a product type, where a product type is a pairing of one of the thing to the left of the * and one of the type to the right of the *.
It is interesting to note that we can also do composition of types, such as [(a,b)], where we are composing list with a product of types a and b. This means that the only operations from the theory of combinatorial species that Haskell does not have are differentiation and functorial composition. Of course, there are libraries for the creation of these kind of types, but I'm specifically talking about encoding them directly in Haskell as types, not as data.

One more thing before I get to the point of all of thing is that the Either type in Haskell is a value level sum, and comma (,) is a value level product.

So! Fixed points! If we consider the list type LA = 1 + X*LA, which is lists over a type A, we may want to solve for LA. What finally cleared up the idea of greatest and least fixed points for me was that this type can have two meanings, one which is "smaller" than the other. The "smaller" meaning is that of finite lists- a list is either an empty list or a thing paired up with a finite list. The "larger" meaning (the meaning that admits larger values) is that a list is either empty or a thing paired with a possibly finite list. This allows infinite lists (which is what Haskell means by a list). I was reading a paper by Wadler that make use of the idea of the least fixed point, and I completely missed out on the fact that he was using this idea to make sure that all the structures that he was talking about were finite.

There is one more thing that is interesting to note about this idea. In Haskell we can legally write this type:

data mu p = In (p (mu p))

I still don't know why they use mu to denote this "least fixed point" type, nor why the type constructor is called In (is it related to the in operator used to come out of a functor structure in a F-algebra, which is given the same name? I think it might be). The point here is that we then define something like this:

data Val a = Val Int

data Add e = Add e e

We can then create a sum type out of Val and Add, and create an Expr type mu Expr which takes the least fixed point of the sum type. They we can create an evaluator for this type. The nice thing about working this way is that we can construct types out of other types, and be explicit in the type signature about what sorts of things we admit in the resulting type. This is all describe in the paper "Data Types a la Carte" by Wouter Swierstra.

Okay, that is enough for now. I am still learning this material, and it is way beyond me in many ways, so there may be abuses of terminology and mistakes of understanding in the above post, so beware!

Combinatorial Species

I've just discovered the most amazing piece of math- Combinatorial Species. I was aware of the "polynomial" types of Haskell and the notation, which itself is amazing, as well as the idea that we can take the derivative of a type to get the same type of thing, but with a "hole" in it.

But this is just one small part of the much larger theory being used here. The basic idea (as I understand it) is that in the field of combinatorics it has long been known that generating functions can be used to solve combinatorial problems. It is possible to perform operations on the simple, well known, series and find solutions to complex problems. There are quite a few operations, including a sum, product, composition, functorial composition, cartisian product, and differentiation. It turns out on the other hand, that we can apply operators to datatypes, as written in a particular notation, and get the structures involved in a problem, while simultaneously performing the analogous operations on the datatypes associated generating series, and both will tell us about the same problem.

I'm being very vague here about what exactly is going on with this theory, but I hope to post more about it, and hopefully figure more out myself.

Tuesday, October 26, 2010

Programming Game

I just finished this game called light-bot at:

I enjoyed it a lot. It tests your ability to think algorithmically by programming a little robot to perform a simple task. You have a "main" method which consists of as many as 12 atomic provided statements. Of these statements there is a call to one or the other of 2 functions, which have 8 slots for statements each.

I plan on playing at least one more time to see if I can reduce the number of statements I use (which it keeps a total of over the course of the game).

If you have some free time, I definitely recommend trying this game out.

Monday, October 25, 2010

Most Evolutionary Algorithms are Simple

My newest idea for implementing a library for GPM, GA, GP, GEP, particle swarm type systems is that there is just no way to cover all cases reasonably. We can't really say anything in general about what we are evolving, how the evolution occurs (which operators, structures, etc), or anything really. We can't (without losing generality) fix the structure of the algorithm, or what problem it solves, or what the individuals look like, or what the population looks like.

Therefore I am relying on the relative simplicity of Evolutionary Algorithms and rather then providing a reusable library that forces the user to only express certain types of EAs, I'm just providing code for the ones I want to do, and even for those ones I'm just constructing the algorithms and data structures pretty much directly, with very little abstraction. The idea is that the user just creates everything from scratch every time with help from the library, but not simply by composing structures in the library, as that doesn't seem feasible.

The best way I can think to help users (myself) create EA systems is to make sure the library stays out of the way. EAs can come in any shape or form, and the important thing is to fill out design spaces that might be useful in as lightweight a way as possible. Don't force the user to redo *everything* just so that they can vary some part of the system that usually doesn't vary, and make it easy to construct a new method using only the relevant parts of my library.

This is kind of a concession to the complexity of the problem- I have to start running tests instead of trying to get the program to be as pretty as possible. If this means poor code, I will do my best and hope to fix it when I come up with something better. I'm not unwilling to start over if there is some nice way of doing this.

Thursday, October 21, 2010

Structure of GPM Systems

I've been trying to understand the structure of Evolutionary Algorithms in generals, and noticed that they are very complex and very little can be said about them in general. In a previous post I talked about how much variation exists in different EA systems, and that it is hard to say anything at all about them in general. In light of this, I don't plan on making a general EA system, but rather a program for a particular type of EA- a GPM system.

GPM (Genotype-Phenotype Mapping) systems are something that I've only just recently discovered. They are the basis for all linear genetic programs and GEP in particular can be viewed as a specialization of the very general GPM system. This idea introduces exactly the analogy I was planning to make about the biological process of gene expression from base pairs to RNA, editing of the RNA, and then the creation of proteins. In a way it is nice that this is already introduced, as it gives me a vocabulary and justification for my thesis.

GPM systems consist of some linear encoding of things (symbols from some alphabet, integers, bits, real numbers, whatever). The evaluation process is where the difference between GA and GPM appears, although a GA could be considered a GPM where the translation is not performed (GPM generalizes GA in that sense). The process involves grouping base pairs into codons (such as a group of bits in BGEP) which are translated into a RNA strand (a list of operators and terminals in GEP) which can then be edited before the expression process. Editing is justified by the editing done to RNA before protein creation, though as I understand it the process is actually not all that related to the process found in actual gene expression. This editing is performed to make sure that the gene can be expressed as a valid phenotype. In other words it modifies the individual until it can be translated into a program. This stage may involve making sure the genetic information satisfies other constraints as well. Some editing operations may be replacing symbols, deleting symbols, and adding symbols.

After editing the program must be created. This may be very involved- wrapping it in a main function and compiling for example- or very simple- evaluating the prefix expression in PGEP. The generality here is very surprising- in the papers by Wolfgang Banzhaf he evolves a C program, which is quite a feat, considering the complexity of the syntax. Unfortunately it requires a very complex editing stage, which can add all sorts of information to the individuals such that it hardly resembles the original genetic information at all.

Using GPM as a basis, and drawing from GEP, PGEP, and BGEP, I hope to introduce a simple (conceptually and in implementation), effective, unconstrained, general GPM system that I am calling BPGEP (Binary Prefix Gene Expression Programming). With the structure suggested by GPM I will be making this EA system I'm writing in Haskell a GPM system, allowing individuals to by only lists, populations to be lists (at least, for now they will have to be lists), operators to do the usual mutation, crossover, selection, etc, as well as a way of setting up translation between encodings.

Thursday, October 14, 2010

Data Structures as Free Algebraic Structures

I've known for a while that the Kleene star over some alphabet is the same as the free monoid. I've just discovered that other data structures are described by free structures over other algebraic structures. This is mind-blowing to me, so I'm going to record here what I've learned.

Lists are things written one after the next. The order is important and we can have repetition. The only thing we can remove are "empty" things (the empty string), which is exactly like saying that 2 = 0 + 0 + .. + 2 + 0 + 0 + ..., adding and removing identity elements is trivial, and we assume all are removed (the word is in normal form). Since the monoid here is free, we don't know how to do the operator, just that there is one. This is why lists are free monoid, we are multiplying elements by putting them next to each other, but since we can't reduce them we end up with just a list of elements.

If the monoid is commutative its free versions gives us the bag structures- repetition but no order. This like a list where all orderings are considered the same list. Commutativity allows us to move elements around as we please, and since we can't combine any elements we must keep them all.

If the monoid is commutative and idempotent we get sets as the free structure. Again the ability to move elements around and get an equivalent word gets us the lack of structure of sets. Commutativity says that we can group equal elements by moving them around in the word, while idempotency says that multiple element that are the same can be canceled.

If the structure is not associative, a magma, then we get binary trees. This is because we must specify the order of operations, which is like the s-expression representing a tree where all lists have two elements.

So cool! I really wonder if other free structures give us any other data structures that are well known or interesting. I will have to keep studying abstract algebra and investigate this. This even gives me a way to think about the repair operator I want to introduce to PGEP, where there is an equivalence relation identifying words over the symbol alphabet that encode the same tree. This relation only applies during evaluation, not during evolution (where the non-coding region is important).

Wednesday, October 13, 2010

Abstract Structure of Evolutionary Algorithms

Very often abstraction comes at a cost. I've often heard that we should avoid abstraction for abstractions sake- if we put in the effort to abstract it should be to serve a purpose. The more abstract something is, the more work is needed to come up with concrete results with it. I've found that Category Theory is really fascinating, but is dauntingly abstract. This comes up in programming all the time. If you create some library, what is the domain of problems it covers? Is it the most general possible solution, needing then to be specialized by the user before any real work is done, or does it just provide one solutions to a problem, so the user may have to write their own library if they need to be able to express something not covered by your program? This is an interesting trade-off, especially when the subject of the program is something complex and interesting. Evolutionary algorithms vary enormously, and it is not obvious that they can be characterized. In fact, I've noticed that there is almost nothing I can pin down and say "this is always the case for all EAs I can think of." Partially the problem is just that the space of possible EAs is so large. I've been thinking a lot about this, as I would like my Haskell library to be as general as possible, but to still be usable. This comes up especially for research, as I want to be able to try different variations of the methods I'm studying without rewriting the whole framework. If there was some way to capture the structure of EAs in general, or even just linear GA-like programs in general, it may lead to a useful library and a deeper understanding of EAs. At least it would deepen *my* understanding of EAs.

One of the nice things about Haskell is it forces you to be explicit about things that are otherwise implicit. This means that when we look at the type signatures of the things in a library for some theory or method it tells us something about the nature of that theory. Or at least, it can tell us something about the theory or method. For example, in EAs everything requires some generation of random numbers. This is not just about GEP or GA or GP or something like that- AI is stochastic. I am not content with just saying that if we don't need randomness then we don't use it (adding randomness generalizes no having randomness in that we can get back to not having randomness by not using it) because that justification allows anything at all to be added as long as we don't have to use it. But randomness stays- it really does describe something fundamental in EAs and AI in general.

The next thing is the population- every evolutionary computation that I am aware of has some population. This is not as easy to model as we might hope, as the population can have many different structures, though really it is often unstructured (a bag structure). I don't think it is enough to say that the population is a functor. All that tells us is that we can act on the individuals. While this is important, there appear to be at least two types of operators we use in the methods I'm familiar with- operators where we act on one individual (point mutation, rotation, IS, RIS) and ones that act on groups (often producing groups of the same size) like the various crossovers. I think selection can be put in this type- tournament selection makes potentially small groups, while roulette always uses the largest possible group. The ability to describe operations this way may turn out to be useful. The problem appears to be in the grouping mechanism- Do we need random groups (normal crossover)? Groups influenced by the structure of the population (crossover in a CGA)? With replacement (tournament selection)? Without replacement (potentially crossover, though it could be done with replacement)? And even then we have problems- in CGAs a crossover can occur with any cell around an individual, so do we get all the cells and only produce one individual? What about with pairwise crossover, were both individuals (the children) must be added back to the population? Or when the parents can survive crossover, adding their children without dieing off themselves? If we could characterize all this, maybe the general structure of populations could be defined, but basically the structure of populations is, as far as I can tell, some structure. There is not much else we can say, really. Also, we should force a structure to only have one grouping mechanism, in the case that we want to do multiple types of grouping, such as for different operators.

Moving on from structure and operations to the individuals that fill up the structure. Again, all we can say is that they have "some structure." In a way individuals may even be worse- at least I was assuming in the above that the populations were homogeneous (with the assumption that multiple groups, as in co-evolution, constitute multiple populations). Individuals can be very complex- linear lists of fix length with a fixed alphabet strings at best, but possibly a tree or a lists containing multiple types of things (bits as well as real numbers as well as symbols, etc). Perhaps this means that individuals are some algebraic datatype, to be deconstructed as needed. This is nice, but very general. Some encodings are very complex in a way, and may include a variety of types of loci next to each other. It would be nice to say "a list of loci" rather than "a real number follow by 8 bits followed by a real number followed by 8 bits..." and so on. So do we have some "evolutionary component"? What properties does this have? Often this is used in the more complex GAs so we can say something like "mutate all loci" and have each loci know what the means for itself. Then we can do crossover over a list of things, without worrying about what they are, as long as they are in a list. Crossover of course has different meaning depending on what structure we are talking about. And there is no use forcing individuals to be structures that have a mutation and a crossover operation because we can have multiple crossovers or mutations for the same structure for the same EA.

Back to the Haskell part, in addition to randomness, all EAs keep track of some changing state. This means we have the random generator as state, as well as some problem specific (method specific) environment. This can include rates (mutation rate, crossover rate, etc) the current population, and anything the operators need that can't be fixed before the start of the EA. This by itself is a problem, as any time we need to keep track of additional information we need to change the structure that we carry around as the state of the EA. Even if we just say it is some environment, this means we have a 2 layer state monad already. Should we add logging, so we can run experiments and write things out? This would give us a pretty complex structure, but maybe that is the best we can do, I really don't know.

And forget about just the data structures, what about evolutionary process? It goes through generations (though I saw a CGA once that performed not in the usual all-at-once style, but incrementally). These generations see the application of some operations that make changes to the population, resulting in the next population. I don't have anything more to say about that at the moment.

Sorry for the rant here, but I needed to get down the ideas I've been playing with trying to come up with something we can generalize and create a nice, usable EA library out of. I may just have to pick some concrete behaviors out and say, "this is the type of EA that this library supports" and hope that it will be usable as well as modifiable, abstract as well as conrete.

Preventitive vs Corrective: Ways of Dealing With Invalid Structures

In evolutionary algorithms, it is common to run into situations where an invalid solution is created through some manipulation of a valid solution, or by random generation. Creating invalid individuals is a problem because their fitness can't be directly evaluated- they must either be given the lowest possible fitness, or repair somehow. Often the EA is constrained to that the form of the encoding of the solution can only create valid solutions. This is what I am going to call a preventative method. There are many techniques that use preventative methods- GAs often do this naturally, though in some cases, such as optimal list ordering problems like the traveling salesman, there is a need to be clever with the encoding (tricky encoding). In GPs, the genetic operations are carefully chosen so that they can only produce valid individuals.

On the other hand, I know some repair operators have been used (I'm not as familiar with GP literature as I would like to be). These are what I am going to call corrective- they take an encoding and make it encode a valid solution if it does not already.

In GEP the original method used a preventative method (the head-tail method), and it was claimed that this saves a great deal of computational effort that is wasted in finding and fixing invalid individuals. It isn't mentioned in the original GEP paper, but the other problem with corrective methods is that they often have to make significant changes to the individual, such that sometimes it is hardly even the same individual and might as well have been created randomly (almost).

The additional complexity added to GEP from the preventative head-tail method of ensuring valid individuals soon was replace (in most papers) by a corrective method where if an operation (including random initialization) would result in an invalid individual, the operation is undone. I call it corrective because the operation must actually be performed and then undone (corrected). This opens up the individuals to encode a wider variety of trees. This is especially nice in PGEP, where it is easy to check if an individual is valid without making a whole tree. The problem with this method is that it requires additional processing to do an operation and check for correctness, and even worse- it requires knowledge of the eventual expression tree to be made available to the genetic operators. This is too hard in in PGEP, but is harder in BGEP or BPGEP.

Other methods have been proposed, such as Unconstrained Gene Expression Programming (UGEP). This allows any change at any place, but has a lot of weird and complex processing to be done, including multiple fitness evaluations. The translation process ends up with 0 or more expression trees, all of which need to be evaluated. UGEP also used karva notation, which is particularity unfortunate considering how the expression trees are built (there is a lose of locality of subexpressions). I think it is actually a very clever idea, though maybe the expressions should have just been linked by some linking operator like the multi-genetic chromosomes in GEP. This may create many very wide trees though, and maybe it is better the way they do it in the UGEP paper.

Naive Gene Expression is another clever encoding, but I'm not convinced that it is worth the restrictions it imposes on the problem. It also uses karva notation, which I don't like nearly as much as prefix notation. It creates a complete binary tree (as operators in NGEP must have < 3 arguments) and fills out the tree karva style. This means that it always fills out the tree, but some parts of the tree are not used. It is an interesting idea, and essentially uses the same head-tail method of GEP to impose validity, but ultimately there is little gain for such a loss in generality.

I'm going to be proposing another corrective method that I think is very nice. It allows all operations to be performed, requires no knowledge of expression trees in the operators, is simple and fast, and preserves substructures nicely. The last part is particularly interesting, as I mentioned before that some attempts to repair are very disruptive. This method only repairs when expressing an individual, and never has to actually edit the individual itself- allowing any genetic variation at all to exist. It does a simple single pass through the linear encoding, removing some operators, and "floating" subexpressions up through the tree until the tree is complete. It has a nice biological justification in RNA editing, and allows some amount of neutral mutation by expanding non-encoding regions to inside the gene or at the end (one or the other).

I've never seen a comparison of the different validity enforcing methods, though probably there exists one for GPs. I hope that this will be one of the contributions that my masters thesis makes to GEP research. In another post I will describe the actual mechanism that I am proposing.

There are other things I would like to add to GEP. My thesis so far is about making it as simple and expressive as possible, but I would love to see some of the great stuff the GP community does attempted in GEP. Some of these things are recursion, iteration, strongly typed expressions (removing the "closure" condition in GEP), and the use of lambda calculus. Since one of my main interests is in the issues surrounding functional languages, and PGEP has already given us a way to evolve functional expressions (compare PGEP to most linear GPs, or even to karva notation), it would be great to carry the torch further and make the expressions even more closely related to functional languages.

Wednesday, October 6, 2010

Cellular Autamata on an Infinite Surface

Imagine a single cell in a huge grid of individuals, such as in a CGA (cellular genetic algorithm). In one generation the cell may crossover with any of its neighbors, but that is the limit of its involvement in the grid (barring some extra stuff going on like long distance links between cells or something). After two generations its genetic material may have spread a little farther, and it may get some material from a cell one more level away from. After 100 generations it will have interacted with cells 100 steps away.

But what if the grid is so large that each cell has some other cells that it never interacts with, even indirectly? What if the grid was infinitely large? In a non-strict language like Haskell we could even implement such a thing. We could start with some single cell in the middle (probably this would be a pointed grid, so we elect this "middle" cell to the the individual we focus on) and only instantiate (randomly) cells that we need to evaluate this cell each generation. After some number of generations, it will have forced the evaluation of a large space around it, but there will still be this infinitely of space- representing a grid of infinite interactions of genetic material. We could stop after 100 generations, say, and then search this grid forever, keeping track of the best individual we find. It would be an infinite space of solutions, with patches of fit ones that out competed the ones around them.

Admittedly there would be problems holding this in memory, as well as problems getting good individuals to evolve, with so much random material out there to interact with and no way to bound interaction except to out compete it (which is kind of cool by itself).

This is just some idea I had in class today, but I thought it was fun enough to post about. Could be fun to come back to one day, even if it never amounts to anything.

Wednesday, September 29, 2010

Symbolic Regression in Haskell

I am testing my Haskell GEP library with a simple symbolic regression problem from the paper "Investigation of Constant Creation Techniques in the Context of Gene Expression Programming". I like the idea that the individuals evaluate to something that is then mapped to a fitness, even though my first thought (the way I did it in the Java library for GEP I wrote last summer using the Functional Java) was to have the expression tree of the individual be the expression they encode. The real difference here is how, when you encounter a variable like 'x', how do you give it a value? Do you evaluate the expression tree with some extra global state (kidding!), pass down an environment (what would it looks like?), or something else? In the Java program I passed info to the evaluator telling it to evaluate the individual with some value for x. This was messy and required each use of the function to go through the whole evaluation of the expression tree.

My solution was to actually create a function of type Double->Double, which is the type of thing we are optimizing in this case (unary functions of real numbers, f(x)=something). This means we must construct some function that, given a value x, calculates the function the individual encodes, and we should do it incrementally so it can be built by a depth first evaluation of the expression tree.

I want to note that the other way to do this is to create an instance of State Double, or more generally State Env where Env is some environment necessary for the computation.

So, how do we make such a thing? We must be able to take our terminals (constants and the variable x in this case) and combine them with our operators (+, -, *, / in this case).

First the definition of an operator
data OP a = OP { arity::Int, applyOp::[a] -> a, name::String }
Note that operators know their name, and have a function that takes a list of arguments and evaluates to something.
Then of a variable
var = OP { arity=0, applyOp=const id, name="x" }
A function that turns constants into operators
constant d = OP { arity=0, applyOp=const (const d), name=show d }
and a function that turns a function Double->Double into an operator
function fn nam = OP { arity=2, applyOp=(\(f:g:xs) -> (\a -> f a `fn` g a)), name=nam }

The trick here is that a variable x is just the value of x being passed in as a parameter, therefore x=id. Constants take in the value for x, but have no use for it, therefore a constant must take something in but always return the same thing- const d where d is some constant.
Binary functions, the only type I'm defining right now, must apply themselves to the result of evaluating their arguments, and their arguments must know the value of x to evaluate, so we take the functions, and return a function that, given an x, evaluates the functions and applies the operator to the results.

This is really very simple, I am just so relieved that I can actually create such a thing easily- it was really frustrating in Java not having higher order functions (despite using the Functional Java library, which tries hard, but is nothing like a real functional language for ease of use).

Now I have to find the right construct for implementing the evolutionary algorithm. This has presented several problems, the solution to one may be a monad transformer, but I'm not sure.

Incidentally the other problem I'm having is that it seems like the validity checker is rejecting all variation (mutation, rotation, etc) so no new individuals are ever created after the randomly initialized population is created. This is not cool, and I'm not sure whats going on- the checker seems to work, the operators work by themselves (when not reversed by the checker) and yet I've never seen it create new genetic material, even through recombination.

Tuesday, September 28, 2010

Treeless Expression Trees

I am fairly certain that in BPGEP there will be no need to ever actually create a tree data structure. We can perform our genetic operations on a bit-vector (or a list of symbols if you really want- I actually think PGEP is fine without binary encoding in some cases) and we can already perform evaluation of individuals without the actual tree because prefix evaluation is so simple.

The neat thing though is that I think we can get away without checking for consistency of trees. It shouldn't be difficult to repair trees by acting on the linear structure- in fact, I think it can be done in a simple little O(n) way. If we find and chop off the non-encoding region first (a O(n) operation), this is still n*n, 2n time. This is actually much *less* work then was being done with the consistency checking, because this will only be done when actually evaluating the individual, rather then for each genetic operation. It may be more then the original GEP needed, but it requires no concept of the eventual structure of the expression tree anywhere, which saves in complexity and allows for a more general range of genetic operators.

I also think that it will do a fair job of preserving substructures- which is the most important thing to me about PGEP.

Basically the only thing that makes it GEP and not GA besides some addition operators (rotation doesn't really make sense in GAs) is the evaluation function, which you have to specify anyway- I think it would be possible to reuse GA libraries for BPGEP with some additional code all located in the evaluation.

Pretty cool.

Sunday, September 26, 2010


The videos posted by Catsters on youtube are really a great resource in learning Category Theory. The instructors have a sense of humor, and seem to be enjoying themselves. Unfortunately, I often am unable to understand the concepts given the formal nature of their discussions. The way I've been proceeding is to determine and then learn the specific concept that each idea in category theory generalizes. In one video the instructor suggests that you can either prove a claim they are talking about, or meditate about it in a categorical trance until you feel like you understand it. I've read of people having said that they do this very often (go into categorical trances), but I've never understood enough about the ideas to do so. Until now! I was able to understand a (admittedly trivial) concept in category theory (pullbacks) in at least some basic sense simply by thinking about the meaning of the associated commutative diagram.

A small victory, but a victory none the less.

Complexity Theory

I've been looking into Calgary as a possible next step in my education. One thing that is attractive about it is the contingent of AI researchers that investigate swarm intelligence. In addition to my interest in functional languages and the subjects that surround them, I really like emergent behavior in complex systems, and I've always found swarms of things to be beautiful. The massive random-seeming complexity and high-order structure is both interesting and potentially useful. It is my opinion that this type of behavior has wider reaching (and more beneficial) consequences then anything else I could study (why I believe this is complex- perhaps some other day I will post about it).

In this spirit I read a book called Complexity by Melanie Mitchell. Her work also introduced me to Genetic Algorithms (and her discussion of Messy Genetic Algorithms is my inspiration for Messy Gene Expression Programming), and now to complexity theory, which underlies the type of emergent intelligence that swarms are known to exhibit. The book is very interesting, covers a wide variety of topics, and is very recent- published in 2009. The step by step description of the program Copycat was instructive, and I appreciate the overview of some ideas of the structure, definition, and difficulties of the concepts of information, computation, etc.

I think it is interesting that, given the distribution of gender in the field of computer science, that Melanie Mitchell should be an inspiration to me in complexity, Candida Ferierra the inventor of the field I'm doing my research in, and Xin Li's PGEP my favorite version of GEP which I hope to propose a variation of as my masters thesis. I wonder if complexity and AI has a better gender distribution then the field as a whole or if this is a coincidence.

Unification and Assignments

Is unification an appropriate tool in assignment generation? I imagine a user specifying some
partially filled in template (possibly a type like (a->b) or (((Int->Int)->[Int]->Int)+) where we are saying that we want a problem where + is folded over a list. A problem is generated and the very general test case of folds is specialized to use some addition operation. The story is one of probably many possible for that type and the things in the list are things with a property that can be summed. This is discovered by a database of knowledge that backs the whole system. Hopefully there is some way to allow composition to occur between function types, and we can get application of assignments to assignments, composition of assignments, and still instantiate because some invariants are satisfied and ensure or at least guide the composition to semantically valid templates from which assignments and test cases can be created.

Will this ever exist? Does it make any sense? I don't know.

Trees are Complex

Tree structures have nice properties as representations of solutions to many problems. This is rather obvious, but it is also very important- the whole subject of Genetic Programming relies on it. They are used because they can describe things that are built from subexpressions and things that naturally have this structure. But Gene Expression Programming exists essentially as a biologically justified reaction to the complexity of Genetic Programming. In this sense my research is justified in a way I would never have anticipated- it is more like GEP then GEP is. BPGEP is better justified in some ways (though PGEP removed some of the operators that where biologically justified in GEP, but I'm not really concerned about them) and it (hopefully- see my last post) will remove all need to be aware of the tree structure of the solutions.

If we could have a system in which all knowledge of the expression tree is in the evaluator, which included some easy and obvious repair or adjustment operation to ensure all individuals encode some tree or other that works for all problems, then we would have no more reliance on trees and their complex structure. Not that trees are all that complex- the point is that genetic operators work very well for linear symbol lists, especially bit vectors, and GP has historically had much research into finding good operators that respect tree structures. In BPGEP we have a system with a trivial encoding, binary, simple genetic operators (point mutation, single and double point crossovers, and rotation (also selection, but that is not dependent on encoding)), no tree related complexities, and a solid biological justification without going crazy.

I know I posted this idea before, but I want to emphasis that this is a direction consistent with the spirit of the original GEP paper. I feel pretty good about this.

Tuesday, September 21, 2010

The Future of BPGEP

My current plan for the direction of my thesis is to keep modifying BPGEP until it is as simple and easy to use as possible. The idea is that the original GEP is just the inspiration from which a nice and biologically justified method can be made. There should be no knowledge in the genetic operators about the structure of the tree an individual expresses, nor even about the symbols it encodes. It should be easy to specify a terminal and operator set without getting into details about the binary representation. New genetic operators should be easy to come up with, as all they have to do is preserve length. New chromosome structures (which are common for holding extra information in individuals) should be easy to add- just add to the end like in other EAs. All individuals should encode some expression tree- and if a repairing operation must occur it should be simple and quick. The biological justification should be stressed, as well as the binary representation- PGEP is an amazing method and we have to justify the added complexity of the binary representation if we are going to use it at all (which we are).

I would maybe like some interesting problem to solve that this method is suited for, but if I don't end up with one I expect to perform some symbolic regression task and maybe classification. There are some really cool problems used in the first GEP paper which may be a good testing ground.

It is my hope that in the end the number of parameters that one has to specify is smaller than GEP and that the implementation does not suffer too much complexity.

Looking at the original GEP paper it looks like you need to specify: popsize, Pm, Pc1, Pc2, Pgene recombination, Pris, ris length, Pis, is length, terms, ops, generations, head length, number of genes, linking function, and Pgene transpose. In BPGEP I expect to specify: popsize, individual length, generations, Pm, Pc1, Pc2, Pr, terms, and ops. These are mostly unavoidable and as long as there is a reasonable strategy for dealing with some of the implementation details those may be the only parameters to BPGEP.

Shape of Homework Assignments

My current thinking about the research for the whole automatic generation of homework assignments is about descriptions from which a problem can be constructed. The best I have so far is that type signatures should be linked to elements of stories, where the types can be specific or general (polymorphic). An abstract type (like a type constructor) can be made more specific, or kept abstract, and should be linked to one or more stories. These stories might be composable if the functions that describe them are composable

For example- give the subject "lists" we have functions related to lists- (a->b) -> [a] -> [b] for example, which is the familiar map. This type can have a, b, neither, or both specified as a concrete type, and may have either or both of its arguments fixed. Any of these combinations of things should give rise to some story about lists where we want a list of something based on the list we have or are given.

An example of composition is a function a->b and and a function b->c. We hope that if each function has some set of stories behind it, then there is some set of stories that are the composition a->c. For example- we have a price and want to know how much is will cost to buy n of something and we have a coupon that takes %10 off a price. The composition of those stories is- we want to know the price of buying n of something with a 10% off coupon and the type signatures are (Double->Double)o(Double->Double) making a Double->Double.
Interestingly the layer of specificity seems to be related to the difficulty of the problem. This even works in other domains- in physics it is considered more difficult to give a homework problem where we need to solve an equation symbolically then numerically, where needing to solve symbolically corresponds to not specifying a value for some input variable.

I wonder about the ability to describe problems with arbitrary depth- lists can hold lists can hold lists, etc. At each level we can "add" the lists, or we can add up all elements of each list (assuming the lists hold some monoid or something). It makes me wonder if shape analysis, something I've just recently become interested in, may help. We could say- lists of length n to m as part of the type if we can be explicit about shapes.

It seems like to work around all sorts of nonsense problems that may appear, we would have to guide our search for specifying an abstract type by some real world knowledge. This is where the whole concept map, knowledge domain thing comes in to this project. I'm not sure exactly how this would happen.

I will have to flesh out exactly how these types are arrived at, and how the stories they specify can be distinguished (as I suspect the underlying stories will direct the search for specificity as well as our ability to compose).

Thursday, September 16, 2010

Messy GEP Again

I'm back at the possibility of a Messy Binary Prefix Gene Expression Programming. The idea is to have small individuals that encode small, partial solutions to a problem. Then we save some of the subtrees of these solutions and add them as terminals (or more generally as operators with some number of "holes" in them if the subtrees are allows to be partial (partial=not all leaves are terminals)). These never terminals are available to be added by mutation and will enter into the population. Every so often (every n generations for example) this extraction of substructures would take place, potentially replacing some previously extracted substructures (or maybe even regular terminals!). The individuals would get more and more complex as they combined and recombined substructures into increasingly complex expression trees.

While it would be nice to include an "expansion" operator like that in the paper on self-emergence of substructures in PGEP, I'm not sure that I would add it. In real genetics, very complex creatures have a lot of parts of their DNA that do not change very much at all, and small changes often prove fatal when they occur. What I want is the emergence of complexity out of simple trees.

We would almost certainly have to bound the length of these extracted trees, or somehow ensure that they don't just grow to arbitrary length.

I'm not sure I'll ever actually do this research, but it seems like a neat little idea- modeling the increase in complexity of individuals through time to find very large and complex solutions to problems.

Modifications to BPGEP

I talked to a biology professor today about the biological justification of BPGEP. It turns out that some of the things I thought of to simplify BPGEP are biologically justified. The most interesting is the idea that instead of ensuring that no operation is ever going to create an invalid individual, as PGEP does, or separating each gene into a head and tail, as GEP does, we could just allow any change at any time. Then, when we are doing the evaluation, we repair the gene so that it has a corresponding expression tree. The way I thought of is to have a neutral element that we tac on to fill out the extra needed terminals. Dr. Hibler, my adviser, suggested that instead we just have that any operation that does not have a full complement of terminals just return, unchanged the one(s) we have. This would work for 2-ary operations, but I'm not sure if that would help with n-ary ones (unless we just returned the first argument or something). Apparently the RNA strand that a series of base pairs creates adds a head and a tail called a poly(A) tail, and I hope to add such a tail. In real genetics that tail does not have an effect on the resulting protein (I think) but here it has as little an effect as possible.

The other modification is that apparently sometimes parts of an RNA strand are removed before it creates a protein. This has some effect, but the result is often close to what it would be without the modification. This is apparently not very well understood- but it occurred to me that after expressing parts of the bit vector into symbols, we could remove symbols that encode "bad" things, like the rest of the 2^n symbols that fill out the terminal and operator list in BGEP. This may end up being a bad idea, as we may have individuals that need lots of repairing and symbols that get removed (which allow neutral mutations, so maybe they aren't all bad).

There are a huge number of possible modifications to the GEP system, I'm trying to not get lost in them. Hopefully some of the biologically inspired ones both improve and simplify the system and can be made into some sort of reasonable result. I would rather this not end up being yet another highly complex evolutionary algorithm that no one is interested in because of the high cost of entry (the cost of learning and implementing it).

Wednesday, September 15, 2010

Prefix Evaluation of a Chromosome in Haskell

My HGEP project is focusing on PGEP right now, though there is no reason not to add karva notation later on. One reason I'm doing prefix first is that there is no need to be aware of the structure of a chromosome, except a very simple validity check to ensure we never make any illegal individuals (illegal in the sense that they do not encode any expression tree). I've finished the prefix evaluation code, and I liked it enough to post about it.

The code involves evaluating a list of operations. The operations are defined like so

data OP a = OP { arity::Int, applyOp::[a]->a }

Each on is a function of some arity and takes a list of arguments- hopefully with length >= arity.One nice thing about prefix notation is that (as mentioned in the original PGEP paper) there is no need to actually create an expression tree- the operations can be evaluated as a list. The only complication is that the operations must "eat" their arguments off the list. Constants and variables are removed from the list and their values returned, but an operation like "+" must evaluate its first argument, then its second argument before returning their sum. The second evaluation must evaluate the list left over from the first evaluation. Hopefully some of that makes sense.

And without further attempts at explanation- here is some code:

prefix :: [OP a] -> a
prefix (op:ops) = evalState (prefixEval op) ops

prefixEval :: OP a -> State [OP a] a
prefixEval (OP 0 fn) = return $ fn []
prefixEval op = do
args <- evalArgs (arity op)
return $ applyOp op args

evalArgs :: Int -> State [OP a] [a]
evalArgs 0 = return []
evalArgs n = do
(op:ops) <-get
put ops
arg <- prefixEval op
args <- evalArgs (n-1)
return $ arg : args

Notice that it is a mutual recursion that allows us to take the list apart and eval each argument. This reminds me of the metacircular interpreter for lisp (eval apply). Also notice that the problem of having to evaluate only the part of the list not yet inspected is done by making explicit that the list is the state of the computation. I like how we have to be so explicit about the type of computation we are performing.

In another post I hope to talk about symbolic regression and different ways to implement it is Haskell.

Generalized Crossover in Haskell

I've started to see some nice stuff come out of my hgep project. The post "Operads and Their Monads" from (which is an amazing blog btw) uses some code that relates to operation in GEP and crossover in evolutionary algorithms. This is not really related to the post, but the way he does things is much better than what I was doing.

Crossover in general involves finding cut points and doing an exchange between two lists to produce two mixed up lists. I've never seen anyone do crossover with more than 2 individual, or immediately throw away one of the results, so crossover can be described as two lists of loci, and a list of cut points. The cut points can also be a list of the length of each section. That is the approach taken here.

And now some code:

crossover :: [Int] -> [a] -> [a] -> ([a], [a])
crossover ns as bs = (exchange f s, exchange s f) where
f = unconcat ns bs
s = unconcat ns bs

unconcat :: [Int] -> [a] -> [[a]]
unconcat [] [] = []
unconcat (n:ns) as = take n as : unconcat ns (drop n as)

exchange :: [[a]] -> [[a]] -> [a]
exchange [] [] = []
exchange (a:as) (b:bs) = a ++ exchange bs as

I just liked this code so much, I had to post it. There is some setup code, to apply it to random pairs, to make sure it only happens with some probability, etc, but this is the core idea of crossover in 7 lines of actual code.

Tuesday, September 14, 2010

Started a Real Haskell Project!

I have been working on a Haskell implementation of GEP (really PGEP because I don't like GEP's complexity) and I finally made it an official project, with source control through darcs, and available through cabal! The project is just called hgep, for Haskell Gene Expression Programming.

While there is at least one other Haskell GEP project, it is too deeply meshed with GEP's way of doing things, and I need the ability to change any part of the system at any time to introduce new ideas (for research). Also, I've wanted a large Haskell project to get me familiar with the language for a while now.

At the moment the project can perform a simple GA, where the fitness function is to interpret the individuals as a binary number and apply f(x) = x^2, and a simple symbolic regression PGEP problem. The PGEP section of the code is terribly slow, and needs a lot of work to add BPGEP and more machinery to allow me (and I guess anyone else, if someone were so inclined) to add new experiments and algorithms as easily as possible without locking too much into place. In fact, I don't plan on locking much in place at all- the project will be mostly components that can be fit together to construct an evolutionary algorithm, but will have to be put together from scratch almost every time (hopefully with some amount of help from the framework).

Monday, September 13, 2010

Automatic Generation of Programming Problems

I have just started a graduate assistantship with a professor doing research in the automatic generation of programming problems. The idea here is to define a template for a problem and generate (automatically) enough unique problems for a classroom, along with a testing module for each instance of the problem. It is possible that this will reduce cheating, and it will probably make life easier for professors.

There are several things that I can look into for this project, including extending the existing framework written in Java. The other possibility, which I'm looking into this week, is to continue a nice, mathematical, rigorous approach to studying programming problems. The student that started down this route used Haskell, and seems to have a firm grasp on category theory and functional programming theory (better then mine, but then I am taking up this project mostly to learn).

While I certainly wouldn't give up my evolutionary algorithm work to do research in this area, at least it ties in to some of my other main interests in computer science, namely mathematical foundations and function languages.

Hopefully I will know more about this subject soon, and I will post about it.

Monday, September 6, 2010

Self Emergence of Substructures

There is a paper by the creators of PGEP called "Direct Evolution of Hierarchical Solutions with Self-Emergent Substructures." This paper describes a system in which expression trees can be extracted from an individual (in the elite group) and collapsed into a single terminal, which is added to a library of substructures. Similar techniques appear to exist in other fields, such as static and dynamic super instructions in forth, which are known to improve the efficiency of native code compiler.

This idea has some interesting possibilities, methinks. For one, it allows substructures to combine in new and interesting ways. Also, it may allow greater depth to appear in expression trees, which is a problem in regular GEP in certain cases (a problem where many terminals should appear in the optimal solution would require very long individuals to ensure proper expression tree sizes to include necessary terminals). I would like to investigate this area of GEP research as part of my masters, presumably using BPGEP as the underlying method.

One possibility that I may try is a simpler way to add this technique to GEP then that proposed in the paper. It is not overly complex, but I think maybe a simpler way would be easy to digest for people using GEP, and may be able to keep the primary advantages. My adviser warned me against overly complex methods, where a great deal of computation time is taken just keeping track of things and updating all the moving parts. One of my favorite things about PGEP is its relative simplicity (comparing to GEP), and I don't want to add too much more to it other than the binary encoding.

Another possibility is to allow a great deal of subtree saving and collapsing, and to have relatively small individuals. The idea would be to recreate the conditions of Messy Genetic Algorithms, where small partial solutions are built up to create working solutions. The population would have small individuals, which would become more complex as the number of saved subexpressions grew. I would have to account for uncontrolled growth (as appears in some forms of GP) as the depth would increase with each "level" of saved subexpression. Such a method could use the same biological justification the MGAs do.

Saturday, August 28, 2010

Hierarchical Gene Expression Programming (HGEP)

I've been meaning to write this idea down for a while, because it is potentially interesting, and if nothing else it is yet another variation on GEP- of which there are already so many.
While it is nice that GEP is so flexible to allow many possible changes, many of them are either very complex, IGEP, or lost generality, NGEP. On the other hand, some remove complexity, like PGEP-which I think all GEP researchers should be using instead of GEP- or increase complexity in a very reasonable way, like BGEP. I'm hoping that HGEP is a reasonable increase in complexity, and I think of it as having potential benefits in domain specific ways, unlike PGEP which seems pretty generally to be an advantage over GEP.

So- HGEP. Or more like HPGEP, as I prefer prefix notation over karva. It is a sort of synthesis between the multi-genetic chromosomes of GEP(with a single linking function) and the single gene chromosomes of PGEP. It exploits the homomorphism between the fact that terminals are linked by operators, and that genes can be linked by operators. So why have only one linking function? Why not have each gene be a "terminal" in a second order GEP, where the operators are the same, but each terminal is an entire gene? While each HGEP individual would have an equivalent GEP individual, the GEP one would be very long and easy to disrupt. HGEP would impose the idea of hierarchy on the individual, and allow genes to combine in interesting ways, rather than guessing which linking function might be useful and imposing it over all genes.
This may introduce to much complexity to the individuals, especially because they would have variable numbers of genes. To fix the number of genes we could just have a GA style list of operators included in the front of the chromosome, and use it to link the genes. This would allow variation in linking functions without imposing so much complexity.

I will probably flesh this idea out a little more in another post. It is not so well developed at the moment.

Research Paper Nearly Turninable

I'm almost proud of how this research paper is turning out. I've had many problems with the code and I always seem to have difficulty presenting my findings, but the ideas seem good, there is some solid justification of my work, and even a sense of real extension on the work of others (BGEP in particular).
I would really really really like this to be in a language other then Java, as I've posted about before, but my lack of experience in Haskell is crippling my efforts to rewrite it. I'm starting small, trying a simple GA in Haskell first, and hopefully I will one day be able to extend it enough to include some of the variations on GEP like BGEP, PGEP, BPGEP, UGEP, IGEP, AGEP, NGEP, GEP-NN, GEP with DE, etc. Mostly though I'm interested in PGEP, as I really don't like the idea of head-tail separation and genetic operators that know if a locus is a terminal or an operator.
Well, enough rambling-back to work!

Sunday, August 8, 2010

I had an Idea!

BPGEP is a very cool method- all the (relative) simplicity of PGEP, with the low arity representation of BGEP. It is by far the most fun I've had with randomly generated bit vectors, thats for sure.
So why not apply this simply representation, along with its nice, substructure preserving prefix notation, to other evolutionary techniques? Bit vectors are hardly uncommon, and the translation and interpretation stuff from bits to an expression tree is not very hard. Maybe I could do a binary particle swarm, or maybe even mix it up and do a generation of one technique, then another, and maybe some hill climbing in between.
The fact that we can go from bits to trees in such a nice way is very interesting to me, and I bet there is some cool stuff to be done with it. Operators that preserve substructure, combinations of evolutionary methods, the simplest possible example of random mutation hill climbing. Good stuff.

Sunday, August 1, 2010

Things aren't so Simple!

Unfortunately I'm having problems getting results with BPGEP. It seems to find functions alright, but its terribly slow. I had to scale the parameters down enormously to get it to a reasonable time (It would have taken days if not weeks to do a single run with the parameters in the paper "Investigation of Constant Creation Techniques in the Context of Gene Expression Programming"). I'm okay with using different parameters because the point of the paper is to investigate BPGEP, not solve problems.
I would like to use the more complex function in the paper, the "V" shaped one, because it may be solved more easily when I add lots of constants to the terminal set, which I plan on doing.
Incidentally- why are all the fitness functions in symbolic regression papers so strange? The one in the original GEP research doesn't seem to force the fitness to be positive (unless I'm missing something) and the one in the paper mentioned above doesn't seem to make sense when the individual fitness is equal to the best found so far. The highest possible fitness would be .5, and if the residual is 0, the fitness would be 0. Its like I'm not interpreting it correctly.

Wednesday, July 28, 2010

Symbolic Regression Success!

I am so proud of myself. I successfully applied PGEP to two different symbolic regression problems (the one in the original GEP paper and the one in the original PGEP paper). The solutions are so accurate I'm certain they are equal to the correct ones, as I'm getting minuscule numbers for the residual error.
This is especially awesome because I've had problems with duplicating other peoples results, but this time it seems like I'm able to find the correct functions. Especially when adding constants to the terminal set, which seems to have improve the quality of the solutions by several thousand times with only 1, 2, 3, 4, and 7.
Now if only I can replicate those results (or something similar) with BPGEP I will be able to actually start collecting data and all that.

Friday, July 23, 2010

Independent Research Again

I am much further along in my attempt to write a Java program with some among of pure functional programming mixed in. I will be mutating the occasional state, but only because I don't currently have the time or interest in building enough infrastructure to avoid it. Right now I am using almost no new types. Populations are lists, individuals are lists, loci are symbols (which is a class I defined). Almost every part of the system is a higher order function, and no complex type hierarchies are getting in my way. I am really enjoying it, even if it is just Java, and I still have to fight the language and accept messy constructs for some things. Soon enough I will have a binary prefix gene expression program doing some simple symbolic regression using the fitness function I found in a paper on constant creation techniques in gene expression programming.

Saturday, July 17, 2010

Independant Research Update

I've got some work done on the evolutionary algorithm framework, which I'll informally call fea (functional evolutionary algorithms) (informally because I don't feel the need to give names to my projects).
The use of functional programming concepts such as composition and the basic data structures (lists and functions) the framework is much nicer, cleaner, and more expressive than it was before. Making an experiment consists of composing the right units into the right sequence, not filling out a lot of subclasses. I am spending a lot of time fighting java's generics and its syntax, as well as wrapping code in what are essentially lambdas (this would be so much nicer if java would just add closures).
I am just now starting to express gep, and I am glad to find that even in this infant form, my program can express them pretty much like a ga (which is the sample case I used to build the framework for). The only change is exactly where it should be, which is fitness evaluation.
Ideally I would have fitness evaluation consist of providing a translator from list to list and then an evaluator someobject->double. This would isolate the parts of fitness evaluation. There is a great deal of generalizing and abstracting work of this sort to do.

On another note, while programming this, I discovered and discarded self-bounded types. They seem interesting and potentially useful, but they can't seem to introduce generics in subclasses without breaking (I could write the classes but not instantiate them). Maybe I will post some code explaining them, cause they seem to have some cool properties and may be useful someday.

Friday, July 9, 2010

Starting Out

I would like to post about other things, but right now I'm starting to rewrite my evolutionary algorithm framework from scratch using the function java library. I would like to build it sort of "bottom up" style so that I can avoid some of the problem with my current framework.
The main thing is that the in the current one you fill in holes in a large and static structure (mostly) and are therefore constrained by what types of things fit in the holes. I would like to make it mostly about function composition and reuse of small parts, with a trivial little driver loop to get it going.

The representation of the whole algorithm and its state will be (Population, Environment, NextGenFunction, TerminationFunction). Populations are structures that hold structures of items, the env binds names to objects and can evaluate individuals (I hate to mix those ideas, but thats what it is for now), the nextgen function takes a population and returns a new one, and the termination function is from population to boolean and determines if the algorithm is done.
population is S I a, env is O->O next gen is Pop -> Pop, and term is Pop->Bool.

Sunday, June 27, 2010

The Essence of Evolutionary Algorithms

I've been thinking about the essence of evolutionary algorithms, trying to get the right structure for my framework- general enough to use in any project I want to do, but easy to set up and apply to common problems.
The structures involved seem to be- a population of individuals that are strings of symbols that gets exposed to an environment. The individuals learn from their exposure, a set of terminating conditions are checked, and if none are met, start from step 1.
This would looks really nice as haskell code (or really any functional language, I just happen to like haskell right now) , but I'm not familiar enough with it to write a framework that I will be using for a project that starts pretty soon.
Therefore, I will just hope to redo it all in haskell one day, and for now consider building the functional abstractions that I want to use into Java. This would be an interesting learning experience, and may end up improving the architecture of the program a lot.
On the other hand, I will almost certainly end up with only some of the cool stuff built into a functional language, the program will be verbose and incomplete, and may end up getting thrown out. I'm okay with that, cause if it works well enough for now, maybe I can get it right later, and if nothing else it will motivate me to deepen my understanding of certain aspects of functional languages. I will basically be making a little language in Java, and nothing teaches you a concept like having to implement it, and getting it to work correctly.

Tuesday, June 22, 2010

Modeling Substructures

The number of substructures in a tree is always <= the number of schema of its linear representation. In all but the most trivial case, it is much less.
This is interesting because since the original schema theory doesn't make so much sense for GEP, it also appears that GEP is sampling fewer schema. This is exacerbated by the fact that a substructure that appears in two forms or in different places could be considered the same structure (because we do not could the context of the structure as being part of it, only the structure itself).

There are a couple of things to note here though. One is that we are not searching the normal solution space, but rather the space of strings whose trees map to the space, so there are a lot of complications in relating the spaces directly and saying that we are searching one more or less efficiently than a GA might.

Also, the idea behind substructures is to sample them in different places and hope their effects are overall positive. In other words, while there are fewer of them, they can each have many effects in both their part of the tree and (if they are "open") the trees that complete them.

Overall they are really very different from the schemata of GAs. They have only the properties inherent in the idea of classifying parts of a string. They have different dynamics, meaning, and structure. They share other things, like the ideas of defining length, but I'm not sure if they should share notation or not- as I would like to be able to capture the actual vs smallest size of a substructure somehow.
An example of that btw is +-##1 could also be +1-## with no change in the resulting value. Are these the same? They have the same meaning (since + is commutative), but different defining lengths. And really the # could be of any size, and still match the 1 with the plus, like
+-####1 could be +-*23451. Maybe the # denote any subtree? This complicates the defining length, but would mean that I would probably define structures that look different as different, just so I could give them a length of some sort without resorting to a vague "minimal length when completed with minimal subtrees" or something.

All of this reminds me of my idea to introduce a "defining length reduction" operator. I'm still not sure if there is such a thing, but it might help, and hopefully one day we will have a model that would predict the effect of such a thing.

Monday, June 21, 2010

Schema Theory for GEP

I would love one day to come up with a satisfactory model for GEP that captured possibilities, its distinction between geno and pheno types, and the sampling of incomplete (I think of them as being "open") substructures or subtrees within an expression tree. I want it to be aware of the impossibility of saying how good a sequence of symbols is and rather to capture the essence of the structures along with their actual representation. It would have to include the idea that there are many possible linear sequences with the same evaluated meaning, the same structure, different defining lengths (which I'm borrowing from schema theory) and all sort of other things.

I'll post some haskell code related to this soon.

Sunday, June 13, 2010

Preservation of Substructures

I've convinced myself that one of the most important parts of GEP is the ability to encode knowledge about a problem domain in the form of a tree. While knowledge can be potentially encoded in many ways, and trees may be unable to encode some forms of knowledge (potentially limiting the ability of GEP to solve problems when there is no mapping (or maybe no easy or convenient mapping) between the solution space and a space of expressions encoded in trees), they are very nice for many things and have a nice notation (such as prefix or karva) that allow the whole genotype->phenotype thing that is also one of the most important parts of GEP.

I've been thinking of the different types of things one can do with trees and the ways trees can evaluate to solutions to different combinatorial optimization problems, and I've realized that I would like to be able to improve the ability of GEP to preserve structures (that is to say, subtrees). PGEP does this already, but I'm sure there are some other things we can do, such as saving useful substructures and making them terminal (which has already been done (Direct Evolution of Hierarchical Solutions with Self-Emergent Substructures by Xin Li)), or some other mechanism.

At the moment, I can't think of any new or interesting mechanism to try out, but I think there are a couple of forms that such a mechanism might take:
1. A new structure (like PGEP) that tends towards preserving subtrees in the presence of small disruptions to the linear structure of the gene.
2. A new operator that reorganizes the gene or otherwise tends to help substructures survive. I've been trying to think of a reorganizing operator for a while that would transform trees (especially ones made of commutative operators) into ones whose linear representation has the minimal overall length of substructures. I may have to talk about this more in another post.
Another example is Xin Li's (et all) operator that makes successful subtrees into new terminals.
3. A change to current operators to make them more aware of the structure of the genes. I plan to introduce a rotational mutation in BPGEP that does this sort of thing.

I may have more to say about making subtrees into terminals later, as the increased size of the terminal set may be offset by BGEP, making that a particularly effective strategy in BPGEP.