Tuesday, October 26, 2010

Programming Game

I just finished this game called light-bot at:

http://armorgames.com/play/2205/light-bot


I enjoyed it a lot. It tests your ability to think algorithmically by having you program a little robot to perform a simple task. You have a "main" method with up to 12 slots for atomic statements drawn from a provided set; among those statements are calls to one or the other of 2 functions, each of which has 8 slots of its own.

I plan on playing at least one more time to see if I can reduce the number of statements I use (the game keeps a running total over its course).

If you have some free time, I definitely recommend trying this game out.

Monday, October 25, 2010

Most Evolutionary Algorithms are Simple

My newest idea for implementing a library for GPM, GA, GP, GEP, and particle-swarm type systems is that there is just no way to cover all cases reasonably. We can't really say anything in general about what we are evolving, how the evolution occurs (which operators, structures, etc), or anything really. We can't (without losing generality) fix the structure of the algorithm, or what problem it solves, or what the individuals look like, or what the population looks like.

Therefore I am relying on the relative simplicity of Evolutionary Algorithms: rather than providing a reusable library that forces the user to express only certain types of EAs, I'm just providing code for the ones I want to do, and even for those I'm constructing the algorithms and data structures pretty much directly, with very little abstraction. The idea is that the user creates everything from scratch every time with help from the library, rather than simply composing structures from the library, as that doesn't seem feasible.

The best way I can think to help users (myself) create EA systems is to make sure the library stays out of the way. EAs can come in any shape or form, and the important thing is to fill out design spaces that might be useful in as lightweight a way as possible. Don't force the user to redo *everything* just so that they can vary some part of the system that usually doesn't vary, and make it easy to construct a new method using only the relevant parts of my library.

This is kind of a concession to the complexity of the problem- I have to start running tests instead of trying to get the program to be as pretty as possible. If this means poor code, I will do my best and hope to fix it when I come up with something better. I'm not unwilling to start over if there is some nice way of doing this.

Thursday, October 21, 2010

Structure of GPM Systems

I've been trying to understand the structure of Evolutionary Algorithms in general, and noticed that they are very complex and that very little can be said about them as a whole. In a previous post I talked about how much variation exists between different EA systems. In light of this, I don't plan on making a general EA system, but rather a program for a particular type of EA- a GPM system.

GPM (Genotype-Phenotype Mapping) systems are something that I've only just recently discovered. They are the basis for all linear genetic programs, and GEP in particular can be viewed as a specialization of the very general GPM system. This idea introduces exactly the analogy I was planning to make to the biological process of gene expression: from base pairs to RNA, editing of the RNA, and then the creation of proteins. In a way it is nice that this analogy is already established, as it gives me a vocabulary and justification for my thesis.

GPM systems consist of some linear encoding of things (symbols from some alphabet, integers, bits, real numbers, whatever). The evaluation process is where the difference between GA and GPM appears, although a GA could be considered a GPM where the translation is not performed (GPM generalizes GA in that sense). The process involves grouping base pairs into codons (such as a group of bits in BGEP), which are translated into an RNA strand (a list of operators and terminals in GEP), which can then be edited before the expression process. Editing is justified by the editing done to RNA before protein creation, though as I understand it, it is not actually all that closely related to what happens in real gene expression. This editing is performed to make sure that the gene can be expressed as a valid phenotype; in other words, it modifies the individual until it can be translated into a program. This stage may also involve making sure the genetic information satisfies other constraints. Editing operations may include replacing, deleting, and adding symbols.
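As a sketch of the translation stage, the following assumes a binary genome and fixed-size codons; the symbol set and the codon table here are purely illustrative inventions, not the actual BGEP encoding.

```haskell
import Data.List (unfoldr)

-- Illustrative symbols: two binary operators and two terminals.
data Sym = Plus | Times | A | B deriving (Show, Eq)

-- Group the linear genome into fixed-size codons, dropping any
-- incomplete codon at the end.
codons :: Int -> [Bool] -> [[Bool]]
codons n = unfoldr step
  where step bs | length bs < n = Nothing
                | otherwise     = Just (splitAt n bs)

-- A made-up codon table: each 2-bit codon names one symbol.
translate :: [Bool] -> Sym
translate [False, False] = Plus
translate [False, True ] = Times
translate [True , False] = A
translate [True , True ] = B
translate _              = error "wrong codon size"

-- The "transcription" step: genome to a list of operators and terminals.
transcribe :: [Bool] -> [Sym]
transcribe = map translate . codons 2
```

For example, `transcribe [False,False,True,False,True,True]` yields `[Plus, A, B]`; editing and expression would then operate on that symbol list.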

After editing, the program must be created. This may be very involved- wrapping it in a main function and compiling it, for example- or very simple- evaluating the prefix expression, as in PGEP. The generality here is very surprising- in papers by Wolfgang Banzhaf, a C program is evolved, which is quite a feat considering the complexity of the syntax. Unfortunately it requires a very complex editing stage, which can add so much information to an individual that it hardly resembles the original genetic information at all.

Using GPM as a basis, and drawing from GEP, PGEP, and BGEP, I hope to introduce a simple (conceptually and in implementation), effective, unconstrained, general GPM system that I am calling BPGEP (Binary Prefix Gene Expression Programming). With the structure suggested by GPM, the EA system I'm writing in Haskell will be a GPM system: individuals will be just lists, populations will be lists (at least for now), operators will do the usual mutation, crossover, selection, etc., and there will be a way of setting up translation between encodings.

Thursday, October 14, 2010

Data Structures as Free Algebraic Structures

I've known for a while that the Kleene star over some alphabet is the same as the free monoid on that alphabet. I've just discovered that other data structures are described by free structures over other algebraic structures. This is mind-blowing to me, so I'm going to record here what I've learned.

Lists are things written one after the next. The order is important and we can have repetition. The only thing we can remove is the "empty" thing (the empty string), which is exactly like saying that 2 = 0 + 0 + .. + 2 + 0 + 0 + ...: adding and removing identity elements is trivial, and we assume all are removed (the word is in normal form). Since the monoid here is free, we don't know how to evaluate the operation, just that there is one. This is why lists are the free monoid: we are multiplying elements by putting them next to each other, but since we can't reduce anything we end up with just a list of elements.

If the monoid is commutative, its free version gives us the bag structure- repetition but no order. This is like a list where all orderings are considered the same list. Commutativity allows us to move elements around as we please, and since we can't combine any elements we must keep them all.

If the monoid is commutative and idempotent we get sets as the free structure. Again the ability to move elements around and get an equivalent word gives us the lack of structure of sets. Commutativity says that we can group equal elements by moving them around in the word, while idempotency says that multiple occurrences of the same element cancel down to one.

If the structure is not associative- a magma- then we get binary trees. This is because we must specify the order of operations, which is like an s-expression representing a tree in which every list has two elements.
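These correspondences can be written down directly in Haskell: the built-in list is the free monoid (juxtaposition as the operation, `[]` as the identity), while a free magma over `a` is nothing but a binary tree with leaves in `a`.

```haskell
-- A free magma keeps every application of its (non-associative)
-- operation around as structure: exactly a binary tree.
data Magma a = Leaf a | Node (Magma a) (Magma a) deriving (Show, Eq)

-- The operation just records itself; with no laws, nothing reduces.
op :: Magma a -> Magma a -> Magma a
op = Node

-- Lists are the free monoid: (++) is associative, so different
-- groupings of the same word are the same list.
assocHolds :: Bool
assocHolds = (([1] ++ [2]) ++ [3]) == ([1] ++ ([2] ++ [3] :: [Int]))

-- In the magma the two groupings remain distinguishable tree shapes.
assocFails :: Bool
assocFails = op (op (Leaf 1) (Leaf 2)) (Leaf 3)
          /= op (Leaf (1 :: Int)) (op (Leaf 2) (Leaf 3))
```

The same word of three elements is one list but two distinct trees, which is exactly the associativity distinction described above.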

So cool! I really wonder if other free structures give us any other data structures that are well known or interesting. I will have to keep studying abstract algebra and investigate this. This even gives me a way to think about the repair operator I want to introduce to PGEP, where there is an equivalence relation identifying words over the symbol alphabet that encode the same tree. This relation only applies during evaluation, not during evolution (where the non-coding region is important).

Wednesday, October 13, 2010

Abstract Structure of Evolutionary Algorithms

Very often abstraction comes at a cost. I've often heard that we should avoid abstraction for abstraction's sake- if we put in the effort to abstract, it should serve a purpose. The more abstract something is, the more work is needed to get concrete results out of it. I've found that Category Theory is really fascinating, but dauntingly abstract. This comes up in programming all the time. If you create some library, what is the domain of problems it covers? Is it the most general possible solution, needing to be specialized by the user before any real work is done, or does it provide just one solution to a problem, so that the user may have to write their own library if they need to express something not covered by your program? This is an interesting trade-off, especially when the subject of the program is something complex and interesting. Evolutionary algorithms vary enormously, and it is not obvious that they can be characterized. In fact, I've noticed that there is almost nothing I can pin down and say "this is always the case for all EAs I can think of." Part of the problem is just that the space of possible EAs is so large. I've been thinking a lot about this, as I would like my Haskell library to be as general as possible, but still usable. This comes up especially for research, as I want to be able to try different variations of the methods I'm studying without rewriting the whole framework. If there were some way to capture the structure of EAs in general, or even just of linear GA-like programs, it might lead to a useful library and a deeper understanding of EAs. At least it would deepen *my* understanding of EAs.

One of the nice things about Haskell is that it forces you to be explicit about things that are otherwise implicit. This means that when we look at the type signatures of the things in a library for some theory or method, they tell us something about the nature of that theory- or at least, they can. For example, in EAs everything requires some generation of random numbers. This is not just about GEP or GA or GP- AI is stochastic. I am not content with just saying that if we don't need randomness then we don't use it (adding randomness generalizes not having randomness, in that we can get back to not having it by not using it), because that justification allows anything at all to be added as long as we don't have to use it. But randomness stays- it really does describe something fundamental in EAs and AI in general.
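To illustrate how this explicitness looks in types, one common approach is to thread a `StdGen` through `State`, so any stochastic operator wears its randomness in its signature. The mutation operator below is a made-up example, not from any particular method.

```haskell
import System.Random (StdGen, mkStdGen, randomR)
import Control.Monad.State (State, state, evalState)

-- Anything of type `Rand a` is allowed to consume random numbers;
-- pure functions visibly are not.
type Rand a = State StdGen a

-- Flip one randomly chosen locus of a bit string (illustrative).
mutate :: [Bool] -> Rand [Bool]
mutate bits = do
  i <- state (randomR (0, length bits - 1))
  return [if j == i then not b else b | (j, b) <- zip [0 ..] bits]
```

Running it needs an explicit seed, e.g. `evalState (mutate genome) (mkStdGen 42)`- the stochastic nature of the algorithm is visible at every use site.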

The next thing is the population- every evolutionary computation that I am aware of has some population. This is not as easy to model as we might hope, as the population can have many different structures, though really it is often unstructured (a bag structure). I don't think it is enough to say that the population is a functor. All that tells us is that we can act on the individuals. While this is important, there appear to be at least two types of operators in the methods I'm familiar with- operators that act on one individual (point mutation, rotation, IS, RIS) and ones that act on groups (often producing groups of the same size), like the various crossovers. I think selection can be put in this second type- tournament selection makes potentially small groups, while roulette always uses the largest possible group. The ability to describe operations this way may turn out to be useful. The problem appears to be in the grouping mechanism- do we need random groups (normal crossover)? Groups influenced by the structure of the population (crossover in a CGA)? With replacement (tournament selection)? Without replacement (potentially crossover, though it could be done with replacement)? And even then we have problems- in CGAs a crossover can occur with any cell around an individual, so do we get all the cells and only produce one individual? What about pairwise crossover, where both individuals (the children) must be added back to the population? Or when the parents can survive crossover, adding their children without dying off themselves? If we could characterize all this, maybe the general structure of populations could be defined, but basically the structure of populations is, as far as I can tell, just "some structure." There is not much else we can say, really. Also, we shouldn't force a structure to have only one grouping mechanism, in case we want to do multiple types of grouping, such as for different operators.
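One lightweight way to write down the point/group distinction is as two operator types, with the grouping mechanism split out as its own function. All names here are illustrative, and the crossover uses a fixed cut point instead of a random one to keep the sketch small.

```haskell
type Ind        = [Bool]
type Population = [Ind]

-- Operators that act on one individual versus on a group.
type PointOp = Ind -> Ind
type GroupOp = [Ind] -> [Ind]

-- A grouping mechanism: here, adjacent pairs, as for pairwise crossover.
pairs :: Population -> [[Ind]]
pairs (a : b : rest) = [a, b] : pairs rest
pairs rest           = [rest | not (null rest)]

-- One-point crossover at a fixed cut k (a stand-in for the usual
-- randomized cut); both children go back into the population.
cross :: Int -> GroupOp
cross k [a, b] = [take k a ++ drop k b, take k b ++ drop k a]
cross _ g      = g

-- Group the population, apply the operator, flatten the results.
applyGrouped :: GroupOp -> Population -> Population
applyGrouped f = concatMap f . pairs
```

Swapping `pairs` for a neighborhood function would give CGA-style grouping, which is exactly why the grouping mechanism seems to deserve its own slot in the design.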

Moving on from structure and operations to the individuals that fill up the structure. Again, all we can say is that they have "some structure." In a way individuals may even be worse- at least I was assuming in the above that the populations were homogeneous (with the assumption that multiple groups, as in co-evolution, constitute multiple populations). Individuals can be very complex- fixed-length linear strings over a fixed alphabet at best, but possibly a tree or a list containing multiple types of things (bits as well as real numbers as well as symbols, etc). Perhaps this means that individuals are some algebraic datatype, to be deconstructed as needed. This is nice, but very general. Some encodings are very complex, and may include a variety of types of loci next to each other. It would be nice to say "a list of loci" rather than "a real number followed by 8 bits followed by a real number followed by 8 bits..." and so on. So do we have some "evolutionary component"? What properties does it have? Often this idea is used in the more complex GAs so we can say something like "mutate all loci" and have each locus know what that means for itself. Then we can do crossover over a list of things without worrying about what they are, as long as they are in a list. Crossover of course has different meanings depending on what structure we are talking about. And there is no use forcing individuals to be structures that have a mutation and a crossover operation, because we can have multiple crossovers or mutations for the same structure in the same EA.
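The "evolutionary component" idea can be sketched as an algebraic datatype of loci, each constructor knowing what mutation means for itself. The concrete perturbations below are placeholders; a real EA would perturb randomly.

```haskell
-- A heterogeneous locus: a bit, a real, or a symbol, side by side
-- in one individual.
data Locus = BitL Bool | RealL Double | SymL Char deriving (Show, Eq)

-- Placeholder per-locus mutations: flip the bit, negate the real,
-- overwrite the symbol.
mutateLocus :: Locus -> Locus
mutateLocus (BitL b)  = BitL (not b)
mutateLocus (RealL x) = RealL (negate x)
mutateLocus (SymL _)  = SymL '*'

type Individual = [Locus]

-- "Mutate all loci" without caring which kind each locus is.
mutateAll :: Individual -> Individual
mutateAll = map mutateLocus
```

Crossover then only needs the list, never the locus contents, which is the separation argued for above.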

Back to the Haskell part: in addition to randomness, all EAs keep track of some changing state. This means we have the random generator as state, as well as some problem-specific (method-specific) environment. This can include rates (mutation rate, crossover rate, etc), the current population, and anything the operators need that can't be fixed before the start of the EA. This by itself is a problem, as any time we need to keep track of additional information we need to change the structure that we carry around as the state of the EA. Even if we just say it is some environment, this means we have a 2-layer state monad already. Should we add logging, so we can run experiments and write things out? This would give us a pretty complex structure, but maybe that is the best we can do; I really don't know.
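A sketch of what that stack might look like: one environment record carried in `StateT`, with `Writer` underneath for logging. The field names and types are assumptions for illustration, not a settled design.

```haskell
import Control.Monad.State  (StateT, gets, modify, evalStateT)
import Control.Monad.Writer (Writer, tell, runWriter)
import System.Random        (StdGen, mkStdGen, randomR)

-- Everything mutable lives in one (illustrative) environment record.
data Env = Env
  { gen          :: StdGen
  , mutationRate :: Double
  , population   :: [[Bool]]
  }

-- The 2-layer stack: state over the environment, logging underneath.
type EA a = StateT Env (Writer [String]) a

-- Draw a random index, threading the generator through the state.
randomIndex :: Int -> EA Int
randomIndex n = do
  g <- gets gen
  let (i, g') = randomR (0, n - 1) g
  modify (\e -> e { gen = g' })
  return i

-- Logging goes straight to the Writer layer.
logLine :: String -> EA ()
logLine s = tell [s]
```

Running `runWriter (evalStateT action initialEnv)` yields the result plus the accumulated log; adding a field to `Env` is the price paid whenever the method needs more state.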

And forget about just the data structures, what about evolutionary process? It goes through generations (though I saw a CGA once that performed not in the usual all-at-once style, but incrementally). These generations see the application of some operations that make changes to the population, resulting in the next population. I don't have anything more to say about that at the moment.

Sorry for the rant here, but I needed to get down the ideas I've been playing with trying to come up with something we can generalize and create a nice, usable EA library out of. I may just have to pick some concrete behaviors out and say, "this is the type of EA that this library supports" and hope that it will be usable as well as modifiable, abstract as well as concrete.

Preventative vs Corrective: Ways of Dealing With Invalid Structures

In evolutionary algorithms, it is common to run into situations where an invalid solution is created through some manipulation of a valid solution, or by random generation. Creating invalid individuals is a problem because their fitness can't be directly evaluated- they must either be given the lowest possible fitness, or repaired somehow. Often the EA is constrained so that the encoding can only express valid solutions. This is what I am going to call a preventative method. There are many techniques that use preventative methods- GAs often do this naturally, though in some cases, such as optimal list-ordering problems like the traveling salesman problem, there is a need to be clever with the encoding (tricky encodings). In GPs, the genetic operations are carefully chosen so that they can only produce valid individuals.

On the other hand, I know some repair operators have been used (I'm not as familiar with GP literature as I would like to be). These are what I am going to call corrective- they take an encoding and make it encode a valid solution if it does not already.

In GEP the original method used a preventative method (the head-tail method), and it was claimed that this saves a great deal of computational effort that is wasted in finding and fixing invalid individuals. It isn't mentioned in the original GEP paper, but the other problem with corrective methods is that they often have to make significant changes to the individual, such that sometimes it is hardly even the same individual and might as well have been created randomly (almost).

The additional complexity added to GEP by the preventative head-tail method of ensuring valid individuals was soon replaced (in most papers) by a corrective method where if an operation (including random initialization) would result in an invalid individual, the operation is undone. I call it corrective because the operation must actually be performed and then undone (corrected). This opens up the individuals to encode a wider variety of trees. This is especially nice in PGEP, where it is easy to check if an individual is valid without making a whole tree. The problem with this method is that it requires additional processing to do an operation and check for correctness, and even worse- it requires knowledge of the eventual expression tree to be made available to the genetic operators. This is not too hard in PGEP, but is harder in BGEP or BPGEP.
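The PGEP validity check can indeed be done with one left-to-right arity count, no tree construction: start needing one subtree; each symbol fills a slot, and an operator of arity n opens n more. The symbol representation below is an illustrative assumption.

```haskell
-- Symbols tagged with their arity; terminals have arity 0.
data Sym = Op Char Int | Term Char deriving (Show, Eq)

arity :: Sym -> Int
arity (Op _ n) = n
arity (Term _) = 0

-- True when some prefix of the gene encodes a complete expression
-- tree (the remainder, if any, is the non-coding region).
validPrefix :: [Sym] -> Bool
validPrefix = go 1
  where
    go 0 _         = True   -- the tree closed; the rest is non-coding
    go _ []        = False  -- ran out of symbols mid-tree
    go need (s:ss) = go (need - 1 + arity s) ss
```

So `validPrefix [Op '+' 2, Term 'a', Term 'b']` is True, while dropping the last terminal makes it False- which is why the check is cheap enough to run after every operation.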

Other methods have been proposed, such as Unconstrained Gene Expression Programming (UGEP). This allows any change at any place, but requires a lot of strange and complex processing, including multiple fitness evaluations. The translation process ends up with 0 or more expression trees, all of which need to be evaluated. UGEP also used karva notation, which is particularly unfortunate considering how the expression trees are built (there is a loss of locality of subexpressions). I think it is actually a very clever idea, though maybe the expressions should have just been linked by some linking operator, like the multigenic chromosomes in GEP. This may create many very wide trees though, and maybe it is better the way they do it in the UGEP paper.

Naive Gene Expression is another clever encoding, but I'm not convinced that it is worth the restrictions it imposes on the problem. It also uses karva notation, which I don't like nearly as much as prefix notation. It creates a complete binary tree (as operators in NGEP must have < 3 arguments) and fills out the tree karva style. This means that it always fills out the tree, but some parts of the tree are not used. It is an interesting idea, and essentially uses the same head-tail method of GEP to impose validity, but ultimately there is little gain for such a loss in generality.

I'm going to be proposing another corrective method that I think is very nice. It allows all operations to be performed, requires no knowledge of expression trees in the operators, is simple and fast, and preserves substructures nicely. The last part is particularly interesting, as I mentioned before that some attempts at repair are very disruptive. This method only repairs when expressing an individual, and never has to actually edit the individual itself- allowing any genetic variation at all to exist. It does a simple single pass through the linear encoding, removing some operators and "floating" subexpressions up through the tree until the tree is complete. It has a nice biological justification in RNA editing, and it allows some amount of neutral mutation by extending non-coding regions to the inside of the gene or at the end (one or the other).
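To give a feel for repair-at-expression-time, here is one sketch in that spirit- an illustration only, not necessarily the exact mechanism: a single right-to-left pass counts the complete subtrees available so far and deletes any operator that cannot collect its arguments, so the subtrees beneath it float upward.

```haskell
-- Symbols tagged with their arity; terminals have arity 0.
data Sym = Op Char Int | Term Char deriving (Show, Eq)

-- One right-to-left pass (foldr): a terminal adds one available
-- subtree; an operator either consumes its arguments (yielding one
-- bigger subtree) or, if too few are available, is dropped entirely.
-- The individual itself is never edited; this runs only at expression.
repair :: [Sym] -> [Sym]
repair = snd . foldr step (0 :: Int, [])
  where
    step s@(Term _) (avail, out) = (avail + 1, s : out)
    step s@(Op _ n) (avail, out)
      | avail >= n = (avail - n + 1, s : out)
      | otherwise  = (avail, out)  -- unsatisfiable operator: remove it
```

For example, `repair [Op '+' 2, Op '*' 2, Term 'a', Term 'b']` drops the `+` (it would need a second argument that doesn't exist) and leaves the complete `*` subtree as the root- the substructure is preserved intact.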

I've never seen a comparison of the different validity enforcing methods, though probably there exists one for GPs. I hope that this will be one of the contributions that my masters thesis makes to GEP research. In another post I will describe the actual mechanism that I am proposing.

There are other things I would like to add to GEP. My thesis so far is about making it as simple and expressive as possible, but I would love to see some of the great stuff the GP community does attempted in GEP. Some of these things are recursion, iteration, strongly typed expressions (removing the "closure" condition in GEP), and the use of lambda calculus. Since one of my main interests is in the issues surrounding functional languages, and PGEP has already given us a way to evolve functional expressions (compare PGEP to most linear GPs, or even to karva notation), it would be great to carry the torch further and make the expressions even more closely related to functional languages.

Wednesday, October 6, 2010

Cellular Automata on an Infinite Surface

Imagine a single cell in a huge grid of individuals, such as in a CGA (cellular genetic algorithm). In one generation the cell may cross over with any of its neighbors, but that is the limit of its involvement in the grid (barring some extra stuff going on, like long-distance links between cells). After two generations its genetic material may have spread a little farther, and it may get some material from a cell one more level away. After 100 generations it will have interacted with cells 100 steps away.

But what if the grid is so large that each cell has some other cells that it never interacts with, even indirectly? What if the grid were infinitely large? In a non-strict language like Haskell we could even implement such a thing. We could start with some single cell in the middle (probably this would be a pointed grid, so we elect this "middle" cell to be the individual we focus on) and only instantiate (randomly) the cells that we need to evaluate this cell each generation. After some number of generations it will have forced the evaluation of a large space around it, but there will still be this infinity of space- representing a grid of infinite interactions of genetic material. We could stop after 100 generations, say, and then search this grid forever, keeping track of the best individual we find. It would be an infinite space of solutions, with patches of fit ones that out-competed the ones around them.
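A toy version of the lazily instantiated grid: the cell at (x, y) is generated deterministically from a seed derived from its coordinates. The seed formula is arbitrary, and a real implementation would memoize each generation rather than regenerate cells; this only shows that laziness makes the infinite grid representable at all.

```haskell
import System.Random (mkStdGen, randoms)

type Genome = [Bool]

-- The infinite grid is just a function of coordinates; laziness
-- means a cell exists only once something asks for it.
cellAt :: Int -> Int -> Genome
cellAt x y = take 8 (randoms (mkStdGen (x * 7919 + y)))

-- The neighbors a cell could cross over with in one generation.
neighbours :: Int -> Int -> [Genome]
neighbours x y =
  [ cellAt (x + dx) (y + dy)
  | dx <- [-1, 0, 1], dy <- [-1, 0, 1], (dx, dy) /= (0, 0) ]
```

After n generations, only the cells within distance n of the focus ever get forced; everything beyond stays an unevaluated thunk, which is the "infinity of space" described above.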

Admittedly there would be problems holding this in memory, as well as problems getting good individuals to evolve, with so much random material out there to interact with and no way to bound interaction except to out-compete it (which is kind of cool by itself).

This is just some idea I had in class today, but I thought it was fun enough to post about. Could be fun to come back to one day, even if it never amounts to anything.