Monday, November 29, 2010
Monday, November 22, 2010
To fulfill this goal I've chosen to keep the binary encode, with the change that sequences of bits that do not encode a symbol are thrown out in the gene expression phase, and the mutation, crossover and notation from PGEP. I've replaced roulette wheel selection with 2-tournament selection. This works by choosing 2 (or more generally k) random individuals from the population and having them compete in a tournament. The tournament consists of choosing the best individual with probability p, the second best with less probability, the third (if there is one) with a smalled probability, etc. This is basically a roulette wheel on a ranked subpopulation. I've changed the selection because there is no convincing reason to use roulette wheel (the original GEP paper chose it with very little justification), tournament selection is apparently popular in GA literature, it removes the dependence that selection in roulette wheel has on the scale of the fitness function, and it can support both minimization and maximization problems. Roulette wheel only supports maximization, and so the fitness function is often changed to turn a minimization to a maximization. This is messy and may have unintended consequences. Note that I'm keeping elitism- it seems to help a lot.
The other change is in the editing mechanism. I don't want to have to check for invalid individuals after each genetic operation, and I consider it a problem with implementation that they have to pass that kind of information to the genetic operations. I would rather keep in the spirit of GPM systems and simply do editing to ensure that individuals encode trees. To do this we can just remove the noncoding region, flip the individual, and do a postfix evaluation where operations that would cause a stack underflow are ignored. There is another way to do this by moving backwards through the list but the postfix interpretation is very nice and concise.
I like this editing mechanism because it doesn't introduce any new material to the individual. In fact, all the editing that is done- bits to symbols, removing the noncoding region, and the postfix traversal- will only remove elements, which I consider a nice consistent approach. The only problem with this is the possibility of removing all symbols and ending up with an empty individuals. I don't like this, but I am willing to say that these individuals just end up with the worst possible fitness.
Thats the main points I'm making here- I also investigate other notations and editing mechanism to explain why they are not satisfying. I would like to also investigate adding some of the features that different GP variations have, like typed programs, recursion, iteration, and lambda terms that allow it to solve problems that are otherwise difficult for GPs. I've read, and I agree, that a huge amount of power is to be had by adding nesting and program reuse in evolved programs. This would be a test of RGEPs simplicity in the sense that if it is simply it may be easy to add to. On the other hand, it may be to simple to easily add to, and I expect a lot of complexity will be added to make these kinds of modification. I doubt I will be able to do more then one, so if I could find a nice way to remove the closure property I would be pretty happy. I expect that many of the variation of GEP that already exist are easy to add to RGEP, like ephemeral constants.
Friday, November 19, 2010
For my evolutionary algorithms framework, which I'm starting again for like the 4th time, I'm looking into ways of creating DSLs. I would like to be able to say that a GA consists of initialization, and a loop involving evaluation, selection, mutation, and crossover. Then I want to be able to say just as clearly that PGEP is initialization, and a loops involving evaluation, selection, point mutation, rotation, 1 point crossover, and 2 point crossover. And I want to be able to specify the type of selection, and make the evaluator easy to add. In addition, there are some aspects of EAs best handled by lots of state, which is a problem. Its not that I could use a state monad (I am for randomness already) but it would be complex and the type of state would change for nearly every experiment.
My current solution is probably the one I will continue with, and I want to describe the advantages and disadvantages to this and a couple other DSL strategies. The first strategy I looked into was a data definition, the kind you see all the time in Haskell blog posts.
data EAExpr p = Eval EAExpr | Select EAExpr | Mutate EAExpr | Loop Int EAExpr EAExpr | Crossover Int EAExpr | Population p
or something like this. Note that we may want to define it differently and take the fixed point of a data definition to get exactly the right structure, but since this is not the route I'm taking I'm going to skip over that. Then one would supply an evaluator for the problem you were solving. The problem is that the language needed may change depending on the problem and the specific EA. I could try and make a very generic one, but I'm not sure I would gain anything by it. The main thing with data definitions is that you can't extend them once they are set up. Or can you?
There is a way of constructing languages embedded in data definitions that can then be composed using the operators of combinatorial species, see "Datatypes a la carte." The problem there is that it is fairly complicated and I'm not sure I really need that level of complexity right now. Instead, I'm going the route of making many classes, many of which have only one function, and making specific conglomerates of theses. For example, the GAPop, a genetic algorithm population class, has no new methods, but must be a mutatable, crossoverable, selectable, and evaluatable thing. This lets be define some basic structures and make instances for them, but to leave the language ultimately open. I can add new structures and their instances any time, which is a lot of work but I think its the price I pay for lack of specific knowledge about applications, and I can add classes (functions in the DSL) easily. Then I can describe default and specific instances of algorithms, like a basic GA that can be filled in with a pop, evaluator and some parameters. I even get the property that, given the class defs involved in an algorithms description I can determine the kind of statements that might be made.
In a way this is hardly a DSL- its just a lot of classes and datatypes glued together by monad syntax. I'm going to think of it as a DSL though, and try to get it as nice as possible, so I can describe exactly what I want and not need to supply lots of unrelated information. And isn't that the point of DSLs, to be able to describe what we want the way we want to say it?
Problems I anticipate include overlapping instances and unwieldiness. I'm still researching these methods in my spare time to see if there is some problems with this approach that I'm not aware of that inspire people to try all the different strategies you see talked about.
Tuesday, November 16, 2010
There are several primitive types, which all others are build out of by combining primitive types with one of the operators. These types are not the normal types one thinks of (int, string, etc) but are rather the building blocks out of which structures are created. It is these structures that can be thought of as holding a concrete type (this is made explicit in species theory).
The first primitive type is 0. For all sized structures it is not a structure at all. It never holds data or does much of anything. Then we have 1. 1 has no place to hold data, so it is like (very much like!) nil or () in Lisp- it denotes an empty container. The rest of the natural numbers are defined similarly, with 2 being essentially booleans, 3 being taking three possible values but holding no data elements, etc.
The next primitive type is E, the type of sets. There is only one set of each size, as sets are unordered, and so "permuting" them makes no sense.
C is the cycles, which are like linked lists where the end node is linked to the front. They are unique up to rotations.
Then we have X. X is a hole in which to put data. Its kind of like a variable- we don't at the time we write it know what it will hold, but we know it "holds" something.
To combine these types we have many operators. I will only talk about three right now- +, *, and o. The "o" is meant to be composition, like the composition of functions. The sum operator means that the resulting structure can be either one thing or anything thing. For example, 1 + X is exactly the Maybe type in Haskell- it is either a placeholder (1) or Just an element (X). This is related to taking a disjoint union.
The * operator pairs things together. Where + is like the Haskell Either type (which is how we express + on values) * is like ",", the pairing function. Note that in Haskell when declaring datatypes we can have multiply constructors, if desired. Since they are a tagged union of types they correspond to the + operator on species. Then we have * as the fact that a type constructor can have multiple values- they are really tagged tuples. An example is 2*X*X, which we can interpret as a pair and a index into the pair, which distinguishes one value over the other.
Interestingly we can write, for any n, 1 + 1 + 1 ... + 1 n times. Similarly, X*X*X = X^3. Many such properties work out in this system. This means that what we end up with are types of the form c1*X^x1 + c2*X^x2 ... + cnX^xn, which are polynomials! This means that (as far as I know) Haskell's algebraic datatypes are polynomials in the theory of species!
For the concerned reader, note that this isn't really trues for many different reasons. For one, Haskell allows infinite types, I haven't talked about how to include least fixed points, and I'm not sure if some Haskell type extensions don't have neat analogs to the theory of species.
Also notice that we haven't talked about two of the primitive species! The reason is that cycles and sets are "non-regular" in that they can be modified in a certain way- we can take a renaming of one- without really changing it. They are not part of the Haskell datatypes, which are all regular.
I hope that this is at least a tiny bit helpful to someone trying to learn about differentiable datatypes (which are amazing!) or any of the other topics surrounding datastructures that use this notation. I hope to write some time about the other operators- composition, functorial composition, differentiation, pointing, and cartisian products, as well as maybe some of the categorical explanation of what is going on. I am also interested in some of the other parts of the theory, like extending the (U, +, *, 0, 1) commutative semi-ring with unity to a ring with unity (additive inverses are called virtual species) and adding multiplicative inverses in order to express certain things. There are even further extensions, such as a generalization to structures holding many types of data (all of these have kind *->*) called k-sorted species, as well as "weighting" species to express more complex properties of the structures we are describing.
Friday, November 12, 2010
One of these things is to have types that take fixed points of other types. This seemed very strange to me at first, and I've only just now understood the difference between the greatest and least fixed point of a type declaration. Thank you "A Neighborhood of Infinity" for being such an enlightening source of math and computer science knowledge!
As I understand them, polynomial types are of the form c1*X^n1 + c2*X^n2 + ... cN*X^nN, ci >= 0, ni >= 0. Since Haskell allows multiple type constructors, separated by pipes (|), the type systems admits "sum" types. A sum type is either one of the type on the left of the + or one of the type on the right. Each constructor (where the name is the tag, making the sum a disjoint (tagged) union) is isomorphic to an n tuple. This is a product type, where a product type is a pairing of one of the thing to the left of the * and one of the type to the right of the *.
It is interesting to note that we can also do composition of types, such as [(a,b)], where we are composing list with a product of types a and b. This means that the only operations from the theory of combinatorial species that Haskell does not have are differentiation and functorial composition. Of course, there are libraries for the creation of these kind of types, but I'm specifically talking about encoding them directly in Haskell as types, not as data.
One more thing before I get to the point of all of thing is that the Either type in Haskell is a value level sum, and comma (,) is a value level product.
So! Fixed points! If we consider the list type LA = 1 + X*LA, which is lists over a type A, we may want to solve for LA. What finally cleared up the idea of greatest and least fixed points for me was that this type can have two meanings, one which is "smaller" than the other. The "smaller" meaning is that of finite lists- a list is either an empty list or a thing paired up with a finite list. The "larger" meaning (the meaning that admits larger values) is that a list is either empty or a thing paired with a possibly finite list. This allows infinite lists (which is what Haskell means by a list). I was reading a paper by Wadler that make use of the idea of the least fixed point, and I completely missed out on the fact that he was using this idea to make sure that all the structures that he was talking about were finite.
There is one more thing that is interesting to note about this idea. In Haskell we can legally write this type:
data mu p = In (p (mu p))
I still don't know why they use mu to denote this "least fixed point" type, nor why the type constructor is called In (is it related to the in operator used to come out of a functor structure in a F-algebra, which is given the same name? I think it might be). The point here is that we then define something like this:
data Val a = Val Int
data Add e = Add e e
We can then create a sum type out of Val and Add, and create an Expr type mu Expr which takes the least fixed point of the sum type. They we can create an evaluator for this type. The nice thing about working this way is that we can construct types out of other types, and be explicit in the type signature about what sorts of things we admit in the resulting type. This is all describe in the paper "Data Types a la Carte" by Wouter Swierstra.
Okay, that is enough for now. I am still learning this material, and it is way beyond me in many ways, so there may be abuses of terminology and mistakes of understanding in the above post, so beware!
But this is just one small part of the much larger theory being used here. The basic idea (as I understand it) is that in the field of combinatorics it has long been known that generating functions can be used to solve combinatorial problems. It is possible to perform operations on the simple, well known, series and find solutions to complex problems. There are quite a few operations, including a sum, product, composition, functorial composition, cartisian product, and differentiation. It turns out on the other hand, that we can apply operators to datatypes, as written in a particular notation, and get the structures involved in a problem, while simultaneously performing the analogous operations on the datatypes associated generating series, and both will tell us about the same problem.
I'm being very vague here about what exactly is going on with this theory, but I hope to post more about it, and hopefully figure more out myself.