Friday, November 19, 2010

Haskell DSLs

Domain-specific languages are a big deal in Haskell, despite the lack of a macro system or any mechanism to actually modify the language itself. This is understandable: many problems can be expressed as languages, and sometimes we need a new one that models our problem domain.

For my evolutionary algorithms framework, which I'm starting again for something like the fourth time, I'm looking into ways of creating DSLs. I would like to be able to say that a GA consists of initialization and a loop involving evaluation, selection, mutation, and crossover. Then I want to be able to say just as clearly that PGEP is initialization and a loop involving evaluation, selection, point mutation, rotation, one-point crossover, and two-point crossover. I also want to be able to specify the type of selection, and to make the evaluator easy to add. In addition, some aspects of EAs are best handled with lots of state, which is a problem. It's not that I couldn't use a state monad (I already am, for randomness), but it would be complex, and the type of the state would change for nearly every experiment.

My current solution is probably the one I will continue with, and I want to describe its advantages and disadvantages along with those of a couple of other DSL strategies. The first strategy I looked into was a data definition, the kind you see all the time in Haskell blog posts:
data EAExpr p = Eval (EAExpr p) | Select (EAExpr p) | Mutate (EAExpr p) | Loop Int (EAExpr p) (EAExpr p) | Crossover Int (EAExpr p) | Population p
or something like it. Note that we may want to define it differently and take the fixed point of a non-recursive data definition to get exactly the right structure, but since this is not the route I'm taking I'm going to skip over that. Then one would supply an evaluator for the problem being solved. The trouble is that the language needed may change depending on the problem and the specific EA. I could try to make a very generic one, but I'm not sure I would gain anything by it. The main limitation of data definitions is that you can't extend them once they are set up. Or can you?
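The evaluator mentioned above might be supplied as a record of problem-specific functions, one per operator, together with an interpreter that walks the expression. This is a minimal sketch under assumptions made for illustration: the `Ops` record, `run`, `plugIn`, and the trace-logging toy problem are hypothetical names, and the meaning of `Loop n body start` (run `start`, then run `body` n times with its `Population` leaves standing for the current population) is a guess, since the data definition alone doesn't pin it down.

```haskell
module Main where

-- The data definition, with each recursive field applied to p so it compiles.
data EAExpr p
  = Eval (EAExpr p)
  | Select (EAExpr p)
  | Mutate (EAExpr p)
  | Loop Int (EAExpr p) (EAExpr p)
  | Crossover Int (EAExpr p)
  | Population p
  deriving Show

-- Hypothetical problem-specific part: one function per operator.
data Ops p = Ops
  { opEval      :: p -> p
  , opSelect    :: p -> p
  , opMutate    :: p -> p
  , opCrossover :: Int -> p -> p -- the Int is the number of crossover points
  }

-- Interpret an expression. Assumed semantics for Loop n body start:
-- run start, then run body n times, feeding the current population
-- into body's Population leaves.
run :: Ops p -> EAExpr p -> p
run _   (Population p)      = p
run ops (Eval e)            = opEval ops (run ops e)
run ops (Select e)          = opSelect ops (run ops e)
run ops (Mutate e)          = opMutate ops (run ops e)
run ops (Crossover k e)     = opCrossover ops k (run ops e)
run ops (Loop n body start) =
  iterate (\pop -> run ops (plugIn pop body)) (run ops start) !! n

-- Substitute the current population into a loop body's leaves. A nested
-- loop only gets it in its start expression; its body has its own loop state.
plugIn :: p -> EAExpr p -> EAExpr p
plugIn pop (Population _)      = Population pop
plugIn pop (Eval e)            = Eval (plugIn pop e)
plugIn pop (Select e)          = Select (plugIn pop e)
plugIn pop (Mutate e)          = Mutate (plugIn pop e)
plugIn pop (Crossover k e)     = Crossover k (plugIn pop e)
plugIn pop (Loop n body start) = Loop n body (plugIn pop start)

-- A toy "problem" whose population is just a log of the steps applied to it.
traceOps :: Ops [String]
traceOps = Ops
  { opEval      = (++ ["eval"])
  , opSelect    = (++ ["select"])
  , opMutate    = (++ ["mutate"])
  , opCrossover = \k pop -> pop ++ ["crossover" ++ show k]
  }

-- Two generations of evaluate, select, mutate, then one-point crossover.
gaProgram :: EAExpr [String]
gaProgram =
  Loop 2 (Crossover 1 (Mutate (Select (Eval (Population []))))) (Population ["init"])

main :: IO ()
main = mapM_ putStrLn (run traceOps gaProgram)
```

Adding rotation for PGEP would mean editing the data type, `Ops`, `run`, and `plugIn` together, which is exactly the extensibility problem with a closed data definition.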

There is a way of constructing languages embedded in data definitions that can then be composed using sums of functors (akin to the operators of combinatorial species); see Wouter Swierstra's "Data types à la carte." The problem there is that it is fairly complicated, and I'm not sure I really need that level of complexity right now.

Instead, I'm taking the route of making many classes, many of which have only one method, and then making specific conglomerates of these. For example, GAPop, a genetic algorithm population class, has no new methods of its own, but anything that is a GAPop must be mutatable, crossoverable, selectable, and evaluatable. This lets me define some basic structures and make instances for them, while leaving the language ultimately open. I can add new structures and their instances at any time, which is a lot of work, but I think it's the price I pay for lacking specific knowledge about applications; and I can add classes (functions in the DSL) easily. Then I can describe default and specific instances of algorithms, like a basic GA that can be filled in with a population, an evaluator, and some parameters. I even get the property that, given the class definitions involved in an algorithm's description, I can determine the kinds of statements that might be made in it.
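Here is a minimal sketch of the class-based approach on a toy bit-string population. The class names follow the text (evaluatable, selectable, mutatable, crossoverable, and the method-free `GAPop`), but every method signature, the `BitPop` type, the count-the-True-bits fitness, and `basicGA` are hypothetical. To keep the sketch pure and testable, mutation is driven by the generation number rather than by a state monad for randomness, and evaluation happens inside selection.

```haskell
module Main where

import Data.List (sortOn)
import Data.Ord (Down (..))

-- Hypothetical toy structure: a population of bit-string genomes.
newtype BitPop = BitPop [[Bool]]
  deriving (Show, Eq)

-- Fitness of one genome: the number of True bits (a hypothetical toy problem).
score :: [Bool] -> Int
score = length . filter id

-- One small class per DSL "statement".
class Evaluatable p where
  fitnesses :: p -> [Int]

class Selectable p where
  select :: p -> p

class Mutatable p where
  -- The Int is the generation number; here it stands in for randomness.
  mutate :: Int -> p -> p

class Crossoverable p where
  crossover :: p -> p

-- No methods of its own: a GAPop is anything supporting all four statements.
class (Evaluatable p, Selectable p, Mutatable p, Crossoverable p) => GAPop p

instance Evaluatable BitPop where
  fitnesses (BitPop gs) = map score gs

instance Selectable BitPop where
  -- Truncation selection: keep the fitter half, repeated to restore the size.
  select (BitPop []) = BitPop []
  select (BitPop gs) = BitPop (take n (cycle (take k ranked)))
    where
      n = length gs
      k = max 1 (n `div` 2)
      ranked = sortOn (Down . score) gs -- stable sort, best first

-- Flip the bit at position k; other positions are unchanged.
flipAt :: Int -> [Bool] -> [Bool]
flipAt k g = [if j == k then not b else b | (j, b) <- zip [0 ..] g]

instance Mutatable BitPop where
  -- Deterministic stand-in for random mutation: genome i flips bit (gen + i) mod length.
  mutate gen (BitPop gs) =
    BitPop (zipWith (\i g -> flipAt ((gen + i) `mod` max 1 (length g)) g) [0 ..] gs)

instance Crossoverable BitPop where
  -- One-point crossover at the midpoint of each adjacent pair; an odd one out is kept.
  crossover (BitPop gs) = BitPop (pairUp gs)
    where
      pairUp (a : b : rest) =
        let k = length a `div` 2
         in (take k a ++ drop k b) : (take k b ++ drop k a) : pairUp rest
      pairUp rest = rest

instance GAPop BitPop

-- A default algorithm description, filled in with a population and a generation count.
basicGA :: GAPop p => Int -> p -> p
basicGA gens pop0 = foldl step pop0 [1 .. gens]
  where
    step pop gen = crossover (mutate gen (select pop))

-- Best fitness in a population (0 for an empty one).
best :: Evaluatable p => p -> Int
best p = maximum (0 : fitnesses p)

examplePop :: BitPop
examplePop =
  BitPop
    [ [False, False, False, False]
    , [True, False, False, False]
    , [True, True, False, False]
    , [False, False, False, True]
    ]

main :: IO ()
main = do
  print (fitnesses examplePop)
  print (fitnesses (basicGA 1 examplePop))
  print (best (basicGA 10 examplePop))
```

Supporting PGEP would mean adding classes such as a hypothetical `Rotatable`, plus a new method-free conglomerate class, without editing any of the existing classes or instances.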

In a way this is hardly a DSL: it's just a lot of classes and datatypes glued together by monad syntax. I'm going to think of it as a DSL, though, and try to make it as nice as possible, so I can describe exactly what I want without supplying lots of unrelated information. And isn't that the point of DSLs, to be able to describe what we want the way we want to say it?

Problems I anticipate include overlapping instances and unwieldiness. I'm still researching these methods in my spare time to see whether this approach has problems I'm not aware of, the kind that inspire people to try all the different strategies you see discussed.
