Monday, December 31, 2012

Tropical Semirings and Time Complexity

I've been thinking about how a compiler might track the cost of a computation and, assuming it is known to terminate, compute it at compile time. I have no idea whether this or something like it is usually done, but it's basically constant folding.

What I was thinking is that many operations, arithmetic operations being the best example, can be tagged with a cost (say, 1 for all of the language's primitive operations). Any operation that takes an unbounded amount of time would be tagged with an "infinite" cost. The interesting thing here is that this amounts to a morphism from programs in some language to the tropical semiring (in this case, over the natural numbers extended with infinity, with max as the additive operator and addition as the multiplicative operator), as I show below. Note that this assumes that the available control structures are sequencing and conditionals.

First off, primitive operations are assigned a fixed cost, perhaps just 1 for each operation. This could be more fine-grained if desired, especially if the language's primitive operations are not simple.

Sequencing two operations translates to adding their costs. This seems appropriate, as sequencing is like the product of programs, and the semiring's multiplicative operator is numeric addition: the cost of doing two operations in sequence is the cost of doing the first operation plus the cost of doing the second.

Conditionals have at least two branches, as a one-branch conditional is like a two-branch conditional with the empty operation as the second branch. Only one of the branches will be executed, but we have to assume pessimistically that the more expensive one will be taken. This means that the cost of a conditional is the maximum of the costs of the two branches.

Other control structures could perhaps be accommodated. Well-founded recursion, or loops of certain forms, could be added, though in general loops would have infinite cost. It seems like one would end up evaluating the largest expressions possible that are not tagged with an unknown (infinite) time cost.
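The whole scheme can be sketched in a few lines. This is a minimal illustration, not an implementation of any real compiler pass: the tuple-based AST and the node names ("prim", "seq", "if", "loop") are my own invented representation. Primitives cost 1, sequencing adds, conditionals take the max, and unbounded loops map to infinity.

```python
import math

def cost(expr):
    """Worst-case cost of an expression in the tropical semiring:
    sequencing uses numeric +, branching uses max, loops are infinite."""
    kind = expr[0]
    if kind == "prim":                       # ("prim",) - one primitive operation
        return 1
    if kind == "seq":                        # ("seq", e1, e2) - sequencing
        return cost(expr[1]) + cost(expr[2])
    if kind == "if":                         # ("if", e1, e2) - two branches
        return max(cost(expr[1]), cost(expr[2]))
    if kind == "loop":                       # ("loop", body) - unbounded loop
        return math.inf
    raise ValueError(kind)

# A primitive, then a conditional whose branches cost 2 and 1:
prog = ("seq", ("prim",), ("if", ("seq", ("prim",), ("prim",)), ("prim",)))
print(cost(prog))   # 1 + max(2, 1) = 3
```

Anything containing a loop folds to infinity, which is exactly the "unknown cost" tag: `cost(("seq", ("prim",), ("loop", ("prim",))))` is `inf`.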

So that's all. A trivial observation, and one that I'm sure is well known. To finish this off, note that something like this is also true for determining the size of algebraic data structures. Infinity comes up for structures that may have unbounded size (lists or trees), but otherwise the equations that define the structures (in the algebra of data types), when interpreted as numbers, essentially give their sizes, with variables acting as expected.
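The data type version looks much the same. Below is a hedged sketch under my own assumptions: sum types map to numeric addition, product types to multiplication, and a recursive type like a list is simply tagged infinite rather than solving its defining equation. The tuple encoding and node names are invented for illustration.

```python
import math

def size(ty):
    """Interpret a data type equation as a number (its size),
    with recursive/unbounded types mapping to infinity."""
    kind = ty[0]
    if kind == "unit":                      # the one-element type: 1
        return 1
    if kind == "sum":                       # Either a b: |a| + |b|
        return size(ty[1]) + size(ty[2])
    if kind == "prod":                      # (a, b): |a| * |b|
        return size(ty[1]) * size(ty[2])
    if kind == "rec":                       # lists, trees: unbounded
        return math.inf
    raise ValueError(kind)

bool_ty = ("sum", ("unit",), ("unit",))     # Bool = 1 + 1
pair_ty = ("prod", bool_ty, bool_ty)        # (Bool, Bool) = 2 * 2
print(size(bool_ty), size(pair_ty))         # 2 4
```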

Monday, December 3, 2012

Robust Gene Expression Programming + Let Expressions

I've been thinking about (and implementing) Robust Gene Expression Programming recently. I've posted about this technique before, but as a refresher it can be thought of either as a Genetic Algorithm that evolves expressions in a given (unityped) language or as a postfix form of linear Genetic Programming (with the intent that the language is purely functional rather than imperative, as linear GPs sometimes are). One of the things I noticed while developing this algorithm was that there is no reason one couldn't include stack manipulation operators in the expression language, regardless of the language one is interested in, and more importantly, these operators add power to the algorithm essentially for free. Recently I've realized that "let" expressions could be similarly added, and that they may improve the algorithm's ability to find useful programs in a nicely clean way.

Adding stack operators and "let" as extensions to a language is motivated by the observation that useful programs consist of useful subexpressions. This means that programs with good subexpressions should propagate through the generations, allowing those subexpressions to get mixed into other individuals, and so on. There are several schemes in the Gene Expression Programming world for helping useful subexpressions get reused. I tend to like simple schemes when dealing with these algorithms, and I think it is pretty cool that a stack operator like "dup" can be added to, say, a language for arithmetic expressions to duplicate a subexpression in the resulting expression tree, allowing it to be reused. Something like "tuck" could perform a similar function, duplicating subexpressions and allowing the expression tree to be essentially edited, moving subtrees around and combining them in different ways.
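To make the "dup" idea concrete, here is a small sketch of postfix gene evaluation. The details are my own assumptions, not RGEP's exact definitions: terminals push themselves, binary operators pop two built subtrees, "dup" copies the top subtree, and operators without enough arguments are skipped (the usual way postfix evaluation stays robust against invalid genes).

```python
def build(gene):
    """Build an expression tree from a postfix gene; ops that would
    underflow the stack are skipped so every gene yields a valid tree."""
    stack = []
    for sym in gene:
        if sym == "dup":
            if stack:                        # copy the top subtree
                stack.append(stack[-1])
        elif sym in ("+", "*"):
            if len(stack) >= 2:              # pop two subtrees, push the combined tree
                right, left = stack.pop(), stack.pop()
                stack.append((sym, left, right))
        else:                                # terminal: push as a leaf
            stack.append(sym)
    return stack[-1] if stack else None

# The square of a terminal, without a second copy of it in the gene:
print(build([3, "dup", "*"]))    # ('*', 3, 3)
```

The point is that "dup" costs one symbol in the gene but duplicates an arbitrarily large subtree in the resulting expression.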

Now for the point of this post: in addition to adding stack operators, I think one could easily add "let" expressions to any expression language. Again we get them essentially for free, in the sense that we can add them without changing the type of the expression language (they can be expanded out before the expression is evaluated for fitness). The scheme I have in mind involves adding variables to the terminals of the language and, for each variable added, a "let" operator for that variable that takes two expressions and substitutes the first expression for all occurrences of the variable in the second. Variables that occur free in a term (not bound by a surrounding "let") are ignored during evaluation so they do not disrupt the usual evaluation (which is already set up to handle problematic cases like this and still produce a valid expression).
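This expansion step might be sketched as follows. The representation is hypothetical: trees are tuples, a variable terminal is ("var", "x"), and the let operator for variable x is tagged "let-x" and carries its bound value and its body. Expanding substitutes the value for every occurrence of the variable, leaving any other variables alone (they stay free and are later ignored).

```python
def expand(expr):
    """Expand all let nodes by substitution, before fitness evaluation."""
    if isinstance(expr, tuple) and isinstance(expr[0], str) and expr[0].startswith("let-"):
        var = expr[0][len("let-"):]
        value, body = expand(expr[1]), expand(expr[2])   # expand inner lets first
        return subst(var, value, body)
    if isinstance(expr, tuple):
        return (expr[0],) + tuple(expand(e) for e in expr[1:])
    return expr

def subst(var, value, expr):
    """Replace every occurrence of ("var", var) in expr with value."""
    if expr == ("var", var):
        return value
    if isinstance(expr, tuple):
        return (expr[0],) + tuple(subst(var, value, e) for e in expr[1:])
    return expr

# let x = 5 in x + x  becomes  5 + 5:
print(expand(("let-x", 5, ("+", ("var", "x"), ("var", "x")))))   # ('+', 5, 5)
```

A free variable simply survives expansion unchanged, e.g. `expand(("var", "y"))` is still `("var", "y")`, which matches the rule that unbound variables are ignored rather than breaking evaluation.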

Note that I've been calling these "let" expressions rather than lambdas because they never form actual functions; they must be immediately applied to a value to be included in the expression tree. One could do otherwise by designing an expression language with lambda terms, but as an extension to a language we must enforce the rule that each expression has the same type. Since there is only one type, and the "let" terms are immediately evaluated, these are more like "let" than lambda.

The main problem I see with this idea is the somewhat ad hoc way we have to decide beforehand how many variables to include. I don't have a big problem with this, and I think for many problems a small number would be sufficient to add some value, but it does feel a bit odd. On the other hand, this strategy is much simpler than some other ideas that have been explored, and sometimes one trades perfection for simplicity.

It will probably be a while before I get around to implementing this to see if it actually helps at all, but I like that RGEP seems to have some simple, optional, and lightweight ways to extend itself without changing the core algorithm. Now if I could only figure out a clean way to allow more than one type...