Thursday, March 31, 2011

Learning Genetic Algorithms

I plan on posting more about the lambda calculus, perhaps showing its connection to the always amazing functional programming paradigm. For now, though, I also want to start a series on my other favorite topic in computer science- Genetic Algorithms and the related methods Genetic Programming and Gene Expression Programming (which is the subject of my thesis).

To understand Genetic Algorithms we need a little bit of background. One important part of the field of Artificial Intelligence is the development of methods for solving certain types of very difficult problems. It is not that these problems are difficult to describe, or even that we don't know what good solutions look like, but rather that there are so many possible solutions that it is difficult to find the best one (or even a pretty good one).

This doesn't at first seem like it's related to intelligence at all. The connection is that while traditional methods rely on complex math, AI techniques for problem solving rely on partially random but directed searches through the solution space. A solution space is the "space," or set, of possible solutions. In the case of a function of two variables, for example, it really would look like a space, as it could be visualized as a landscape with hills where there are good solutions and valleys of poor solutions (or the reverse, depending on whether we are looking for the smallest or largest value of the function). The way the search is directed makes it seem intelligent in a way- it should be designed to spend more time looking at solutions similar to good ones it has already seen, while still searching as much of the solution space as possible.

These methods often try to mimic a natural phenomenon that appears intelligent. This can be the swarming behavior of insects, the neurons of the brain, or, in our case, the force of natural selection. The important thing isn't actually to study intelligence (obviously selection is not really intelligent) but rather to study selection as a guiding force that produces complex structures adapted to their environment. We will start out much simpler than this at first, though.

The simplest method of this type would be to construct a random solution and see if it is any good. We could generate many random solutions and just remember which was the best. For example, imagine we wanted to find the input value for some function, f(x), for which it returns the largest result, and we are interested in values only between 0 and 10. We may know what the function f looks like, but we might not- it might be a "black box" that we put values into and get values out of. If we generated 100 random numbers between 0 and 10, perhaps one would be very good, perhaps not. Maybe one time we try we get a very good result, and the next time a much worse one. There are no guarantees and no consistency, but on the other hand this is a very simple method.
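As a sketch, blind random search might look like the following in Python (the quadratic "black box" and the bounds are made up for illustration):

```python
import random

def random_search(f, lo, hi, n=100, seed=0):
    """Evaluate f at n random points in [lo, hi] and keep the best seen."""
    rng = random.Random(seed)
    best_x = rng.uniform(lo, hi)
    best_y = f(best_x)
    for _ in range(n - 1):
        x = rng.uniform(lo, hi)
        y = f(x)
        if y > best_y:  # we are maximizing
            best_x, best_y = x, y
    return best_x, best_y

# A black box whose (secret) maximum is at x = 7.
x, y = random_search(lambda v: -(v - 7) ** 2, 0, 10)
```

With 100 samples we will usually land near the peak, but a different seed gives a different answer- exactly the inconsistency described above.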

A slightly better approach is this: generate a random number as before, but then try to increase it by a little bit and see if that gets a better result. Then try to decrease it a little bit, and see if that is better than increasing. If we keep trying to increase or decrease, moving our current point depending on which direction (increasing or decreasing) is better, then we will often get better values than if we just generated numbers randomly. The problem, however, is that we are just climbing to the closest good point- if we would have to go down a little in value to find a place where the result is better than the current one, then this method will not be able to solve the problem. This method is called hill climbing, as it is like climbing the nearest hill in the solution space (the closest maximum value).
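A minimal hill-climbing sketch in Python, again using an invented one-hump function for illustration:

```python
def hill_climb(f, x, step=0.1, iters=10000):
    """Move to whichever neighbor (x + step or x - step) improves f, until neither does."""
    for _ in range(iters):
        here, up, down = f(x), f(x + step), f(x - step)
        if up > here and up >= down:
            x += step
        elif down > here:
            x -= step
        else:
            break  # a local maximum: neither direction improves
    return x

# Climbing -(x - 7)^2 from x = 2 walks step by step up to the single peak at 7.
peak = hill_climb(lambda v: -(v - 7) ** 2, 2.0)
```

On a function with several hills, the same code would stop at whichever hill it started on- the weakness described above.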

There are many problems that have the right characteristics for this type of search- the solutions can be written down in some nice form (as a number, a list of numbers, a list of symbols, etc.), we can measure how well a solution solves the problem, and there are many possible solutions (too many to try them all). Often solutions that are similar in their representation (in terms of numbers, solutions that are close to each other on the number line) perform similarly as solutions to the problem. Unfortunately this can mean that there are many groups of good solutions and it is hard to find the best solutions among them. In general it is difficult to know anything at all about the solution space. What we want is a method for searching any type of solution space that does not make assumptions about how the solutions are spaced, or anything else about the problem.

In the next post we will look at encoding solutions as lists and introduce the biological metaphor that guides the field of Genetic Algorithms.

Wednesday, March 30, 2011

Program Semantics

Part of learning a programming language is developing a mental model of the semantics of the language, both to reason about expressing ideas using the tools the language makes available and to reason about the operation of programs. In an imperative language a reasonable model is an abstract machine (the meaning of a program is the operation of the abstract machine), while declarative languages get their semantics from the system they are based on (a logic or a lambda calculus). The meaning of a declarative language is in the language itself, not the result of the compiler, as discussed here:

This post will go over my personal abstract machine for some languages I've used- YMMV.

In C the code itself is in some untouchable "nowhere" and it dictates changes to the two parts of memory: 1) the stack, which is intimately tied to the code and directs its control flow, and 2) the heap "out there," where any amount of memory can be taken. The programmer has only indirect access to this stack, directing it but not dictating its control. There is no concept of a function or a conditional statement (or any other control structure for that matter), but rather the idea of the machine code that produces these effects. Functions are just jumps that affect the stack, and all control flow does its "jump" by moving the mental cursor around in the untouchable "real code" somewhere out there.

In Java, memory is a big pile of objects with connections to each other forming a huge web. Control flow moves not in one place as in C (the stack) but from object to object. This is something like a cursor, and having multiple threads is like having many cursors. I realize that this is not literally true, and I certainly don't reason like this while programming- I'm just investigating what I see as the language's notion of the space in which its programs live.

One could think of the JVM when imagining Java code's runtime- and I often do- but for a beginner programmer with no grounding in the strange mixture of language semantics, virtual machine semantics, machine code semantics, and physical machine semantics that a Java programmer must contend with, the language may seem like a huge open space of objects floating around with directions to other objects.

In Forth there is the built-in dictionary, extending off into the user dictionary. The program's execution is so direct there is really no abstraction- the only thing that floats off in the untouchable "somewhere" is the actual hardware: the registers, the ALU, etc. Control flow occurs in a readable and writable part of memory (the rs, the return stack) and the main part of the program is concerned with changes to the special, separate ps, the parameter stack. The stacks and the scratch-pad memory are conceptually separate from the dictionary.

Python and Ruby seem more like Haskell to me in the sense that I don't think of the machine model as much. I think this is partially because of the use of higher order functions- it lifts me from the machine concept to a more mathematical one. On the other hand, the objects are similarly connected by a web of references, and the control flow does involve a movement through this web.

It took me some time to begin to understand my mental model of Haskell code, but I seem to see it as a series of rewrites doing beta-reductions. This is interesting, because it is a pretty close model to what "really" happens, in the sense that the core language is an extension of the polymorphic lambda calculus. Sometimes I think about the STGM (the Spineless Tagless G-machine) when thinking about data structures, but for the most part I think of Haskell in a very high-level, mathematical way.

So, there you go- some rambling thoughts about a programmer's mental models of languages. Read the link at the beginning of the post for a more interesting discussion of this topic.

Saturday, March 26, 2011

Learning the Lambda Calculus, simple types

The simply typed lambda calculus adds a notion of type to the terms of the LC. Each term is given a type, and terms that cannot be assigned a type are forbidden. This restriction gives us a system with a couple of particularly nice properties. Perhaps the most important property is called strong normalization. In previous posts we saw that we could turn terms into other terms using the three rules: alpha, beta, and eta. If a term has no parts that can be reduced with beta and eta reduction, then the term is said to be in (beta-eta) normal form. When trying to determine if two terms are equal we would like to be able to reduce them to a single form and then just check if they look the same. The problem, however, is that not all terms in the untyped lambda calculus have a normal form, as we saw when we tried to use the Y combinator. Restricting terms to those with types gives us a system where every term has a unique normal form. For completeness' sake I want to mention that when I say unique I mean unique up to alpha conversion, so the terms (\x.x) and (\y.y) are the same because they can be converted into each other just by renaming bound variables.

If a system has a unique normal form for every term, then every computation completes in that system, as the notion of computation in the LC is finding the normal form. Even better, when we talk about "strong" normalization we mean that it doesn't matter which sequence of reductions we do- every sequence terminates, and (by confluence) we always end up with the same normal form. Doing the steps in different orders has important consequences in practice, and is the essential idea of laziness in programming languages. I hope to talk about laziness and non-strictness in a later post.

So, on to the actual types. To give a type to a term, we must have a type context. I'm not going to go much into this concept; I will informally use names like a, b, c, ... for type variables and names like x, y, z, ... for term variables to help distinguish between them. Notice that types are just like assumptions in logic, and a type context is just a set of assumptions.

Let's look at a simple term- (\x.x). This term takes an argument and returns that argument. If we knew that the variable x had to be of type "a" then we would say that (\x.x) : a -> a, where ":" is pronounced "has type" or "is in" (if we think of types as sets). We shouldn't use variables without giving them types anymore, so really that term would be written (\x:a.x) : a -> a. The type of the input is "a" and the result is also of type "a," so the term is a function from a to a. The arrow is a connective in that it takes two types and creates a new type out of them- the type of functions from the first type to the second. If we assume that we have integers and booleans, then the type integer -> boolean might be the type of a function called "even" or "odd" or "equalsZero" or anything that takes an integer and results in a boolean value.

Notice that I said that we assume that we have a type "a" before giving the type of (\x.x). This is because that term, the identity function, doesn't have just one type; rather there are an infinite number of terms of the same form with different types. In other words, for each type there is an identity function that takes a value of that type and just returns it unchanged. This is a little bit unfortunate, as we would like the term's type to reflect that it is essentially the same for every type. In the simply typed lambda calculus we simply don't have the ability to state such a thing, and we would have to add something (universal quantification) to gain this ability.
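For contrast, a language with parametric polymorphism can say exactly that. As a sketch, in Python's (optional) type annotations a TypeVar plays the role of the universal quantifier that simple types lack- one identity function for every type at once:

```python
from typing import TypeVar

a = TypeVar("a")

def identity(x: a) -> a:
    """The polymorphic identity: for all a, a -> a."""
    return x

# One definition serves every type.
n = identity(3)
s = identity("hello")
```

We will see the same idea again when we reach the polymorphic lambda calculus.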

We have seen that a term can have the type a -> a, but what other types are there? Well, each type variable (a, b, c, ...) is a type, and for any two types a and b, a -> b is a type, and there are no types besides these. If we take "a" to be (d -> c) and b to be "e" then (d -> c) -> e (given a function from d to c return a value of type e) and e -> (d -> c) (given a value of type e, return a function from d to c) are certainly types.

In addition to these rules specifying which types exist, there are rules for giving types to terms. Given that x : a (x has type a) and y : b, we can deduce that (\y. x) : b -> a, as the input is of type b and the output (the variable x) is of type a. Just so we are clear, there is some subtlety going on here with type contexts and substitution that I am completely ignoring.

Given a function (\x:a. y) where y : b, and a value z : a, the type of (\x:a.y)z is of course b. Applying an argument of the correct type as the input to a function gets a result of the same type as the result of the function.

In the Y combinator post I gave Y the type (a -> a) -> a. I was being very informal, as Y cannot be given a type in the simply typed lambda calculus, which avoids the associated contradiction.

Unfortunately the simply typed lambda calculus is not very expressive, and I think we will just move on to the polymorphic lambda calculus aka the second order lambda calculus aka System F. This brings us to just about the type system of Haskell and introduces a huge amount of interesting material. Because this series is pretty informal I think we will just cover these topics very lightly, as it gets harder and harder to go on without needing some more rigorous discussion to talk about interesting things.

Friday, March 25, 2011

Learning the Lambda Calculus, church numerals

In the first post in this series I wrote a lambda term (\xy.x+y) and promised I would make sense of it some other time. Well, the time has come! The reason I want to make this clear before moving to other subjects is that it is important that the LC consists entirely of the things we have already defined- variables (x, y, z, ...), lambda abstractions ((\x.E), where E is a term and x a variable), and applications (E1 E2 or E1(E2), where E1 and E2 are terms and putting them next to each other means application). Even though there are variables, it is perfectly valid to say x(\y.y) or something like it, where the "function" in this term is x and the argument is (\y.y)- it's really functions on functions returning functions of functions, with functions all the way down.

So, for something like x+y to make sense we would need a way of expressing numbers (natural numbers) with functions. We want them to have the same characteristics as our usual notion of a natural number: there should be one number called zero and a function that, given any natural number, gives that number's successor. Notice that the whole of the natural numbers can be defined this way, and there is a clear connection with induction, with zero as the base case and the successor map as the recursive case.

So, let's start with zero, defining it as (\f.\x.x). This term takes a function and returns a term that takes an argument and applies the function to the argument 0 times. This may make more sense if we look at the definition of 1- (\f.\x.fx)- which applies the function once, and 2- (\f.\x.f(fx))- which applies the function twice. I mentioned a successor function, which we expect to turn (\f.\x.x) into (\f.\x.fx), and it must look like this: (\n.\f.\x.f(nfx)). This function applies f to the result of nfx, which is f applied to the argument x n times, so the result is f applied to x n+1 times.
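These encodings can be played with directly in any language with first-class functions. A sketch in Python (the decoding helper to_int is my own addition, not part of the encoding):

```python
# Church numerals: n takes a function f and a value x, and applies f to x n times.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))

def to_int(n):
    """Decode a Church numeral by counting how many times it applies +1 to 0."""
    return n(lambda k: k + 1)(0)

one = succ(zero)
two = succ(one)
```

Decoding one and two gives back the ordinary integers 1 and 2, so the successor really does count applications.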

So, back to the original term (\xy.x+y), we have to define addition. Well, assuming n and m are numbers encoded as shown above, this is simply (\n.\m.\f.\x.nf(mfx))- apply f to x m times, then another n times. We can even give multiplication just as easily as (\n.\m.\f.\x.n(mf)x), which applies (mf) exactly n times, giving a total of n*m applications of f to x.
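Continuing the Python sketch (again with a to_int helper of my own to decode the results), addition and multiplication work exactly as described:

```python
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))

# add n m: apply f to x m times, then n more times.
add = lambda n: lambda m: lambda f: lambda x: n(f)(m(f)(x))
# mul n m: apply the "apply f m times" function n times over.
mul = lambda n: lambda m: lambda f: lambda x: n(m(f))(x)

to_int = lambda n: n(lambda k: k + 1)(0)

two = succ(succ(zero))
three = succ(two)
```

Decoding add two three and mul two three gives 5 and 6, as we would hope.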

While the point of all of this was to clear up something in the first post in this series, there are several important topics that this brings up. What happens if we take the successor function and give it the multiplication function as its argument? It is certainly a valid lambda term, but does it make any sense? This is a fundamental problem with untyped systems- we don't have a way of expressing meaning and we can't restrict ourselves to only meaningful operations. Another point that I will not develop is that the ability to express simple arithmetic is a hugely important property in terms of computability.

While we are thinking about encoding things in the untyped lambda calculus, let's look at the encoding of True and False, if statements, and pairs. True is defined as (\x.\y.x), which may look familiar as it is simply the combinator K. False is (\x.\y.y), which is again familiar as it is just the encoding of zero. The reason these definitions make sense is that if we define "if" as (\b.\x.\y.bxy) then, writing informally, if true x y will reduce to x and if false x y will reduce to y, which makes it very much like branching based on a condition that returns true or false.
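In the Python encoding, with strings standing in for arbitrary branches:

```python
true = lambda x: lambda y: x   # the combinator K
false = lambda x: lambda y: y  # also the encoding of zero
if_ = lambda b: lambda x: lambda y: b(x)(y)

then_branch = if_(true)("then")("else")
else_branch = if_(false)("then")("else")
```

The boolean itself does the selecting; "if" just hands it the two branches.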

Pairs are just as easy to encode- (\x.\y.\s.sxy), where x is the first thing in the pair and y is the second. If we put something in a pair- (\x.\y.\s.sxy)ab =beta (\s.sab)- then we can get the first thing out with the "fst" function (\p.p(\x.\y.x)), which hands the pair the selector (\x.\y.x) as its argument s, and the second thing out with the "snd" function (\p.p(\x.\y.y)). Notice that the selector in fst is exactly true and the selector in snd is exactly false.
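And pairs, in the same Python sketch:

```python
pair = lambda x: lambda y: lambda s: s(x)(y)
fst = lambda p: p(lambda x: lambda y: x)  # the selector is exactly true
snd = lambda p: p(lambda x: lambda y: y)  # the selector is exactly false

p = pair(1)("s")
```

A pair is just a function waiting to be told which of its two components to hand back.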

Having seen that we can encode interesting things in the untyped lambda calculus and further seen that it can lead to the problem that we can't enforce that the encodings are used in a meaningful way, I think we are finally ready to move on to type systems as I have been promising for a while.

Good Static Type Systems

What makes one static type system better than another? Why is Haskell's so very helpful and useful and Java's so painful and useless? We can approach this from a fairly rigorous standpoint by rooting the discussion in type theory and the associated logics. Warning- this post is long and ranting.

Type systems for programming languages are formalized by type theories over the lambda calculus. Types organize the otherwise untyped lambda calculus and present a rigorous setting in which to study type systems. A type theory corresponds to a system of intuitionistic logic, where programs are proofs in this logic and the propositions they prove are the types of the programs. Since there is such a strong link (an isomorphism) between type theory and logic, we would expect it to show up in the type systems of the languages we use. This is where I want to focus the discussion.

In logic there are a series of connectives we may encounter, such as conjunction ("and", &, /\), disjunction ("or", |, \/), and implication ("implies", ->). We may also have quantifiers such as the universal quantifier (forall) and the existential quantifier (exists). Let's see how these show up in programming languages and their type systems.

Let's start with implication, which corresponds to functions. Does Java have functions? Well, not really. It has things that look like functions, but without referential transparency, higher order functions, and functions as first class objects they aren't much like functions at all. Haskell has all of these properties. Implication is perhaps the most basic connective, and I think functions as first class objects are one of the most important things a programming language can have.

What about conjunction? This corresponds to pairs and, more generally, n-tuples. It could be added in some form to Java, but not having it built in has an important consequence when we get to pattern matching later in this post. In Haskell we have the algebraic data types, whose constructors are products of types. We also have tuples with the usual notation- (1, "s") for a pair of type (Integer, String).

And disjunction? This is a tagged union type. If anyone is wondering where I am getting the interpretations of the logical connectives as data structures, it is well worth looking into algebraic data types and the theory of combinatorial species, as well as Haskell. Sum types appear in Haskell with constructor names as tags. As with tuples, we also have a data representation of sum types- the Either type. We may have, for example, a Left 1 or a Right "s" from the Integer + String type (normally written "Either Integer String" in Haskell).
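To make the idea concrete outside of Haskell, here is a hedged sketch of the Integer + String sum type as a tagged union in Python (the class and function names here are my own invention):

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Left:
    value: int

@dataclass
class Right:
    value: str

# The sum type Integer + String, i.e. Either Integer String.
IntOrString = Union[Left, Right]

def describe(e: IntOrString) -> str:
    """Branch on the tag, the way a Haskell case expression would."""
    if isinstance(e, Left):
        return "an integer: %d" % e.value
    return "a string: %s" % e.value
```

The tag is what makes the union usable: without it we could not safely tell which case we hold.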

Negation seems like it would cause some problems here. What is the negation of a data type, and what would it look like? Well, in the logics that type systems are related to, the concept of negation is just this: for the false value _|_, the negation of A is A -> _|_, read as "A implies false." This doesn't seem terribly useful in practice, and I only mention it because it shows a connection between continuations and the translation of statements from classical logic into intuitionistic logic by double negation.

Finally, what about quantification? Java does have generics which give a way of expressing the parametric polymorphism that the universal quantifier adds to a language. Unfortunately the most important relation Java does give us, the order relation of subclassing, does not play well with generics. This is very unfortunate. In Haskell, parametric polymorphism is ubiquitous and comes naturally with nice syntax.

Speaking of nice syntax, one obvious difference between these two type systems is that despite (actually exactly because of) being more powerful, Haskell's type system allows type inference. Java 7 is supposed to allow some amount of inference, but from what I've seen it is very, very limited.

A very common proof method in the lambda calculus and in logic is proof by induction. If we can prove something for the base cases and take apart the connectives, we can prove things about terms. Programs are proofs, but don't forget that proofs are programs- Haskell's pattern matching allows the programmer to express the most natural proof method in these systems: induction on the structure of data. Pattern matching is a hugely powerful thing to be able to do, and like higher order functions, I have started to miss it dearly when I use most languages.

So, Java's type system doesn't appear to support any of the usual structures found in type theory or logic- so what does it have? It has generics, and it has subclassing. Certainly such a relation is expressible in Haskell's class system, and you can even get the inclusion polymorphism that object oriented programmers love so much. So what does Java's type system have going for it? Well, it could lack even an ordering relation or generics, so it could be worse. The problem is that it doesn't include the most basic ways of creating structured data and must rely on subclassing to express almost everything. Static typing done this way still prevents many types of problems, but it is very understandable for programmers coming from a Java background to find dynamic types and partial structural types liberating. Done correctly, a static type system can seem like a good friend; done poorly, it seems like a restriction.

Okay, I'm done with my rant for now.

Wednesday, March 23, 2011

Learning the Lambda Calculus, the Y combinator

The Y combinator is many people's favorite combinator. Given some term in the LC, the Y combinator gives us a fixed point of that term (as will any other fixpoint combinator- Y is just a well known one). This is pretty spectacular- given some function f, we get an x such that f(x) = x. Notice that some functions have many fixpoints, but the fact that *all* functions (lambda terms) have at least one is very surprising. Many functions in math have such a value- abs (the absolute value function) has every non-negative number as a fixpoint, as abs(n) = n for all n >= 0. Certainly there is no general fixpoint function in ordinary math, but in the untyped LC there is, and Y is one of them. Some tutorials will try to explain Y in a programming language, but I actually think it is clearer as a lambda term. The reason I want to go over this a little is that it gives us some of the original motivation for type systems as a way of forbidding such terms, which is where I want to end up.

So what does this term look like? Like other combinators it has no free variables, but it is definitely more complex than the terms we have seen so far. The term looks like (\f.(\x.f(xx))(\x.f(xx))). Notice the two inner terms are the same, and f is the function we are asking for the fixpoint of. I will sometimes write out the term entirely and sometimes write it as Y. If this really gives us a fixed point, that means that if we choose some term f and write out Yf (Y applied to f), we should get some result, which I will just call (Yf), such that f(Yf) = Yf, which is just the condition we need for something to be a fixed point.

I really think the best way to see this is to try it out, so let's dive in. Call the input function g (just to distinguish it from the f in the way I wrote Y); we want to reduce (\f.(\x.f(xx))(\x.f(xx)))g. Well, first off let's do a beta reduction: (\f.(\x.f(xx))(\x.f(xx)))g =beta (\x.g(xx))(\x.g(xx)). That wasn't so hard! Now it looks like another beta reduction is in order: (\x.g(xx))(\x.g(xx)) =beta g((\x.g(xx))(\x.g(xx))). This is an interesting term- notice that the right side (the thing g is applied to) is (\x.g(xx))(\x.g(xx)), which is exactly what we got after the first beta reduction- that is, it is Yg itself. Substituting Yg back into our current form: g((\x.g(xx))(\x.g(xx))) = g(Yg). Wow- it looks like we have taken Yg and reduced it to g(Yg)! And even better, remember that the condition for (Yg) to be a fixed point is exactly that g(Yg) = Yg, which we have just shown.
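You can watch the same trick work in a programming language, with one caveat: under Python's strict evaluation the term as written above loops forever, so implementations use the eta-expanded variant (often called the Z combinator), which delays the self-application until it is actually needed:

```python
# Z is Y with the self-application eta-expanded so strict evaluation doesn't diverge.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# Factorial as the fixed point of a non-recursive "step" function:
step = lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1)
fact = Z(step)
```

Note that step never calls itself- all of the recursion is supplied by the fixpoint combinator.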

If you have ever taken the cosine of some fixed starting number many times, you found that if you keep applying cosine to the result of the previous application it gets closer and closer to a fixed number, until eventually cosine(x) = x and no change occurs. Similarly, notice that not only does Yg = g(Yg), but g(Yg) = g(g(Yg)) = g(g(g(Yg))) = ... See the pattern? The fixed point of any function g is g applied to itself an infinite number of times! While with cosine we started somewhere (some fixed starting number), with the Y combinator it is "g"s all the way down.
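The cosine experiment is easy to reproduce:

```python
import math

x = 1.0  # any starting number will do
for _ in range(100):
    x = math.cos(x)  # feed each result back in

# x has now settled at the fixed point of cosine, where cos(x) = x.
```

After a hundred iterations the value has stopped moving (it settles near 0.739), which is exactly the fixed-point condition.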

With such an amazing little function, why would anyone want to forbid it? It turns out that while it is useful- it encodes the idea of recursion very nicely and has other nice properties we won't get to- it has one problem: the definition diverges! This brings us to a fundamental aspect of computing- sometimes computations "blow up" or take forever to complete, and there is no way to determine in general exactly when they will do so. The Y combinator is amazing because it is a great example of how a useful thing for computing (recursion) can go wrong.

So what if we wanted to disallow any term that could blow up in this way? We would have to give up some useful forms of computation, as there is no system that is both universal (can compute all computable things) and also never blows up. In getting rid of these undesirable situations we must also forbid useful ones. This is where type systems come in, and where we can get into some more meaty subjects.

This post is getting long, so I just want to point out one last thing- the type the Y combinator would need is (a -> a) -> a for all a, as given any function of one argument it returns a value (the (least) fixpoint of the function). In logic this would be read as "(a implies a) implies a," which would mean that anything that implies itself is true. Perhaps this doesn't seem so bad- a true thing implies itself and it is true. On the other hand, a false thing implying a false thing is true, while a true thing implying a false thing is false. This means that if the LC is to serve as a system of logic we must avoid this term, as it is contradictory. This is the original motivation for type systems- the untyped lambda calculus as a logic leads to paradoxes, and we must restrict it to avoid them. This also shows us a profoundly deep connection between logic and computing- the unbounded computation of the Y combinator is closely tied to the fact that it encodes a contradiction.

Monday, March 21, 2011

Learning the Lambda Calculus, ski

To continue our discussion of the lambda calculus, I want to discuss three particularly interesting terms: S = (\fgx.fx(gx)), K = (\xy.x), I = (\x.x). We can also write them as: S = (\f.\g.\x.fx(gx)), K = (\x.\y.x), I = (\x.x).
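If you want to experiment, the three combinators are easy to transcribe into any language with first-class functions. A Python sketch in curried form, with a standard sanity check:

```python
S = lambda f: lambda g: lambda x: f(x)(g(x))
K = lambda x: lambda y: x
I = lambda x: x

# A classic identity: S K K behaves exactly like I.
skk = S(K)(K)
```

Tracing it by hand: S K K x = K x (K x) = x, which is I x.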

The reason that these terms are important is that they, all by themselves, form a system of universal computation just like the whole of the lambda calculus. This system is called SKI and it is a form of combinatory logic, but that is actually not why I am bringing them up. What I want to show is how we are already familiar with these terms in a different guise, just like the three rules of the lambda calculus are familiar from our experience with functions in math.

Each of the terms S, K, and I has a very nice property: they contain no free variables. The term (\x.y) has y as a free variable, as it is not attached to any lambda the way x is. This would be like the function f(x) = y in math- we would have to already know the value of y to use this function. If y = 3 then f(x) = 3, but if y = 232 then f(x) = 232, so the only way to use f to get an actual result is with a context. In the LC we aren't so much interested in getting actual results (yet), but it is still important to understand the difference between free and bound variables. A term with no free variables is called a combinator, so S, K, and I are all good examples of combinators.

We haven't gone over the idea of a type yet, but it is important to think of the types of these terms, so I will use them informally and ignore various details. The term I is the easiest: we write (a -> a) as its type, which should be read as "given something of type a, return that thing." This is the identity function f(x) = x, and in logic it is the proposition "p implies p," or p -> p.

K is the next easiest, and it has type a -> b -> a which should be read as a function of two arguments that returns a value of the same type as the first argument. In logic this would be "p implies q implies p," which is obviously true- if p is true then everything implies it. If we write this as f(x, y) = x it doesn't seem very useful, but it actually comes up in programming every once in a while in the curried form (\x.\y.x). If you apply an argument, say 3, then you get back the function g(y) = 3. This is called the constant function.

The very last is the most complex. S has type (a -> b -> c) -> (a -> b) -> a -> c. I found this very confusing at first, and the interpretation in logic is actually very helpful. It is the distribution axiom (p -> (q -> r)) -> ((p -> q) -> (p -> r)), read as "if p implies that q implies r, then if p implies q, p implies r"- closely related to modus ponens, one of the basic ways of forming an argument ("if p implies q, and p, then q"). In math, given a function f(x, y) = z and a function g(x) = y, we can make a function h(x) = f(x, g(x)).

I was very surprised the first time I saw terms of the lambda calculus interpreted as propositions in logic, and even more surprised when I read that these terms are not just propositions- they are axioms. In other words, they are part of the basis of (propositional) logic! They are exactly the axioms of the implicational fragment of intuitionistic logic.

I want to move on to type systems as an independent topic, continuing the discussion of the lambda calculus from a slightly different perspective so that the previous material is not a prerequisite. I would also like to get to the Y combinator as part of the motivation for type systems.

Programs are Proofs

The Curry-Howard-Lambek correspondence gives a three-way isomorphism between a typed lambda calculus, an intuitionistic logic (the logic of constructive mathematics), and a cartesian closed category. One of the implications here is that programs are proofs, and what they prove is the proposition that their type corresponds to. I've been reading about all of these systems trying to get a deeper understanding of what is going on here, and I've just realized something that, while probably obvious, has really put these things together for me.

Any term in the lambda calculus is one of three things: a variable, an application, or an abstraction. This means that any term can be broken down into pieces that are in turn one of those three cases. The most common way to prove things in this system is by induction on the structure of a term.

In Haskell, we have the hugely useful pattern matching mechanism. Until now I thought of it only as a nice way to take apart things defined by algebraic data types and give a case analysis. But of course this is exactly induction on a data structure, meaning that a function that does this matching is proving a property of the structure: it shows the property for the base cases and handles the recursive cases by structural induction. Not only are programs proofs, they can have the same form as a proof! The type of the function is a statement, and the function is a proof of that statement!
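As an illustration (the data type and names here are my own, not from any particular library), here is a lambda term represented as a Haskell algebraic data type, with a function computing free variables by pattern matching- one equation per constructor, exactly mirroring structural induction:

```haskell
import Data.List (nub, delete)

-- A term is a variable, an application, or an abstraction.
data Term = Var String
          | App Term Term
          | Lam String Term
  deriving (Eq, Show)

-- The free variables of a term, by induction on its structure.
freeVars :: Term -> [String]
freeVars (Var x)     = [x]                             -- base case
freeVars (App e1 e2) = nub (freeVars e1 ++ freeVars e2)
freeVars (Lam x e)   = delete x (freeVars e)           -- x becomes bound

-- A combinator is a term with no free variables.
isCombinator :: Term -> Bool
isCombinator = null . freeVars
```

For example, freeVars applied to the representation of (\x.y) returns ["y"], matching the discussion of free variables above.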

One of the cool things about this is that it makes it much more clear (to me) that not only are programs proofs, but proofs are programs. In constructive math all proofs must actually construct the thing they are proving and are therefore algorithms. This is really inspiring me to read up on constructive math, and for reference a good blog about this is:

Sunday, March 20, 2011

Learning the Lambda Calculus, cont.

In the last post we looked at the basic syntax of the LC and the three rules. Let's go back a little bit and talk about constructing terms before going forward. Once we know how to construct terms I hope to show what the three rules are really about.

The LC was originally developed to investigate and provide a foundation for math. This project was not successful (hopefully we can get to what I mean by that some other time), but the system has been recognized as being important in itself. I mention this only because the LC gets to the essence of math- math is about writing down symbols that obey rules. To capture this property the LC is really a syntactical system. This means that we are really concerned with writing and rewriting symbols- alpha conversion replaces one symbol with another, and beta reduction substitutes a term (the argument being applied) for a symbol (the name of the parameter); we will talk about eta reduction some other time. If the LC is about syntax, and its meaning is secondary, then we should investigate which statements are valid and how to construct them.

Any sequence of symbols that is valid in the LC is called a term. This means that we must have some way of defining well-formed formulas, or there would be no way to say what is a term and what is not. All variables by themselves are terms; for any two terms, the application of the first to the second is a term (for e1 and e2 we can write e1(e2) or e1e2); and for any term e and variable x, \x.e is a term. The set of terms has nothing in it except what is given by these rules.

Some example terms are (\p.(\y.yz)p)x, (\y.yz)x, (\r.rz)x, and xz, as well as the ones we saw in the last post. These four example terms have something in common- they are all the same! This is what the three rules are *really* about- they give a way to turn terms into other terms such that if we are given two terms and we can turn them into the same term using the three rules then the given terms are equal. Let's go over the example terms and see how each one can be turned into the others. The first term is (\p.(\y.yz)p)x, which includes the term (\p.(\y.yz)p). Notice that this term is in the correct form for a little eta conversion: (\p.(\y.yz)p) =eta (\y.yz). When we talk about equality of terms we will often say which rule we are applying- =eta means the left and right terms are equal after one eta conversion, with =beta and =alpha defined similarly. Now we have the second term, (\y.yz)x, which is the same as (\r.rz)x except for the name of the variable y, so we have (\y.yz) =alpha (\r.rz) and therefore (\y.yz)x =alpha (\r.rz)x. The last conversion is between (\r.rz)x and xz and takes the form of a beta reduction: (\r.rz)x =beta xz, by replacing all occurrences of r with x in the term rz, giving xz. This term xz can be written rz[x/r], which is supposed to look like we are dividing out r from rz and replacing it with x.
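The beta step can also be replayed in code. This is my own sketch, not a full evaluator: substitution here is naive and ignores variable capture, which is fine for an example like (\r.rz)x but not in general:

```haskell
-- A tiny representation of lambda terms.
data Term = Var String | App Term Term | Lam String Term
  deriving (Eq, Show)

-- Naive substitution e[a/x]: replace free occurrences of x in e by a.
-- WARNING: ignores variable capture; only safe for simple examples.
subst :: Term -> String -> Term -> Term
subst (Var y)     x a = if y == x then a else Var y
subst (App e1 e2) x a = App (subst e1 x a) (subst e2 x a)
subst (Lam y e)   x a
  | y == x    = Lam y e               -- x is rebound here, so stop
  | otherwise = Lam y (subst e x a)

-- One beta step on a redex: (\x.e) a  becomes  e[a/x].
beta :: Term -> Term
beta (App (Lam x e) a) = subst e x a
beta t                 = t
```

Applying beta to the representation of (\r.rz)x yields the representation of xz, just as the hand reduction above does.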

In the next post I hope to go over some of the more interesting terms in the LC like the Y combinator (and SKI) and go briefly over normalization to show how the LC expresses non-termination. Then I will probably skip over other topics in the untyped LC to get to the stuff that I find really fun- the type systems.

Friday, March 18, 2011

Learning the Lambda Calculus

I have decided to try something out. I'm going to explain something that I think is really neat, but seems difficult and mysterious to many people. Unlike in previous posts, I plan on going slowly and not assuming a huge amount of prerequisite knowledge. Today's subject is the lambda calculus (which I will sometimes shorten to the LC).

The lambda calculus is a very simple system with profound and far reaching things to say about computer science and math. While there are many more complicated versions, I'm going to describe the original, sometimes called the untyped lambda calculus. The whole point of the LC is to investigate functions and the application of functions to arguments, so let's start off by thinking about functions in math. While normally in math we might say f(x) = x for a function that returns its argument, in the LC functions are so common that they don't even get their own name- they are all named lambda. Since there is no lambda character I will use "\" as it is pretty close. The function f(x) = x would be written as \x.x in the LC. The function f(x, y) = x would be written simply as \xy.x, and f(x, y) = x+y would be \xy.x+y. The basic form here is a lambda, a list of variables, and a body.

Parentheses can be added for clarity, as in (\xy.x+y). Later on perhaps we will talk about what "+" means here and how we can get rid of it and do everything with functions, but for now let's informally assume that things like "+" work the way you expect them to.

As with normal functions, if we have some values and a function we can apply the values as arguments to the function: for f(x) = x, f(23) we would write (\x.x)23. The 23 is the argument, and instead of parentheses we just put the function and the values next to each other. The result of (\x.x)23 is 23, as (\x.x) just returns its argument unchanged. Another example might be (\xy.x*y)2 3 (2 gets assigned to x and 3 gets assigned to y), which would result in the expression 2*3, which evaluates to 6.
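Haskell's lambdas work the same way, so both examples can be checked directly (Haskell writes -> where the LC writes a dot):

```haskell
-- (\x.x)23: the identity function applied to 23
ex1 :: Int
ex1 = (\x -> x) 23

-- (\xy.x*y)2 3: both arguments applied, reducing to 2*3
ex2 :: Int
ex2 = (\x y -> x * y) 2 3
```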

Notice that the function (\xy.x), for example, has two arguments. In the LC this function is exactly the same as (\x.(\y.x)), the function that takes an argument and returns a function that takes an argument and returns the first argument. This is called currying, after Haskell Curry. If we apply only one argument, (\x.(\y.x))3, then we get the function (\y.3), which is a function that wants one argument but, no matter what you give it, will return 3. In the normal notation (\y.3) is f(y) = 3.
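In Haskell this currying is literal- a small sketch (names are mine):

```haskell
-- (\x.(\y.x)) written with one lambda per argument
first :: Int -> Int -> Int
first = \x -> \y -> x

-- Applying a single argument yields (\y.3), which ignores its input.
always3 :: Int -> Int
always3 = first 3
```

No matter what argument always3 receives, it returns 3.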

It is pretty intuitive that f(x) = x is the same thing as f(y) = y. In the LC it is also true that (\x.x) is the same as (\y.y). Making this explicit brings us to the first rule. The first rule of the lambda calculus is you do not talk about the lambda calculus. I mean, the first rule of the lambda calculus is alpha conversion. This is the "a rose by any other name would smell as sweet" rule, and it says that we can replace any variable name with any other name as long as we replace it everywhere it occurs and we don't use a name we have already used. To be clear, it's a little more complex than that, but the details aren't as important as the idea.

This post is getting long, but I want to go over one last rule. We have seen that you can rename variables, and I was sneaky and didn't mention that when we were applying arguments to functions we were using the most important rule- beta reduction, which removes a lambda. (We can introduce a lambda too: given a term like x, introducing a lambda would make the term (\y.x). This is called beta abstraction.) The last rule is just as simple as the first two. Eta reduction says that a term like (\x.(\y.y)x), which takes an argument and applies another function to it, is the same as (\y.y). This is like f(x) = x+1, g(x) = f(x). The function g is completely superfluous and we can just use f instead. We can also take a function and wrap it up like so- (\x.x+1) becomes (\y.(\x.x+1)y).
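The eta rule is easy to see in Haskell too; with f(x) = x+1 as above, the wrapped-up version behaves identically (a minimal sketch):

```haskell
f :: Int -> Int
f x = x + 1

-- Eta-expanded form of f: a lambda that only forwards its argument.
g :: Int -> Int
g = \x -> f x
```

Eta reduction says g and f are the same function, and indeed they agree on every input.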

So! That's all I'm going to say for now. We have introduced the notation and the three rules alpha, beta and eta. There is a great deal more to say just to introduce the LC, but hopefully it is not so mysterious now, especially as it is so much like normal math (which is no coincidence of course). In case you are wondering, those really are the only rules. Really. The only objects are variables and lambda terms (terms are just valid sequences of symbols). Modulo some details, you now understand one of the fundamental pieces of computer science, as the LC is a system capable of expressing all computable things- it is a system of universal computation. And that is pretty neat.

Monday, March 14, 2011

Eden- Distributed Computing with Haskell

I'm doing a project for a Distributed Operating Systems class on the programming language Eden. I wanted to post a little bit about my understanding of the system, so it will be a lot of rambling thoughts, many of which may be false.
Eden is an extension of Haskell built on top of parallel, concurrent, and distributed Haskell extensions, using part of the GdH (distributed Haskell) runtime system GUM. It adds a small number of primitives to Haskell yet allows arbitrary topologies for a distributed system.
The language has a published operational semantics based on the STGM (spineless tag-less graph reduction machine) and denotational semantics (based on a simply typed lambda calculus).
The core concepts are- processes, threads, channels, processing elements, and a non-deterministic merge. Processes are first order abstractions and are instantiated partially eagerly. They also create their output eagerly in the sense that they do not wait for the creator process to ask for output before sending it. This is necessary to avoid what is called sequential parallelism where a potentially parallel process ends up executing sequentially over many processors.
The dynamic channel construct allows communication channels to be represented as algebraic data types and sent over existing channels. A dynamic channel can only be used once, and it creates a channel from the process that receives the channel object to the process that created it (using object in the more general sense of a manipulable item).
The merge primitive is interesting as it allows Eden programs to be reactive. It is amazing how much this primitive seems to add to the language. The dynamic channel construct technically adds no more expressiveness to the language in the sense that each dynamic channel could be removed and turned into a series of transfers over static channels. The merge on the other hand allows a series of streams (events, data from a sensor, user actions, etc) to be merged into a single stream with no assurance on ordering. If for application specific reasons one needed a deterministic merge it would have to be built on top of the merge primitive.
The merging of streams means that Eden can be used to create any of the normal sorts of distributed systems. The underlying run-time system has concepts of remote references (they call it remote data) and location-specific resources (sticky objects) that could be files, processes, MVars, or anything else.
It is hard to get a good picture of the Eden system as a whole, as many of the papers are at a high level of abstraction and leave details to the underlying RTS. Unfortunately there isn't much about the details of the RTS, which although apparently similar to GUM does not make use of all of its concepts such as the global heap. I have a lot of questions that I can't seem to find answers for about the little details of Eden.
One thing that we have been talking about in my class is transactions, and Haskell certainly supports STM. I'm not sure exactly how this interacts with Eden.
The virtual machine that runs the compiled Eden language PEARL (Parallel Eden Abstraction Reduction Language) is called DREAM (DistRibuted Eden Abstract Machine). Instances of this machine must be on every device in the system. A great deal of the machinery here is abstracted away, so I don't know how instances add themselves to the system, negotiate, send data, marshal and un-marshal, or anything like that.

There is quite a lot more to this than I've mentioned. There is failure detection and correction, node failure and network partitioning, inter-process and inter-thread communication, support for algorithmic skeletons, and more.

Saturday, March 12, 2011

Project Euler 12

I coach the ACM programming contest team for CNU, and last meeting we could not get on the Online Judge site. To give us a problem to solve I thought we would try out problem 12 on Project Euler.
The problem is to find the first triangle number with more than 500 divisors.
The nth triangle number is the sum from 1 to n, which is n*(n+1)/2. To get from the nth triangle number to the (n+1)th, just add n+1.

We were successful with a brute force approach in Java in about 24 lines of code. For comparison here is my Haskell solution.

answer = head $ dropWhile small [tri n | n <- [1..]]
  where
    divisors n = succ $ sum [if n `mod` i == 0 then 1 else 0 | i <- [1..n `div` 2]]
    small n = divisors n <= 500
    tri n = (n*n + n) `div` 2
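For comparison, divisor counting can be sped up considerably by pairing each divisor q up to sqrt(n) with its partner n/q. This numDivisors is my own variant, not part of the solution above:

```haskell
-- Count divisors by checking only up to sqrt n: each divisor q found
-- there pairs with n `div` q, except when q*q == n (counted once).
numDivisors :: Int -> Int
numDivisors n = sum [if q*q == n then 1 else 2 | q <- [1..isqrt n], n `mod` q == 0]
  where isqrt = floor . sqrt . (fromIntegral :: Int -> Double)
```

For example, 28 has the divisors 1, 2, 4, 7, 14, and 28, which this counts as three pairs.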