Tuesday, November 16, 2010

An Algebra of Datatypes

In this post I'm going to give a brief description of the first couple operations in the theory of Combinatorial Species, without the category theory behind it. I hope to explain the notation and to show its relation to Haskell's Algebraic Datatypes.

There are several primitive types, which all others are build out of by combining primitive types with one of the operators. These types are not the normal types one thinks of (int, string, etc) but are rather the building blocks out of which structures are created. It is these structures that can be thought of as holding a concrete type (this is made explicit in species theory).

The first primitive type is 0. For all sized structures it is not a structure at all. It never holds data or does much of anything. Then we have 1. 1 has no place to hold data, so it is like (very much like!) nil or () in Lisp- it denotes an empty container. The rest of the natural numbers are defined similarly, with 2 being essentially booleans, 3 being taking three possible values but holding no data elements, etc.

The next primitive type is E, the type of sets. There is only one set of each size, as sets are unordered, and so "permuting" them makes no sense.

C is the cycles, which are like linked lists where the end node is linked to the front. They are unique up to rotations.

Then we have X. X is a hole in which to put data. Its kind of like a variable- we don't at the time we write it know what it will hold, but we know it "holds" something.

To combine these types we have many operators. I will only talk about three right now- +, *, and o. The "o" is meant to be composition, like the composition of functions. The sum operator means that the resulting structure can be either one thing or anything thing. For example, 1 + X is exactly the Maybe type in Haskell- it is either a placeholder (1) or Just an element (X). This is related to taking a disjoint union.

The * operator pairs things together. Where + is like the Haskell Either type (which is how we express + on values) * is like ",", the pairing function. Note that in Haskell when declaring datatypes we can have multiply constructors, if desired. Since they are a tagged union of types they correspond to the + operator on species. Then we have * as the fact that a type constructor can have multiple values- they are really tagged tuples. An example is 2*X*X, which we can interpret as a pair and a index into the pair, which distinguishes one value over the other.

Interestingly we can write, for any n, 1 + 1 + 1 ... + 1 n times. Similarly, X*X*X = X^3. Many such properties work out in this system. This means that what we end up with are types of the form c1*X^x1 + c2*X^x2 ... + cnX^xn, which are polynomials! This means that (as far as I know) Haskell's algebraic datatypes are polynomials in the theory of species!

For the concerned reader, note that this isn't really trues for many different reasons. For one, Haskell allows infinite types, I haven't talked about how to include least fixed points, and I'm not sure if some Haskell type extensions don't have neat analogs to the theory of species.

Also notice that we haven't talked about two of the primitive species! The reason is that cycles and sets are "non-regular" in that they can be modified in a certain way- we can take a renaming of one- without really changing it. They are not part of the Haskell datatypes, which are all regular.

I hope that this is at least a tiny bit helpful to someone trying to learn about differentiable datatypes (which are amazing!) or any of the other topics surrounding datastructures that use this notation. I hope to write some time about the other operators- composition, functorial composition, differentiation, pointing, and cartisian products, as well as maybe some of the categorical explanation of what is going on. I am also interested in some of the other parts of the theory, like extending the (U, +, *, 0, 1) commutative semi-ring with unity to a ring with unity (additive inverses are called virtual species) and adding multiplicative inverses in order to express certain things. There are even further extensions, such as a generalization to structures holding many types of data (all of these have kind *->*) called k-sorted species, as well as "weighting" species to express more complex properties of the structures we are describing.

No comments:

Post a Comment