Friday, March 25, 2011

Good Static Type Systems

What makes one static type system better than another? Why is Haskell's so helpful and useful and Java's so painful and useless? We can approach this from a fairly rigorous standpoint by rooting the discussion in type theory and the associated logics. Warning: this post is a long rant.

Type systems for programming languages are formalized by type theories over the lambda calculus. Types organize the otherwise untyped lambda calculus and give a rigorous setting in which to study type systems. A type theory corresponds to a system of intuitionistic logic in which programs are proofs and the propositions they prove are the types of those programs. Since there is such a strong link (an isomorphism, in fact: the Curry-Howard correspondence) between type theory and logic, we would expect it to show up in the type systems of the languages we use. This is where I want to focus the discussion.

In logic there is a series of connectives we may encounter, such as conjunction ("and", &, /\), disjunction ("or", |, \/), and implication ("implies", ->). We may also have quantifiers such as the universal quantifier (forall) and the existential quantifier (exists). Let's see how these show up in programming languages and their type systems.

Let's start with implication, which corresponds to functions. Does Java have functions? Well, not really. It has things that look like functions, but without referential transparency, higher-order functions, and functions as first-class values, they aren't much like functions at all. Haskell has all of these properties. Implication is perhaps the most basic connective, and I think first-class functions are one of the most important things a programming language can have.
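A minimal sketch of functions as first-class values in Haskell (the names `twice` and `addOne` are mine, chosen for illustration):

```haskell
-- A higher-order function: it takes a function as an argument
-- and returns a new function that applies it twice.
twice :: (a -> a) -> a -> a
twice f = f . f

addOne :: Integer -> Integer
addOne = (+ 1)

-- twice addOne 3 evaluates to 5
```

The type `(a -> a) -> a -> a` reads, under Curry-Howard, as the proposition "(a implies a) implies (a implies a)."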

What about conjunction? This corresponds to pairs and, more generally, n-tuples. It could be added in some form to Java, but not having it built in has an important consequence when we get to pattern matching later in this post. In Haskell we have algebraic data types, whose constructors are products of types. We also have tuples with the usual notation: (1, "s") for a pair of type (Integer, String).
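A small sketch of conjunction as data in Haskell (the `Person` type and its field names are my own toy example):

```haskell
-- Conjunction as a built-in tuple: a value of both types at once.
pairExample :: (Integer, String)
pairExample = (1, "s")

-- Conjunction as an algebraic data type: the constructor Person
-- is a product of a String AND an Integer.
data Person = Person { personName :: String, personAge :: Integer }
```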

And disjunction? This is a tagged union type. If anyone is wondering where I am getting these interpretations of the logical connectives as data structures, it is well worth looking into algebraic data types and the theory of combinatorial species, as well as Haskell itself. Sum types appear in Haskell with constructor names as tags. As with tuples, we also have a general-purpose data representation of sum types: the Either type. We may have, for example, a Left 1 or a Right "s" from the type Integer + String (normally written "Either Integer String" in Haskell).
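A sketch of consuming such a sum type (the function name `describe` is mine):

```haskell
-- Disjunction: a value of Either Integer String is one OR the other,
-- and the constructor tag tells us which case we are in.
describe :: Either Integer String -> String
describe (Left n)  = "number: " ++ show n
describe (Right s) = "string: " ++ s
```

So `describe (Left 1)` yields "number: 1" and `describe (Right "s")` yields "string: s".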

Negation seems like it would cause some problems here. What is the negation of a data type, and what would it look like? Well, in the logics that type systems are related to, negation is defined like this: for the false proposition _|_ (bottom), the negation of A is A -> _|_, read as "A implies false." This doesn't seem terribly useful in practice, and I only mention it because it shows a connection between the double-negation translation of classical logic into intuitionistic logic and continuations.
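In Haskell this can be sketched with the empty type Void standing in for _|_ (the names `Not`, `doubleNeg`, and `cps` are my own; this is only a sketch of the connection, not a full double-negation translation):

```haskell
import Data.Void (Void)

-- Negation of a: a function from a into the empty type.
type Not a = a -> Void

-- Double-negation introduction: given a value of a, we can refute Not a.
doubleNeg :: a -> Not (Not a)
doubleNeg x k = k x

-- Generalizing Void to an arbitrary result type r gives the familiar
-- continuation-passing shape: a -> ((a -> r) -> r).
cps :: a -> (a -> r) -> r
cps x k = k x
```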

Finally, what about quantification? Java does have generics, which give a way of expressing the parametric polymorphism that the universal quantifier adds to a language. Unfortunately, the most important relation Java does give us, the ordering relation of subclassing, does not play well with generics. This is very unfortunate. In Haskell, parametric polymorphism is ubiquitous and comes naturally with nice syntax.
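A sketch of what this looks like in Haskell (function names are mine): the type variables below are implicitly universally quantified, so each definition is a single proof that works at every type.

```haskell
-- Read the signature as: forall a. a -> a
identity :: a -> a
identity x = x

-- Read as: forall a b. (a, b) -> (b, a)
swapPair :: (a, b) -> (b, a)
swapPair (x, y) = (y, x)
```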

Speaking of nice syntax, one obvious difference between these two type systems is that despite (actually, exactly because of) being more powerful, Haskell's type system allows type inference. Java 7 is supposed to allow some amount of inference, but from what I've seen it is very, very limited.
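For example (my own definition), a Haskell compiler needs no type annotation at all here; the fully polymorphic type is inferred:

```haskell
-- No signature given: the inferred type is
-- compose :: (b -> c) -> (a -> b) -> a -> c
compose f g x = f (g x)
```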

A very common proof method in the lambda calculus and in logic is proof by induction: if we can prove something for the base cases and show it is preserved as we take apart the connectives, we can prove it for all terms. Programs are proofs, but don't forget that proofs are also programs: Haskell's pattern matching allows the programmer to express the most natural proof method in these systems, induction on the structure of data. Pattern matching is a hugely powerful thing to be able to do, and like higher-order functions, I have started to miss it dearly when I use most other languages.
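A definition by pattern matching really does have the shape of an inductive proof (the name `total` is mine): one clause for the base case, one for the inductive step.

```haskell
-- Base case: the empty list.
-- Inductive case: a cons cell, assuming the result for the tail xs.
total :: [Integer] -> Integer
total []       = 0
total (x : xs) = x + total xs
```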

So, Java's type system doesn't appear to support any of the usual structures found in type theory or logic. What does it have? It has generics, and it has subclassing. Certainly such a relation is expressible in Haskell's type class system, and you can even get the inclusion polymorphism that object-oriented programmers love so much. So what does Java's type system have going for it? Well, it could lack an ordering relation and generics entirely, so it could be worse. The problem is that it doesn't include the most basic ways of creating structured data and must rely on subclassing to express almost everything. Static typing done this way still prevents many kinds of problems, but it is very understandable that programmers coming from a Java background find dynamic typing and partial structural typing liberating. Done correctly, a static type system can seem like a good friend; done poorly, it seems like a restriction.
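To sketch the point about type classes (the `Shape` class and its instances are my own toy example, not a full encoding of subclassing): a constrained function works uniformly over every instance, much like an interface-typed method in Java.

```haskell
class Shape a where
  area :: a -> Double

data Circle = Circle Double
data Square = Square Double

instance Shape Circle where
  area (Circle r) = pi * r * r

instance Shape Square where
  area (Square s) = s * s

-- Inclusion-style polymorphism: one function for any Shape instance.
describeArea :: Shape a => a -> String
describeArea x = "area: " ++ show (area x)
```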

Okay, I'm done with my rant for now.
