Monday, November 21, 2011

Structure of C Types

It seems like it is less common in imperative languages to investigate the structure of a language type system and try to understand its power. I think it is important when learning a language to know how it expresses things, and the limits of its ability to express things. In a previous post, I talked a little about some basic structure of Java's type system, so in this post I want to record an observation about C's.

There are several base types in C- int, char, double, as well as different variations like unsigned and const. None of these base types are subtypes of each other. The type void is the bottom type- there are no objects that actually have that type.

Structs are products of types. Products of types are like Cartesian products of sets. A simple example would be the struct defined like so:
struct Point {
int x;
int y;
};
Which gives the type of pairs of ints.

Unions of types are unions of sets. These aren't really all that useful- we need some way to determine which of the types in the union a particular element of the union is. Without this information an element of a Union is pretty useless. This is because we can only safely use the properties held by all types of a union, and since the type system is pretty flat there aren't many properties we can use for any given union of types. While structs are products, unions are sums (although to be proper sums they would need to be disjoint unions).

Enums are each the same as some natural number (the number of elements in the enumeration) which gives us sums, products, and natural numbers in the type system (with void as 0 in the sense that there are zero elements to the type). Notice that the number of possible elements of a struct of two enums (using only elements of the enum as C won't stop us from doing what we want) for enums of size n and m, is n*m. The number of elements in the union is n+m assuming they are disjoint. If they are not then the number of elements is like the union in set theory- we "remove" duplicates. In true disjoint unions elements that are otherwise the same are tagged with something to make them different.

C's only unary type constructor is the pointer/array. In some sense there are an infinite number of array constructors, as we can provide a size, although since this isn't enforced at all I'm going to ignore it. For any type we can make a pointer to it, and a pointer to that, and so on. Therefore the set of types is closed under pointing. Very often when one doesn't know the type of something we use a pointer to void. This is no coincidence- void pointers are the top type- the type that contains all other types (well, all other pointer types).

I think it is pretty interesting that void is 0 and void* is 1 in this system. Of course, with arbitrary casting void* are can actually be useful, as elements properties of the top type are only those properties of all other types, which is basically nothing. Otherwise a struct with a void* like:
struct Nat {
int n;
void* v;
};

Would be the same as:
stuct Nat {
int n;
};

Because we couldn't do anything with the variable v, which, translated to type theory, is like saying that N*1 = N. Similarly:
union Nat {
int n;
void v;
};

If this were a valid type then it would be N+0 which is equal to N as the void contributes nothing.

I think thats enough for now. I didn't get to function types, but maybe there will be a follow up post on this.

No comments:

Post a Comment