Thursday, October 14, 2010

Data Structures as Free Algebraic Structures

I've known for a while that the Kleene star over some alphabet is the same as the free monoid. I've just discovered that other data structures are described by free structures over other algebraic structures. This is mind-blowing to me, so I'm going to record here what I've learned.

Lists are things written one after the next. The order is important and we can have repetition. The only thing we can remove are "empty" things (the empty string), which is exactly like saying that 2 = 0 + 0 + .. + 2 + 0 + 0 + ..., adding and removing identity elements is trivial, and we assume all are removed (the word is in normal form). Since the monoid here is free, we don't know how to do the operator, just that there is one. This is why lists are free monoid, we are multiplying elements by putting them next to each other, but since we can't reduce them we end up with just a list of elements.

If the monoid is commutative its free versions gives us the bag structures- repetition but no order. This like a list where all orderings are considered the same list. Commutativity allows us to move elements around as we please, and since we can't combine any elements we must keep them all.

If the monoid is commutative and idempotent we get sets as the free structure. Again the ability to move elements around and get an equivalent word gets us the lack of structure of sets. Commutativity says that we can group equal elements by moving them around in the word, while idempotency says that multiple element that are the same can be canceled.

If the structure is not associative, a magma, then we get binary trees. This is because we must specify the order of operations, which is like the s-expression representing a tree where all lists have two elements.

So cool! I really wonder if other free structures give us any other data structures that are well known or interesting. I will have to keep studying abstract algebra and investigate this. This even gives me a way to think about the repair operator I want to introduce to PGEP, where there is an equivalence relation identifying words over the symbol alphabet that encode the same tree. This relation only applies during evaluation, not during evolution (where the non-coding region is important).

1 comment:

  1. It seems like adding idempotency to the free monoid is like calling nub on a list. We make it idempotent and then make it just a monoid again, but now all the groups of the same elements are reduced to single elements.

    ReplyDelete