Tuesday, February 12, 2013

Abstraction, Information Complexity, and Diversity of Programming Languages

I've been using some very underpowered languages at work, one of which is domain-specific and lacks such niceties as user-defined types or real functions/procedures/subroutines. I end up writing very repetitive and error-prone code, and I can't think of any way to factor out commonality or provide abstractions to make my life easier. I'm also working on an Arduino project that involves controlling a lot of constantly changing values (the colors of RGB LEDs), for which I've worked out a simple language: each pin of the Arduino is controlled by one thread of a small cooperative multitasking system, with a program written in this little domain-specific language (one which is becoming more and more like Forth). I came up with the idea of expressing the changes in LED brightness (which controls color on RGB LEDs) as a language when I realized that I needed a compact representation of the sequence of control values. The simple language has compressed the data necessary to store complex animations by around a factor of 100 (at least, possibly much more).
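To give a flavor of the idea (this is a hedged sketch, not my actual language — the `Instr`, `expand`, and `pulse` names are invented for illustration): a couple of instructions can stand in for hundreds of per-frame brightness values, which is where the compression comes from.

```haskell
-- A minimal sketch of a brightness-sequence language: instead of storing
-- every per-frame value, store a few instructions and expand them on demand.
data Instr
  = Hold Int Int      -- hold a brightness level for n frames
  | Ramp Int Int Int  -- ramp linearly from one level to another over n frames
  deriving Show

-- Expand a program into the literal per-frame brightness values.
-- (Assumes Ramp is used with at least 2 frames.)
expand :: [Instr] -> [Int]
expand = concatMap step
  where
    step (Hold level n)   = replicate n level
    step (Ramp from to n) =
      [ from + (to - from) * i `div` (n - 1) | i <- [0 .. n - 1] ]

-- Three instructions stand in for 768 frames of data.
pulse :: [Instr]
pulse = [Ramp 0 255 256, Hold 255 256, Ramp 255 0 256]
```

Here `pulse` is a fade-up, hold, fade-down animation: three instructions versus 768 stored values, which is the same order of compression described above.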

What this lack of abstraction in some languages, and the use of a domain-specific language to compress data, has made me realize is that code that is not abstracted contains a lot of redundancy, and that the less repetition in a program, the more information is carried by each character. This may seem obvious, but the funny conclusion is that a very well factored and abstracted program is close to random in a certain information-theoretic sense (the Kolmogorov complexity of generating the program's text). This is mostly just amusing, and it certainly simplifies the situation, but it does explain why my LED language compresses the data so much- I got to choose the machine/language used to express the information, so I could construct one that happens to have a concise representation of the particular kind of data I needed.

It also explains (to some extent) why highly abstract programs can have a higher mental overhead to understand- there is more information contained in each line, even if the total amount of information in the program equals that of an equivalent but poorly written program (or one in a simpler language). This suggests that there is an advantage to choosing abstractions that have some underlying consistent concept or model when possible, rather than just abstracting the observed common cases in a program. The ability to understand an abstracted program depends on the ability to understand the abstractions, so well-chosen abstractions that are grounded in theory and well understood can lead to programs with a firm foundation, where code can be expressed concisely according to an understandable mental framework rather than one built ad hoc. There is a danger in making things too consistent (the whole "everything is a _" style) when a problem doesn't fit that framework well and has to be awkwardly encoded in it, but in general there will be some problem that any given language has trouble encoding. This suggests that a language's ability to support construction of DSLs is important to its ability to express appropriate abstractions and keep the information content of programs high. Since my LED language is simple, it is not easy to program in directly. To solve this problem, I embedded it in Haskell and wrote a tiny code generator, giving me all sorts of meta-programming abilities. The domain-specific language encodes its domain concisely but lacks power, and the language it's embedded in lacks the ability to express the domain directly, but together they make a very nice duo.
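The embedding idea can be sketched like this (again hedged: the combinator names `times`, `blink`, and `compile` are illustrative, not my actual API) — the host language supplies the abstraction the little language lacks, and a tiny code generator renders programs in a textual form an interpreter could consume:

```haskell
-- Same shape of instruction set as before, redefined here to keep the
-- sketch self-contained.
data Instr = Hold Int Int | Ramp Int Int Int

-- Haskell-level meta-programming: ordinary functions generate DSL programs,
-- giving the DSL loops and procedures it doesn't have on its own.
times :: Int -> [Instr] -> [Instr]
times n = concat . replicate n

blink :: Int -> [Instr]
blink frames = [Ramp 0 255 frames, Ramp 255 0 frames]

-- A tiny "code generator": render a program as one instruction per line.
compile :: [Instr] -> String
compile = unlines . map render
  where
    render (Hold l n)   = unwords ["hold", show l, show n]
    render (Ramp a b n) = unwords ["ramp", show a, show b, show n]
```

For example, `compile (times 3 (blink 128))` emits six instruction lines, even though nothing in the DSL itself can express "repeat three times."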

My conclusion in all of this ranting is that a diversity of languages is a good thing, as different languages express different domains more closely and are better able to encode their concepts. There is tension here when a language's focus is too narrow and it can't express anything outside of one domain, or outside one way of looking at its domain. Languages that can easily have other languages embedded in them have an advantage in usability as general-purpose languages: they can cover many problem domains and reap the benefit of encoding expressions in languages specific to those domains. They can encode domains in multiple ways and can be adapted to many situations. They may not have all the benefits of a more specialized language in some cases, though they will have the benefit of transferring whatever power they have to the domains to which they are applied (which might otherwise get less powerful but fully domain-specific languages applied to describe them). For Haskell, there are many ways to embed languages, and certainly languages like Lisp (or Forth, incidentally) excel at it. The functional programming community in general is comfortable designing and implementing languages, and I think that may be one of its biggest strengths. I personally prefer languages steeped in type theory, and I recognize that my language of choice is restrictive compared to most languages (and we like it that way, gosh darn it!), but I wonder if there are languages with built-in facilities for constructing domain-specific languages that are based on formal language theory. I'm talking about specific, built-in abilities to define and manipulate things like ASTs (sexprs fulfill this), grammars, operational semantics, or something like that. I feel like either this exists, or it is just more easily done by encoding in the constructs of an existing language like Lisp, but I can't help but wonder.
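Even without such built-in facilities, the "encode it in an existing language" route is pretty direct. As a hedged illustration (the `Expr`, `eval`, and `simplify` names are invented for this sketch), defining and manipulating an AST in Haskell takes only an algebraic data type and pattern matching:

```haskell
-- A toy expression AST, plus two manipulations of it: evaluation and a
-- small rewriting pass. This is the kind of AST-handling the paragraph
-- above wonders about having as a built-in language facility.
data Expr = Lit Int | Add Expr Expr | Mul Expr Expr
  deriving (Show, Eq)

eval :: Expr -> Int
eval (Lit n)   = n
eval (Add a b) = eval a + eval b
eval (Mul a b) = eval a * eval b

-- A simple transformation over the AST: remove multiplications by 1.
simplify :: Expr -> Expr
simplify (Mul (Lit 1) e) = simplify e
simplify (Mul e (Lit 1)) = simplify e
simplify (Add a b)       = Add (simplify a) (simplify b)
simplify (Mul a b)       = Mul (simplify a) (simplify b)
simplify e               = e
```

Sexprs give Lisp this same shape essentially for free; what I'm wondering about is a language where grammars and semantics get the same first-class treatment that algebraic data types give ASTs here.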

Alright- enough rambling for now.
