Its Computer Science Time: February 2013

Wednesday, February 20, 2013

Idris is Pretty Cool

I've been reading recently about a programming language called Idris. Its similar to Haskell, but with dependent types. It is intended to be a practical programming language rather then a theorem proving language or a proof of concept, and there are papers about applying it to systems programming. That is one of my interests in dependent types- increasing the safety of complex programs that involve things like communicating with hardware. I use C to do this professionally, and I find that it is difficult to get this kind of programming correct every time with all the complicating factors. There are often times that I could express more about the structure of data or the flow of the program with a better type system. Dependent types have a lot of potential, and I would love to see them make their way into the kind of work that I do.

I may have to post about dependent types in the future, as they are probably the biggest thing that I know about in the functional programming community right now. They represent a real change in the use of programming languages and how one thinks about programming, and I for one am in favor of this change.

Arduino LED Language, part 2

The LED language for the Arduino is still changing quite a lot. I've gotten rid of the whole cooperative multitasking thing in favor of a simpler interpreter that is now just a switch threaded Forth. As it gets more advanced I'm faced with some problems- whenever one designs a domain specific language there is the problem of how much power to give it. Right now it does not support variables (constants are supported but not built in) or any sort of parsing words or anything interesting like that. I'm running into the problem that Forths have on the Arduino, which is the problem of having your program stored in flash and running it out of RAM. I will probably just add some simple hacky variable system to keep things from getting out of hand complexity-wise (indices into an array that get expanded to accesses at compile time or something).

In terms of progress it has conditionals and loops, and supports execution tokens (the Forth term for this idea) aka function pointers aka almost higher order functions (with no combinators to construct new ones at runtime). Unfortunately it does not support getting the execution token of built in words, which is a pretty arbitrary distinction.

Incidentally, I have it working with a full 5 RGB leds, controlling all 15 pins nicely. More things need to be added to the language to get complex animations, but as it is one can fade leds, set colors, and having sequences of blinking lights.

Thats all the updates I can think of for now.

Thursday, February 14, 2013

LED Arduino Language

A friend of mine has come up with a project involving RGB LEDs where it would be cool to be able to "script" their color, possibly animating them over a longish period of time, with multiple possible animations. To do this, I've come up with a system that is essentially 15 cooperative multitasking threads interpreting a stack-based language that I've also embedded into Haskell for some meta-programming goodness. Together, these threads control the intensity of each color of each led (we plan to have 5 of them, 3 colors * 5 leds = 15 pins to control (intensity is controlled by pulse-width modulation controlled through an interrupt attached to a timer)). This post merely records the fact that this idea actually does work, and I can control the color of the first RGB LED attached to my Arduino on my desk here.

Pretty neat, though there is much work to do. I need to make sure the idea scales, possibly optimize a bit, get the embedding into Haskell nicer, make some more complex animations, add more to the language itself, and possibly write a parser and make it a standalone language to make the scripts look nicer. I would also like a mode that allows the user to control the colors, so I need to figure out how to do some input 9the programs are read from flash with no input from the user at the moment).

Tuesday, February 12, 2013

Abstraction, Information Complexity, and Diversity of Programming Languages

I've been using some very underpowered languages at work, one of which is domain specific and doesn't have such niceties as user defined types or real functions/procedures/subroutines. I end up writing very repetitive and error-prone code and I can't think of any way to factor out commonality or provide abstractions to make my life easier. I'm also working on a project with the arduino involving controlling a lot of constantly changing values (colors of RGB LEDs) where I've worked out a simple language so that each pin of the arduino is controlled by one thread of a small cooperative multitasking system with program written in this little domain-specific language (one which is becoming more and more like Forth). I came up with the idea of expressing the changes in LED brightness (which controls color on RGB LEDS) in the form of a language when I realized that I needed a compact representation of the sequence of values needed for control. The simple language has compressed the size of the data necessary to store complex animations by around a factor of 100 (at least, possibly much more).

What all this lack of abstraction in some language and the use of a domain specific language to compress data has made me realize is that code that is not abstracted contains a lot of redundancy, and that the less repetition in a program the more information is carried in each character. This may seem obvious, but the funny conclusion is that a very well factor and abstracted program is close to random in a certain information theoretic sense (Kolmogorov complexity of generating the program's text). This is mostly just amusing and certainly simplifying the situation, it does explain why my LED language compresses the data so much- I got to choose the machine/language used to express the information, so I could construct one that happens to have a concise representation of the particular type of data I needed.

It also explains (to some extent) why highly abstract programs can have a higher mental overhead to understand- there is more information contained in each line, even if the full amount of information in the program is equal to an equivalent but poorly written program (or one in a simple language). This suggests that there is an advantage to choosing abstractions that have some underlying consistent concept or model when possible rather then just abstracting the observed common case in a program. The ability to understand an abstracted program depends on the ability to understand the abstractions, so well chosen abstractions that are grounded in theory and well understood can lead to programs with a firm foundation where code can be expressed concisely according to an understandable mental framework rather then one that is built adhoc. There is a danger when making things too consistent (the whole "everything is a _" style) when a problem doesn't fit that framework well and has to be awkwardly encoded in it, but in general there should be problem that any language has trouble encoding. This suggests that a languages ability to construct DSLs is important in its ability to express appropriate abstractions and keep the information content of programs high. Since this language is simple, it is not easy to program in. To solve this problem, I embedded it into Haskell and wrote a tiny code generator, giving me all sorts of meta-programming abilities. The domain specific language encodes its domain concisely, but lacks power, and the language its embedded in lacks the ability to express the domain directly, but together they are a very nice duo.

My conclusion in all of this ranting is that a diversity of languages is a good thing as different languages express different domains more closely and are able to encode its concepts. There is tension here when a languages focus is too narrow and it can't express anything outside of one domain or one way of looking at its domain. Languages that can easily have other languages embedded into them have an advantage in usability as general purpose languages as they can cover many problem domains and reap benefit of encoding expressions in languages specific to those domains. They can encode domains in multiple ways and can be adapted to many situations. They may not have all the benefits of a more specialized language in some cases, though they will have the benefit of transferring whatever power they have to the domains to which they are applied (which may otherwise get less powerful but fully domain specific languages applied to describe them). For Haskell, there are many ways to embed languages, and certainly languages like Lisp (or Forth, incidentally) excel at it. The functional programming community in general is comfortable designing and implementing languages, and I think that may be one of its biggest strengths. I personally prefer languages steeped in type theory, and I recognize that my language of choice is restrictive compared to most languages (and we like it that way, gosh darn it!), but I wonder if there are language with built in faculties for constructing domain specific languages that are based on formal language theory. I'm talking about specific, built in abilities to define and manipulate things like ASTs (sexprs fulfill this), grammars, operational semantics, or something like that. I feel like either this exists, or is just more easily done when encoding in the constructs of an existing language like Lisp, but I can't help but wonder.

Alright- enough rambling for now.

Wednesday, February 6, 2013

Logic = Syntax + Semantics

I've been reading the Model Theory notes from: http://www.cs.man.ac.uk/~hsimmons/BOOKS/books.html. The very first sentence of the lecture notes that (much of) logic is about understanding a subject through the use of "syntactic gadgets". This made a whole series of things click for me- the kind of thing where I feel like I almost knew this beforehand, but never quite made the little leap until now. I like these moments, so I will now record what I've learned. Warning- I'm not a logician or mathematician, take what I say with a grain of salt.

When we use logic to study a subject, the logic does not necessarily cover the entire semantic domain. Often we can only carve out some subsection of our domain and reason about it through a logic, especially if that logic is to be simple. This is not necessarily a bad thing- carving out total subsets of untyped lambda calculus by type theories (slightly different from whats in the lectures, but corresponding ideas) is a common pass-time in computer science. Being able to talk about a subject purely syntactically has several advantages to studying it directly. It is a common language which can be studied in isolation. This lets us study properties such as computability of, say, first order theories and apply the results to a large class of situations. It also allows a subject to be transformed into the unambiguous language of symbols. If that language is intended to describe a real-world phenomenon, like situations where the truth of something changes over time or where truths are added as proofs are discovered, then encoding it in a logic and finding a semantics for the logic gives the encoding of the real-world situation in math. This is what gives logic that truly foundational feel to me- over systems can be taken as a foundation, but logic has to do with symbols, and any other system will be manipulated and reasoned about using symbols written on chalkboards (or other surfaces, I suppose). This also applies to computer science, where I think of type theory as being "more foundational" (subjectively) then other systems.

This makes a lot of things clearer to me. Logicians like systems that are sound and complete with respect to their models because then the logic can be used to reason about the semantics without problems in the logic or in the semantic domain. It also makes more sense why Kripke models are so important- they give a mathematical setting for the subject of interesting logics.

Thinking about this in the context of constructive logic lead me to what I think is a better understanding of why domain theory is set up the way it is. I may be completely wrong about this, but what I'm thinking is that in constructive logic there are more then two truth values. It is not that there are three- which would be a tristate logic- but more like there are a countably infinite number of truth values. The truth value "true" is at the top, "false" is at the bottom, and all the rest are, in some sense, between them. The sense in which there is something between true and false is that a value may be partially defined. If a data structure can be evaluated part of the way to normal form it is more defined then one that has not been evaluated. It may not be fully evaluatable, but there is a real sense in which it is "more evaluatable" then a structure that can be evaluated less then it. This means that there is an ordering (a partial order) of structures by how "true" they are, or how much they can be evaluated before becoming completely evaluated or being equal to bottom. There is probably a mistake in my understanding here, as this ordering is over the terms, not the types, but the types are the equivalent of logical terms, but I don't know enough to be sure where my mistake it.

This also firms up my understanding of the relationship between category theory and logic. I've seen before how categories have internal logics, but when talking about things like translating syntactic objects to semantic ones, it clearer to me now how one might define a syntactic category and then one or more functors to a semantic category. I will have to go back to reading "Categories for Types" to see if I understand more now.

Again, don't take this post too seriously- I'm still learning and I have no formal background in this subject. My interest in it has carried me pretty far in understanding, but there are lots of details I get wrong still. Interesting stuff, nevertheless.