Wednesday, March 30, 2011

Program Semantics

Part of learning a programming language is developing a mental model of its semantics, in order to reason about expressing ideas with the tools the language makes available and about how programs operate. For an imperative language a reasonable model is an abstract machine (the meaning of a program is the operation of that machine), while declarative languages get their semantics from the system they are based on (a logic or a lambda calculus). The meaning of a declarative language is in the language itself, not in the output of the compiler, as discussed here: http://existentialtype.wordpress.com/2011/03/16/languages-and-machines/.

This post will go over my personal abstract machines for some languages I've used- YMMV.

In C the code itself is in some untouchable "nowhere" and it dictates changes to the two parts of memory: 1) the stack, which is intimately tied to the code and directs its control flow, and 2) the heap "out there", from which any amount of memory can be taken. The programmer has only indirect access to the stack, directing it but not controlling it outright. There is no concept of a function or a conditional statement (or any other control structure for that matter), only the idea of the machine code that produces these effects. Functions are just jumps that affect the stack, and every other control construct likewise "jumps" by moving the mental cursor around in the untouchable "real code" somewhere out there.
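To make the stack/heap split concrete, here is a minimal C sketch (the names are mine, purely for illustration): locals live in the frame that a call pushes and disappear when it returns, while malloc hands back heap memory that outlives any one call.

    #include <stdio.h>
    #include <stdlib.h>

    /* Calling this function pushes a frame on the stack; n and square
       live in that frame and vanish when the call returns. */
    static int squared(int n) {
        int square = n * n;
        return square;
    }

    int main(void) {
        int on_stack = squared(7);   /* the call is "just a jump" plus a new frame */

        /* Heap memory is requested explicitly and sticks around until freed. */
        int *on_heap = malloc(sizeof *on_heap);
        if (on_heap == NULL)
            return 1;
        *on_heap = on_stack;

        printf("%d %d\n", on_stack, *on_heap);
        free(on_heap);
        return 0;
    }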

In Java memory is a big pile of objects with connections to each other forming a huge web. Control flow moves not in one place, as in C (the stack), but from object to object. This is something like a cursor, and having multiple threads is like having many cursors. I realize this is not literally true, and I certainly don't reason like this while programming- I'm just investigating what I see as the language's notion of the space in which its programs live.

One could think of the JVM when imagining Java code's runtime- and I often do- but to a beginning programmer with no grounding in the strange mixture of language semantics, virtual machine semantics, machine code semantics, and physical machine semantics that a Java programmer must contend with, the language may seem like a huge open space of objects floating around, each holding directions to other objects.

In Forth there is the built-in dictionary, extending off into the user dictionary. The program's execution is so direct there is really no abstraction- the only thing that floats off in the untouchable "somewhere" is the actual hardware: the registers, the ALU, etc. Control flow is recorded in a readable and writable part of memory (the return stack), and the main work of the program is the change it makes to the separate parameter stack. The stacks and the scratch-pad memory are conceptually separate from the dictionary.
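A rough sketch of that two-stack picture, written as a C simulation of the model rather than real Forth (every name here is made up for illustration): words take and leave values on the parameter stack, while the return stack records how calls nest.

    #include <stdio.h>

    /* A toy model of the two Forth stacks. In real Forth both live in plain,
       writable memory; here they are just arrays. */
    static int ps[64]; static int ps_top = 0;   /* parameter stack */
    static int rs[64]; static int rs_top = 0;   /* return stack    */

    static void push(int x) { ps[ps_top++] = x; }
    static int  pop(void)   { return ps[--ps_top]; }

    /* DUP ( n -- n n ) and * ( a b -- a*b ) in this model. */
    static void dup_word(void) { int n = pop(); push(n); push(n); }
    static void mul_word(void) { int b = pop(), a = pop(); push(a * b); }

    /* SQUARE ( n -- n*n ) defined in terms of DUP and *. A threaded-code
       interpreter would push a real return address on the return stack;
       a placeholder keeps the sketch short. */
    static void square_word(void) {
        rs[rs_top++] = 0;
        dup_word();
        mul_word();
        rs_top--;
    }

    int main(void) {
        push(7);
        square_word();
        printf("%d\n", pop());   /* prints 49 */
        return 0;
    }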

Python and Ruby seem more like Haskell to me in the sense that I don't think about the machine model as much. I think this is partly because of the use of higher-order functions- they lift me from the machine concept to a more mathematical one. On the other hand, the objects are similarly connected by a web of references, and control flow does involve movement through this web.

It took me some time to begin to understand my mental model of Haskell code, but I seem to see it as a series of rewrites performing beta-reductions. This is interesting, because it is fairly close to what "really" happens, in the sense that the Core language the code compiles to is an extension of the polymorphic lambda calculus. Sometimes I think about the STG machine (the Spineless Tagless G-machine) when thinking about data structures, but for the most part I think of Haskell in a very high-level, mathematical way.
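As a tiny worked example of that rewriting picture (plain lambda calculus with arithmetic, nothing Haskell-specific): applying a function twice reduces step by step, outermost redex first,

\[
\begin{aligned}
(\lambda f.\,\lambda x.\, f\,(f\,x))\ (\lambda y.\, y+1)\ 2
  &\to_\beta (\lambda x.\, (\lambda y.\, y+1)\,((\lambda y.\, y+1)\,x))\ 2 \\
  &\to_\beta (\lambda y.\, y+1)\,((\lambda y.\, y+1)\,2) \\
  &\to_\beta ((\lambda y.\, y+1)\,2) + 1 \\
  &\to_\beta (2+1)+1 \;=\; 4.
\end{aligned}
\]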

So, there you go- some rambling thoughts about a programmer's mental model of languages. Read the link at the beginning of the post for a more interesting discussion of this topic.
