Wednesday, September 7, 2011

Forth Threading Models

In the last post I gave one possible motivation for threading. Now I would like to go over a couple of techniques that can be used to achieve this affect. For more detail, see "Moving Forth" at http://www.bradrodriguez.com/papers/index.html, which is an amazing reference.

The first model is Direct Threaded Code (DTC). In this model, each subroutine (word) contains machine code. If the word is a primitive, this code is just the implementation as an assembly subroutine. If this is a high level word, and therefore only contains calls to other words, then the machine code is a push of the next instruction and a jmp to a subroutine that will start calling each address, one after the next. This would look like:

call DOCOL; a; b; c; NEXT;

Where DOCOL is the subroutine that calls each of a, b, and c (addresses of subroutines) in turn. NEXT is the address of the instruction that keeps this whole process moving- if we have entered a word from another word it will leave the current word and move one step outwards in the nesting. I will be explaining the various stack in a later post, so just know that NEXT handles cleaning up at the end of a word.

Another model is Indirect Threaded Code (ITC). In this model, a word must always contain a list of addresses of other words. This means that even the most primitive words must contain a pointer. Such a word might like (assuming the word starts at address 0x8000 with 32 bit cell sizes):

0x8004; add r1, r0, r1; call NEXT

This word simply jumps to the next instruction, as its require to jump somewhere. The actual action is to add two registers, and then call NEXT.

Another technique is Subroutine Threaded Code (STC). This one is a little bit ironic, as it "threads" the address by simply making each one a call. The disadvantage is that it may take up more space then other techniques as each address is actually the call instruction. Remember, however, that the original motivation here was to remove these redundant calls. While at one time it made sense to save that little bit of memory, now it may make more sense to increase speed by removing this extra level of indirection and just make all jumps subroutines calls. Another advantage here is that primitives can just be machine code- we always jump to a word like DTC. The word NEXT may literally be just "ret" or return as we are just doing the usual jumping to an address while storing the instruction pointer call we can do the usual return to the last subroutine by popping the return stack to the instruction pointer.

Yet another technique is token threading. In this technique a word is an array of values that aren't themselves addresses to other words, but are rather indices into another array of addresses of words. The only real advantage to this technique is that it can be used to get very very small compiled words- a table of only 256 words can fit each "token" (index) in a single byte. The problem is we have yet another level of indirection to wade through, and the number of words is limited to the size of the table. I don't like this technique very much as it also makes the structure of the dictionary particularly fixed (we will get to the dictionary later) and that makes some interesting things much harder.

While there are more forms of threading, I'm only going to discuss one more- switch threading. This technique is basically what a lot of bytecode interpreters do- they have a massive switch statement that determines what do with a compiled bytecode.

Well, thats it for now. I believe I will be doing a ITC interpreter for my new Forth, simply because it is easy to place pointers to functions in C. I'm considering playing around with the C stack a bit, but for now I'm looking at each word as an array of pointers to void functions with no arguments.

No comments:

Post a Comment