Its Computer Science Time

Friday, July 4, 2014

HAMT- A Cool Data Structure

Hash Array Mapped Tries are a cool data structure. They are complex, but have some great characteristics.

You can find the paper "Ideal Hash Trees" from the wikipedia article.
http://en.wikipedia.org/wiki/Hash_array_mapped_trie

A Neat Blog

Just a blog I've been enjoying recently: http://jvns.ca/

Thursday, May 8, 2014

Switches and Buttons and Dials and A Haskell DSL for Embedded Systems Programming

In my ongoing journey towards hobbyist electronics, I've connected a button, a switch, and a dial (potentiometer) to my Arduino. It is simple stuff, but I still did it wrong the first time.

The code is done using the Haskell DSL Ivory, which is pretty easy to get set up with the Arduino. At some point I would also like to get the other language build by the same team called Tower working on the Arduino, which requires FreeRTOS, but that will be a project in itself.
I'm using the makefile from http://www.mjoldfield.com/atelier/2009/02/arduino-cli.html and a little shell script to avoid the Arduino IDE and so I can build everything from the Template Haskell to the Haskell to the C to the final binary and upload it to the Arduino all at once.
No external C code is required when using Ivory (though I have a #define in a header file to configure the generated C code). I even wrapped up some of the Arduino Serial library in C functions to be able to call it from Ivory. The wrapping is necessary to call C++ functions from C, and in this case, to import them as external functions in Ivory.

Ivory itself is a language for embedded systems as a Haskell DSL that does much more type checking and prevents many common bugs compared to C. I've liked it so far, although it certainly takes getting used to. I like that they make use of some Haskell extensions like automatically lifting data values into types and types into kinds. The bitdata aspect of the language looks really great. I am interested in trying to specify things like register contents using their bitdata types to see how much better we can do then C (which I use for this kind of thing at work).

Back to the Arduino, I was using analogRead to get the voltage level, sending the input to the host computer using the wrapped up Serial functions, and then using "cat /dev/ttyACM0" to read from the Arduino, I could see the voltage level in the little circuits for each element. The most fun was the dial, as you could make very fine adjustments to the voltage by just barely turning the dial, and you could go through the range (0-1023) in a nice linear fashion.

I'm looking into taking the EdX class taught by (among others) Gerald Sussman to get some basic electronics knowledge. I have some highschool education in this, and some physics classes in college, but not enough to do anything interesting. Hopefully next up will be a hall effect sensor or a 1-wire temperature sensor.

Saturday, April 19, 2014

QuickCheck is Amazing

QuickCheck, the Haskell testing library, is a pretty amazing thing. The idea is that you don't specify test cases when testing your code, you specify what properties your code has. The QuickCheck library (which I'm using through the Tasty testing framework) will generate random data to check that your specified properties hold. If a test fails, it will try to reduce the generated input to a simple form and present it to you as a counterexample of the property.

There is a lot to say about this, so I'll just go over some things I liked:

1) Your test cases become a specification. Each test case declares a property of your code, and all together they give a description of what your intentions where.

2) Splitting pure and impure code is good practice, and QuickCheck certainly encourages this. You can test impure code with it too, but pure code is easier to test, its easier to generate repeatable cases for, and the examples should be easier to shrink. I've seen a lot of Genetic Algorithm libraries mix the generation of random data and the application of that data to modify a population. There is no need to mix these two concerns, and splitting them out has definitely helped me with writing these tests.

3) You have to put some thought into what you meant when you wrote your code. I had a complex function at the heart of a Genetic Algorithm library I've been writing, and when I started writing test cases, I realized that it was not easy for me to say what properties it had. This indicates to me that I need to break it into smaller pieces that can be understood and tested separately. This means that QuickCheck didn't just give me counter examples, it is helping me improve the overall structure of my code!

4) Once you get used to creating Arbitrary instances and using them to come up with test cases, it is pretty easy to specify a lot of properties. This requires a different way of thinking about testing, but it seems very effective and quite manageable.

5) Testing a complex, efficient algorithm against a simple specification can give you confidence in your code. Being able to generate hundreds of random cases and comparing the results seems like the perfect test.
I'm using an implementation of Point Mutation that is much more efficient then the usual implementation seen in nearly all Genetic Algorithm libraries, and I'm going to be very interested to test its correctness against the naive definition. I would bet there is some way to even take performance data and cause a test to fail if the "efficient" implementation is slower (within a certain margin) then the simple one. This would be one nice way to regression test your efficiency concerns.

6) For random algorithms, there are stories of incorrect implementations giving bad research data. There are a lot of properties of a Genetic Algorithm that could be specified in this way to ensure that the implementation does contain certain types of mistakes.
For example- mutation increases diversity (usually), selection reduces diversity, elitism maintains the best k individuals, a Genetic Algorithm out-performs another algorithm (say, a hill-climber) on a particular problem, a mutation rate of 0.5 (50%) on a bitdata individual mutates approximately 50% of its bits, etc. These tests can fail a certain number of times, but must hold if enough random tests are performed.

I can see why the Haskell community loves this technique so much. I've not looked into its cousins (SmallCheck and SmartCheck) but I fully expect that they are just are effective.
I recommend this technique very much. There are ports of this concept to quite a few languages, so there are no excuses.

Using Forth/Copilot/FreeRTOS With the Arduino

After over a year of absence, I'm back to blogging. This time its about the Arduino and my continued quest to program it with something other then C. I've written multiple Forth-like languages for it, but I am yet to actually reprogram it with something like amforth. Instead, I've been investigating other languages one can use.

The program I've been playing with is to get the Arduino to control RGB LEDs where each color's brightness is controlled with PWM. The setup only requires a handful of pins since the outputs are through shift registers. The point of this is to create cool light shows. My first attempt worked pretty well, but that was many months ago. I decided to try start over with the Forth approach, alongside a Haskell DSL to compile an array of opcodes that are stored in flash and run by a C program.

Forth:
The C program can allocate "fibers" (cooperative threads) so each LED is handled by a separate task. This makes programming in this language pretty easy, and you can control different outputs this way (I also had a 7 segment display working with this scheme at one point).
This whole scheme was fine, but it has a definite limit- on the Arduino you can't program flash (without changing the bootloader). This makes writing a Forth interpreter difficult. You can use RAM if you use a case-threaded interpreter, but you can't save words across resets unless you want to use the tiny 512 bytes of EEPROM. Using flash is okay if you are willing to accept a static program, but anything slightly dynamic (like allocating variables) would be awkward and clunky. Despite these limitation, I've been pretty happy with the simple light shows that I've been able to create. I'm definitely a much better embedded systems programmer, and I can see the difference in ability and understanding that my professional life has brought to these hobby projects.

Copilot:
My next attempt was with to use Copilot, a Haskell DSL for stream programming that can generate C code. You can do several other things with your programs, like verify their properties or interpret them, but I didn't look into those aspects of the Copilot project. The point is really about formal methods and software correctness, which is interesting stuff.
Copilot programs specify a stream of values and tie them to your hardware with a little C, and the Copilot compiler takes care of generating most of the code and generating a schedule that updating everything at the required rates.
This is pretty cool, and I managed to put together a program that does in fact blink LEDs. I eventually got PWM working, although there is something visibly jittery about it. I found that the schedule was meeting the 1ms timing, however, so it is probably a problem with my own code, not the efficiency of the generated code.
Some cool things are- writing a latch (stream that holds the value of another stream whenever a boolean stream goes "true" and holds that value while it is false, figuring out how to divide down streams to slow the LEDs down so you can see them, and tying the languages together to add capabilities to both.
Programming a little DSL like this is a fun puzzle (like puzzle-script) and I can appreciate the added safely of using a constrained language. The concept of a stream is a pretty accurate model of a lot of embedded systems, and I love being able to describe things directly rather then building up the same concept indirectly in C.

FreeRTOS:
My next attempt was to use a port of FreeRTOS (http://code.google.com/p/rtoslibs/). This was fun, as the application structure can be made much nicer and we can express things like blocking directly. Its also fun to have a little operating system on my Arduino. I have an ISR giving a semaphore, a high priority task taking that semaphore, shifting out an output and periodically giving a semaphore, and a low priority take blocking on a semaphore and calculating a new output. This structure works well, and I would like to play around with it more.
The problem with the Arduino Uno is its tiny tiny memory. Tasks and semaphores and scheduling and other mechanisms take up space, and there is not much to be had here. So far I still have 1343 bytes left, which is (relatively) good. A couple more tasks, a message queue, and a data structure and I'll use it all up, though.

There is more to come with my Arduino. I have lots of plans.

Tuesday, June 18, 2013

Vault Library

The Vault packet on Hackage by Heinrich Apfelmus is quite cool. Type-safe, persistent storage of arbitrary values!

A vault is a map from keys to values, but the keys are parameterized by the type they return. This allows one to retrieves values from the vault that have any value without specializing the type of things the vault stores. How this works behind the scenes is quite interesting, but basically the keys are what store the values in a sense- they store an IO action of type IO () that stores a value in an IORef when it is run. The key keeps a unique value indicating to the vault which IO action to run, and the IORef that is updated. The is possible through the "scary action at a distance" described here: http://apfelmus.nfshost.com/blog/2011/09/04-vault.html.

I think this may be a good choice to implement a modular evolutionary algorithm framework, since it lets individual components (genetic operators for example) set and retrieve configuration information without having to add their desired configuration to the type of the configuration structure (which otherwise seems inevitable).

Discrete Optimization Course at Coursera

I've just started a Discrete Optimization course on the Coursera website. It looks interesting, and I would like to have a firmer understanding of more traditional optimization, given my interest in biologically inspired optimization algorithms.

I'm also looking at a course on analytic combinatorics. My understanding of combinatorics is weaker than I would like, and my understanding of analysis is much weaker then I would like. I'm hoping a class like that would do me some good.

I am hoping this class provides some topics for this blog. Looks like the first problem they talk about is the knapsack problem.