Tuesday, June 18, 2013

Vault Library

The Vault package on Hackage by Heinrich Apfelmus is quite cool. Type-safe, persistent storage of arbitrary values!

A vault is a map from keys to values, but each key is parameterized by the type of value it retrieves. This allows one to retrieve values of any type from the vault without specializing the type of things the vault stores. How this works behind the scenes is quite interesting- in a sense, the keys are what store the values. Each key holds an IO action of type IO () that stores a value in an IORef when it is run, along with a unique value telling the vault which IO action to run and which IORef gets updated. This is possible through the "scary action at a distance" described here: http://apfelmus.nfshost.com/blog/2011/09/04-vault.html.
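As a quick sketch of what using it looks like (this assumes the Data.Vault.Lazy module from the vault package; note that key creation really does live in IO):

```haskell
import qualified Data.Vault.Lazy as Vault

main :: IO ()
main = do
  -- Each key is created in IO and carries the type of the value it stores.
  intKey    <- Vault.newKey :: IO (Vault.Key Int)
  stringKey <- Vault.newKey :: IO (Vault.Key String)
  let v = Vault.insert intKey 42
        $ Vault.insert stringKey "hello"
        $ Vault.empty
  -- Lookup is type-safe: each key returns its own value type.
  print (Vault.lookup intKey v)     -- Just 42
  print (Vault.lookup stringKey v)  -- Just "hello"
```

Notice that the vault itself has the plain type Vault, with no mention of Int or String- the keys carry the types.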

I think this may be a good choice to implement a modular evolutionary algorithm framework, since it lets individual components (genetic operators for example) set and retrieve configuration information without having to add their desired configuration to the type of the configuration structure (which otherwise seems inevitable).

Discrete Optimization Course at Coursera

I've just started a Discrete Optimization course on the Coursera website. It looks interesting, and I would like to have a firmer understanding of more traditional optimization, given my interest in biologically inspired optimization algorithms.

I'm also looking at a course on analytic combinatorics. My understanding of combinatorics is weaker than I would like, and my understanding of analysis is much weaker than I would like. I'm hoping a class like that would do me some good.

I am hoping this class provides some topics for this blog. Looks like the first problem they talk about is the knapsack problem.
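For reference, the 0/1 knapsack problem has a classic dynamic programming solution; here's a quick Haskell sketch (my own, not from the course):

```haskell
-- 0/1 knapsack by dynamic programming: track the best value achievable
-- at each capacity from 0 to cap, folding in one item at a time.
knapsack :: Int -> [(Int, Int)] -> Int  -- capacity, [(weight, value)]
knapsack cap items = foldl step (replicate (cap + 1) 0) items !! cap
  where
    -- For each capacity c, either skip the item or take it (if it fits),
    -- always consulting the previous row so each item is used at most once.
    step best (w, v) =
      [ if c >= w then max (best !! c) (best !! (c - w) + v) else best !! c
      | c <- [0 .. cap] ]
```

For example, knapsack 10 [(5,10),(4,40),(6,30),(3,50)] picks the weight-4 and weight-3 items for a value of 90.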

Haskell, Netwire, Gloss, and Swarms

I've been playing around with the Haskell libraries Netwire and Gloss. They are both very nice- Gloss is an incredibly easy way to get something graphical up and running, and it's been easy to integrate it with Netwire to get an FRP-style simulation going. I've been playing with swarming mechanics, trying to figure out what makes a good-looking swarm (and trying to make sure that swarm looks good even when it's doing something like following your mouse cursor).

In case you're wondering, my choice of Netwire over some other library for FRP in Haskell was mostly motivated by its simplicity. I understand how it works, and I've not had too much trouble getting something small written with it. I won't comment on how it compares to other libraries except to say that I found it approachable.

The concept behind the program is that a swarm is a list of boids, each with a position and velocity. The position accumulates the velocity at each time step, and the velocity accumulates the accelerations at each time step. The accelerations are produced by operators that are given the current swarm and produce an acceleration for each individual. For example, the cohesion operator takes the centroid of the positions and produces, for each individual, an acceleration in the direction of that centroid. The random operator simply produces random accelerations.
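The operator idea can be sketched directly in Haskell (a simplified sketch with made-up names, not the actual Netwire code):

```haskell
type Vec = (Float, Float)

data Boid = Boid { pos :: Vec, vel :: Vec } deriving (Show, Eq)

add, sub :: Vec -> Vec -> Vec
add (x1, y1) (x2, y2) = (x1 + x2, y1 + y2)
sub (x1, y1) (x2, y2) = (x1 - x2, y1 - y2)

scale :: Float -> Vec -> Vec
scale k (x, y) = (k * x, k * y)

-- The cohesion operator: accelerate each boid toward the swarm's centroid.
cohesion :: Float -> [Boid] -> [Vec]
cohesion strength boids = [ scale strength (centroid `sub` pos b) | b <- boids ]
  where
    n        = fromIntegral (length boids)
    centroid = scale (1 / n) (foldr (add . pos) (0, 0) boids)

-- One Euler step: velocity accumulates acceleration, position accumulates velocity.
step :: Float -> [Vec] -> [Boid] -> [Boid]
step dt accs boids =
  [ Boid (pos b `add` scale dt v') v'
  | (b, a) <- zip boids accs
  , let v' = vel b `add` scale dt a ]
```

Other operators (alignment, separation, randomness) produce their own acceleration lists, and the accelerations just get summed before stepping.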

This is admittedly not the easiest way to program this kind of thing, but I like the idea of FRP and I want to incorporate it into something larger. Getting started on something small like this has helped me understand how to use FRP (to specify the value of something over time) and familiarized me with Netwire, the FRP library I'm currently using. Hopefully on a larger scale this approach will pay off by letting the components be more modular (including having local state) and the overall flow of control more clear since it is specified by a single abstraction (a flow network).

Good times are ahead, and I think the time has come to wake this blog up.

Wednesday, February 20, 2013

Idris is Pretty Cool

I've been reading recently about a programming language called Idris. It's similar to Haskell, but with dependent types. It is intended to be a practical programming language rather than a theorem-proving language or a proof of concept, and there are papers about applying it to systems programming. That is one of my interests in dependent types- increasing the safety of complex programs that involve things like communicating with hardware. I use C to do this professionally, and I find that it is difficult to get this kind of programming correct every time with all the complicating factors. There are often times when I could express more about the structure of data or the flow of the program with a better type system. Dependent types have a lot of potential, and I would love to see them make their way into the kind of work that I do.

I may have to post about dependent types in the future, as they are probably the biggest thing that I know about in the functional programming community right now. They represent a real change in the use of programming languages and how one thinks about programming, and I for one am in favor of this change.

Arduino LED Language, part 2

The LED language for the Arduino is still changing quite a lot. I've gotten rid of the whole cooperative multitasking thing in favor of a simpler interpreter that is now just a switch-threaded Forth. As it gets more advanced I'm faced with some problems- whenever one designs a domain-specific language there is the question of how much power to give it. Right now it does not support variables (constants are supported but not built in), parsing words, or anything interesting like that. I'm also running into the problem that Forths have on the Arduino: the program is stored in flash but has to run out of RAM. I will probably just add some simple hacky variable system to keep things from getting out of hand complexity-wise (indices into an array that get expanded to accesses at compile time, or something).

In terms of progress, it has conditionals and loops, and supports execution tokens (the Forth term for this idea), aka function pointers, aka almost higher-order functions (with no combinators to construct new ones at runtime). Unfortunately it does not support getting the execution token of built-in words, which is a pretty arbitrary distinction.

Incidentally, I have it working with a full 5 RGB LEDs, controlling all 15 pins nicely. More things need to be added to the language to get complex animations, but as it is one can fade LEDs, set colors, and have sequences of blinking lights.

That's all the updates I can think of for now.

Thursday, February 14, 2013

LED Arduino Language

A friend of mine has come up with a project involving RGB LEDs where it would be cool to be able to "script" their color, possibly animating them over a longish period of time, with multiple possible animations. To do this, I've come up with a system that is essentially 15 cooperative multitasking threads interpreting a stack-based language, which I've also embedded into Haskell for some meta-programming goodness. Together, these threads control the intensity of each color of each LED- we plan to have 5 of them, and 3 colors * 5 LEDs = 15 pins to control. Intensity is controlled by pulse-width modulation, driven by an interrupt attached to a timer. This post merely records the fact that this idea actually does work, and I can control the color of the first RGB LED attached to my Arduino on my desk here.

Pretty neat, though there is much work to do. I need to make sure the idea scales, possibly optimize a bit, get the embedding into Haskell nicer, make some more complex animations, add more to the language itself, and possibly write a parser and make it a standalone language so the scripts look nicer. I would also like a mode that allows the user to control the colors, so I need to figure out how to do some input (the programs are read from flash with no input from the user at the moment).

Tuesday, February 12, 2013

Abstraction, Information Complexity, and Diversity of Programming Languages

I've been using some very underpowered languages at work, one of which is domain-specific and doesn't have such niceties as user-defined types or real functions/procedures/subroutines. I end up writing very repetitive and error-prone code, and I can't think of any way to factor out commonality or provide abstractions to make my life easier. I'm also working on a project with the Arduino involving controlling a lot of constantly changing values (the colors of RGB LEDs), where I've worked out a simple language so that each pin of the Arduino is controlled by one thread of a small cooperative multitasking system, with programs written in this little domain-specific language (one which is becoming more and more like Forth). I came up with the idea of expressing the changes in LED brightness (which controls color on RGB LEDs) in the form of a language when I realized that I needed a compact representation of the sequence of values needed for control. The simple language has compressed the size of the data necessary to store complex animations by around a factor of 100 (at least, possibly much more).
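To make the idea concrete, here is a toy version of such a stack-based interpreter (my own sketch in Haskell, not the actual Arduino code- the real instruction set and names differ):

```haskell
-- A toy stack machine in the spirit of the LED language.
data Instr
  = Push Int      -- push a constant
  | Add           -- pop two values, push their sum
  | Dup           -- duplicate the top of the stack
  | SetLevel      -- pop a brightness level and "output" it
  deriving Show

-- Run a program, returning the levels that were output, oldest first.
run :: [Instr] -> [Int]
run = go [] []
  where
    go _ out [] = reverse out
    go stack out (i : is) = case (i, stack) of
      (Push n,   s)         -> go (n : s) out is
      (Add,      a : b : s) -> go (a + b : s) out is
      (Dup,      a : s)     -> go (a : a : s) out is
      (SetLevel, a : s)     -> go s (a : out) is
      _                     -> reverse out   -- stack underflow: stop
```

A long fade can then be a short loop over Push/Add/SetLevel instead of a stored table of brightness values, which is where the compression comes from.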

What all this lack of abstraction in some languages, and the use of a domain-specific language to compress data, has made me realize is that code that is not abstracted contains a lot of redundancy, and that the less repetition there is in a program, the more information is carried in each character. This may seem obvious, but the funny conclusion is that a very well-factored and abstracted program is close to random in a certain information-theoretic sense (the Kolmogorov complexity of generating the program's text). This is mostly just amusing, and certainly simplifies the situation, but it does explain why my LED language compresses the data so much- I got to choose the machine/language used to express the information, so I could construct one that happens to have a concise representation of the particular type of data I needed.

It also explains (to some extent) why highly abstract programs can have a higher mental overhead to understand- there is more information contained in each line, even if the full amount of information in the program is equal to that of an equivalent but poorly written program (or one in a simple language). This suggests that there is an advantage to choosing abstractions that have some underlying consistent concept or model when possible, rather than just abstracting the observed common case in a program. The ability to understand an abstracted program depends on the ability to understand its abstractions, so well-chosen abstractions that are grounded in theory and well understood can lead to programs with a firm foundation, where code can be expressed concisely according to an understandable mental framework rather than one that is built ad hoc. There is a danger in making things too consistent (the whole "everything is a _" style) when a problem doesn't fit that framework well and has to be awkwardly encoded in it, but in general there will be some problem that any given language has trouble encoding. This suggests that a language's ability to construct DSLs is important to its ability to express appropriate abstractions and keep the information content of programs high. Since my LED language is simple, it is not easy to program in. To solve this problem, I embedded it into Haskell and wrote a tiny code generator, giving me all sorts of meta-programming abilities. The domain-specific language encodes its domain concisely but lacks power, and the language it's embedded in lacks the ability to express the domain directly, but together they are a very nice duo.

My conclusion in all of this ranting is that a diversity of languages is a good thing, as different languages express different domains more closely and are able to encode their concepts. There is tension here when a language's focus is too narrow and it can't express anything outside of one domain, or one way of looking at its domain. Languages that can easily have other languages embedded into them have an advantage in usability as general-purpose languages, as they can cover many problem domains and reap the benefit of encoding expressions in languages specific to those domains. They can encode domains in multiple ways and can be adapted to many situations. They may not have all the benefits of a more specialized language in some cases, though they will have the benefit of transferring whatever power they have to the domains to which they are applied (which might otherwise get less powerful but fully domain-specific languages to describe them). For Haskell, there are many ways to embed languages, and certainly languages like Lisp (or Forth, incidentally) excel at it. The functional programming community in general is comfortable designing and implementing languages, and I think that may be one of its biggest strengths. I personally prefer languages steeped in type theory, and I recognize that my language of choice is restrictive compared to most languages (and we like it that way, gosh darn it!), but I wonder if there are languages with built-in facilities for constructing domain-specific languages that are based on formal language theory. I'm talking about specific, built-in abilities to define and manipulate things like ASTs (sexprs fulfill this), grammars, operational semantics, or something like that. I feel like either this exists, or it is just more easily done by encoding in the constructs of an existing language like Lisp, but I can't help but wonder.

Alright- enough rambling for now.

Wednesday, February 6, 2013

Logic = Syntax + Semantics

I've been reading the Model Theory notes from: http://www.cs.man.ac.uk/~hsimmons/BOOKS/books.html. The very first sentence of the lecture notes says that (much of) logic is about understanding a subject through the use of "syntactic gadgets". This made a whole series of things click for me- the kind of thing where I feel like I almost knew it beforehand, but never quite made the little leap until now. I like these moments, so I will now record what I've learned. Warning: I'm not a logician or mathematician, so take what I say with a grain of salt.

When we use logic to study a subject, the logic does not necessarily cover the entire semantic domain. Often we can only carve out some subsection of our domain and reason about it through a logic, especially if that logic is to be simple. This is not necessarily a bad thing- carving out total subsets of untyped lambda calculus with type theories (slightly different from what's in the lectures, but a corresponding idea) is a common pastime in computer science. Being able to talk about a subject purely syntactically has several advantages over studying it directly. It gives us a common language which can be studied in isolation. This lets us study properties, such as computability, of, say, first-order theories, and apply the results to a large class of situations. It also allows a subject to be transformed into the unambiguous language of symbols. If that language is intended to describe a real-world phenomenon, like situations where the truth of something changes over time or where truths are added as proofs are discovered, then encoding it in a logic and finding a semantics for the logic gives an encoding of the real-world situation in math. This is what gives logic that truly foundational feel to me- other systems can be taken as a foundation, but logic has to do with symbols, and any other system will be manipulated and reasoned about using symbols written on chalkboards (or other surfaces, I suppose). This also applies to computer science, where I think of type theory as being "more foundational" (subjectively) than other systems.

This makes a lot of things clearer to me. Logicians like systems that are sound and complete with respect to their models because then the logic can be used to reason about the semantics without problems in the logic or in the semantic domain. It also makes more sense why Kripke models are so important- they give a mathematical setting for the subject of interesting logics.

Thinking about this in the context of constructive logic led me to what I think is a better understanding of why domain theory is set up the way it is. I may be completely wrong about this, but what I'm thinking is that in constructive logic there are more than two truth values. It is not that there are three- which would be a tri-state logic- but more like there are countably infinitely many truth values. The truth value "true" is at the top, "false" is at the bottom, and all the rest are, in some sense, between them. The sense in which there is something between true and false is that a value may be partially defined. If a data structure can be evaluated part of the way to normal form, it is more defined than one that cannot be evaluated as far. It may not be fully evaluatable, but there is a real sense in which it is "more evaluatable" than a structure that can be evaluated less than it. This means that there is an ordering (a partial order) on structures by how "true" they are, or how far they can be evaluated before becoming completely evaluated or being equal to bottom. There is probably a mistake in my understanding here, as this ordering is over the terms, not the types, while it is the types that correspond to logical propositions- but I don't know enough to be sure where my mistake is.
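This "more defined" ordering shows up concretely in Haskell, where laziness means partially defined values are real things you can compute with (a small sketch, using undefined for bottom):

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Values ordered by definedness: each list below is "more defined"
-- than the one before it.
bottom :: [Int]
bottom = undefined          -- completely undefined

partial :: [Int]
partial = 1 : undefined     -- head defined, tail is bottom

full :: [Int]
full = [1, 2, 3]            -- fully defined

main :: IO ()
main = do
  -- The defined part of `partial` is perfectly usable:
  print (head partial)
  -- But forcing the rest of it hits bottom:
  r <- try (evaluate (length partial)) :: IO (Either SomeException Int)
  putStrLn (either (const "bottom") show r)
```

So partial really does sit strictly between bottom and full in the definedness order, which is the kind of "in between" value I mean above.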

This also firms up my understanding of the relationship between category theory and logic. I've seen before how categories have internal logics, but when talking about things like translating syntactic objects to semantic ones, it is clearer to me now how one might define a syntactic category and then one or more functors to a semantic category. I will have to go back to reading "Categories for Types" to see if I understand more now.

Again, don't take this post too seriously- I'm still learning and I have no formal background in this subject. My interest in it has carried me pretty far in understanding, but there are lots of details I get wrong still. Interesting stuff, nevertheless.

Tuesday, January 29, 2013

Stochastic Uniform Sampling

I'm not a big fan of roulette wheel selection. I've seen it remove the diversity of a population much too quickly for a solution to be found, and it bothers me that my choice of fitness function can have unintended consequences on selection pressure. However, stochastic uniform sampling appears to be a reasonable alternative while still being a fitness-proportional selection mechanism. I'm planning on including it in my Haskell GA library (which is coming along reasonably well) along with tournament selection (and any others I come across).

The idea of stochastic uniform sampling (SUS) is almost the same as roulette wheel selection. The only difference is that while in roulette wheel selection the "roulette wheel" is spun N times (with N the number of individuals in the population) with a single "head" that can land anywhere on the wheel, in SUS the wheel has N "heads" and is spun once (conceptually). The space between each head is the average fitness of the population. This helps to prevent the resulting population from consisting of only a few very fit individuals- since the heads are spaced an average fitness apart, an individual can only be chosen about as many times as its fitness contains the average fitness, leaving the remaining heads to land on other individuals. Since it is fitness-proportional, it is still possible to highly favor a small set of individuals, but it should do much better than roulette wheel at this.

Another way to think of SUS is to imagine the individuals on a line. The line represents the total fitness of the population, and each individual has a segment proportional to its share of the total fitness. Now imagine a comb with equally spaced teeth (where the space between them is, again, the average population fitness). With the comb's leftmost tooth all the way to the left of the line, move the comb a random amount to the right, where the distance moved is between 0 and the average population fitness. Now select, for the new population, the individual whose segment each tooth of the comb falls within.
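That comb picture translates pretty directly into Haskell (a sketch- susPick and sus are my own names, not yet what's in the library; the pure core takes the single random number so it's easy to test):

```haskell
import System.Random (randomRIO)

-- Stochastic uniform sampling over a population of (individual, fitness).
-- The pure core takes the one random number r in [0,1).
susPick :: Double -> Int -> [(a, Double)] -> [a]
susPick r n pop = go points pop 0
  where
    total   = sum (map snd pop)
    spacing = total / fromIntegral n               -- distance between teeth
    points  = [ (r + fromIntegral i) * spacing | i <- [0 .. n - 1] ]
    go [] _ _  = []
    go _  [] _ = []
    go ps@(p : rest) ip@((x, f) : more) acc
      | p < acc + f = x : go rest ip acc           -- this tooth lands on x
      | otherwise   = go ps more (acc + f)         -- move past x's segment

-- The IO wrapper supplies the single "spin" of the comb.
sus :: Int -> [(a, Double)] -> IO [a]
sus n pop = do
  r <- randomRIO (0, 1)
  return (susPick r n pop)
```

With four equally fit individuals every one is selected exactly once, while an individual holding 3/4 of the total fitness gets about 3/4 of the slots- fitness-proportional, but with no chance of one individual taking every slot.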

My GA library aims to allow the user to select different operators and combine them in any way they desire. This means I need to supply some commonly used mechanisms, and stochastic uniform sampling fits in with this very nicely.

Saturday, January 26, 2013

Really Good OpenGL Tutorial

I definitely recommend this tutorial: http://duriansoftware.com/joe/An-intro-to-modern-OpenGL.-Table-of-Contents.html

It's the clearest explanation of OpenGL I've seen, and it gets into some interesting things in the later chapters. It looks like there was more planned that never got written, but what is there is very helpful.

As a side note, when he talks about matrices and using them for effects, he is implicitly using one of the concepts found in this article: http://conal.net/blog/posts/reimagining-matrices

Conal Elliott always has interesting things to say, and in this case he mentions that multiplying matrices is the same as composing them. This means that to compose linear transformations in OpenGL code, you simply multiply the matrices that define those transformations. Good stuff.
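A tiny sketch of that fact for 2x2 matrices (names here are my own): applying the product of two matrices gives the same result as applying one transformation after the other.

```haskell
type Vec2 = (Double, Double)
type Mat2 = (Vec2, Vec2)  -- two rows

-- Matrix product: rows of the first against columns of the second.
mul :: Mat2 -> Mat2 -> Mat2
mul ((a, b), (c, d)) ((e, f), (g, h)) =
  ((a*e + b*g, a*f + b*h), (c*e + d*g, c*f + d*h))

-- Apply a matrix to a vector.
apply :: Mat2 -> Vec2 -> Vec2
apply ((a, b), (c, d)) (x, y) = (a*x + b*y, c*x + d*y)

-- Scaling by 2, and rotating 90 degrees counterclockwise.
mScale, mRot :: Mat2
mScale = ((2, 0), (0, 2))
mRot   = ((0, -1), (1, 0))
```

Here apply (mul mScale mRot) v equals apply mScale (apply mRot v)- multiplication is composition, which is exactly what an OpenGL modelview matrix stack exploits.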

Friday, January 11, 2013

The Next Generation of Neural Networks

There is a Google Tech Talk called "The Next Generation of Neural Networks" by Geoffrey Hinton that covers some very cool neural network architectures and applications. This guy has been doing neural networks for a long time, and he has some cool ideas on how to use them that he really believes are much better than what's been done before. In particular, he doesn't have to use back-propagation, and he gets to learn features automatically if they are not known beforehand.

The speaker is funny, interesting, and presents some really cool material. I was very impressed by his demonstration of his hand writing recognition program- you will just have to watch the talk to see why its so very cool.

I definitely recommend this talk, though I had to watch it more than once to understand a lot of it, and I'm still unclear on some parts (my failing, not the speaker's). One thing I thought was interesting to note was that in the examples he gives of his new network outperforming other techniques, some of the much simpler algorithms did pretty well. They aren't as good as his, but it's interesting that the simplest algorithms can do pretty well in a lot of cases.

Radial Basis Function Neural Networks

I've been working on my genetic algorithm library lately, and after getting RGEP to work again I've moved on to evolving neural networks. Since RGEP is about evolving tree structures, one easy test for it is to evolve equations to match data (symbolic regression). I wanted to see how a neural network would do on the same problem if we evolved its weights using a genetic algorithm, but the usual neural networks aren't terribly good at this kind of problem, since their outputs are usually either in [0..1] or [-1..1]. This led me to Radial Basis Function (RBF) Neural Networks, which are a pretty straightforward modification of the usual feed-forward networks with, say, sigmoid activation functions. They have the same feed-forward architecture, but they are much easier to use for function finding.

The main thing to note about an RBF network is that the activation function is what's called a radial basis function, which I will describe below. Other things to note are that the architecture for these networks usually doesn't have an activation function on the output neurons (in other words, the activation function is the identity function), and there are no weights on the inputs (in other words, the weights are all one). This means that the network is a simple linear combination of the hidden neurons. I'm not sure there is a disadvantage to structuring them differently, except perhaps it will make training harder (as there are more values to learn). Also, it seems like having more than two layers (deep neural networks) can be very advantageous to neural networks, although it makes them harder to train (especially with back-propagation, which apparently has trouble propagating error through multiple layers).

So- radial basis functions. These are functions that have a "center", and whose value depends only on the distance from this point. Note that we could use any distance function we wanted, but Euclidean distance seems to be the most popular. The center for a neuron is a vector, which replaces the usual weights. This means that each hidden-layer neuron will contain an n-dimensional vector defining a point in n-dimensional space. Given the input vector to the neuron, we first compute the distance between the neuron's point and the point defined by the input. This gives a single scalar number, which is then plugged into some function such as the Gaussian e^(-beta * r^2), where r is the distance and beta is a parameter to the algorithm. Other functions are available.
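A minimal sketch of the forward pass just described (rbfNet is a hypothetical name, not something from my library): each hidden neuron measures the distance from its center to the input, feeds it through the Gaussian, and the output is the weighted sum.

```haskell
type Vec = [Double]

-- Euclidean distance between two points.
euclidean :: Vec -> Vec -> Double
euclidean xs ys = sqrt (sum [ (x - y) ^ 2 | (x, y) <- zip xs ys ])

-- The Gaussian radial basis function e^(-beta * r^2).
gaussian :: Double -> Double -> Double
gaussian beta r = exp (negate beta * r * r)

-- An RBF network: each hidden neuron is (center, output weight), and the
-- output neuron just sums the weighted activations (identity activation).
rbfNet :: Double -> [(Vec, Double)] -> Vec -> Double
rbfNet beta hidden input =
  sum [ w * gaussian beta (euclidean c input) | (c, w) <- hidden ]
```

Evolving such a network with a GA then means evolving the centers and output weights (and possibly beta).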

I found this applet useful for getting some intuition about these networks: http://lcn.epfl.ch/tutorial/english/rbf/html/index.html. I like how the introduction of a different kind of activation function extends neural networks to a different problem type, and I think these are a nice thing to have in one's toolkit.

Wednesday, January 2, 2013

Generating Initial Population for a Genetic Algorithm with Markov Chains

Sometimes when running a Genetic Algorithm, a lot of time is lost because the initial population is random. This depends on a lot of factors, such as the encoding of solutions or the distribution of good solutions in the solution space. One case that seems common is when some solutions are not in the feasible region, and generating random solutions results in a population consisting mostly of invalid possibilities.

One way to remedy this is to generate individuals encoding feasible solutions to start with. Sometimes this is as easy as walking a grammar while making random decisions about how to expand non-terminal symbols, but sometimes no such structure is available, or the solutions found in practice follow a more complex distribution than can easily be described this way (some expansions may be much more likely than others, for example).

In this situation, one possibility would be to train a Markov Chain on known solutions, perhaps sampled from a physical process, such as in a prepared training set. The resulting Markov chain would generate solutions like those found in practice while still being random.
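For sequence-encoded individuals, a sketch of what this might look like (a first-order chain over symbols; train and generate are my own hypothetical names, not from my library):

```haskell
import qualified Data.Map as Map
import System.Random (randomRIO)

-- A first-order Markov chain: for each symbol, the list of observed
-- successors (repeats preserved, so sampling is frequency-weighted).
type Chain a = Map.Map a [a]

-- Count transitions over a set of known-good solutions.
train :: Ord a => [[a]] -> Chain a
train solutions =
  Map.fromListWith (++)
    [ (x, [y]) | sol <- solutions, (x, y) <- zip sol (tail sol) ]

-- Walk the chain from a seed symbol, picking each successor uniformly
-- from the observed transitions (stopping early at a dead end).
generate :: Ord a => Chain a -> Int -> a -> IO [a]
generate chain len start = go len start
  where
    go 0 _ = return []
    go n x = case Map.lookup x chain of
      Nothing -> return [x]            -- no observed successor: stop
      Just succs -> do
        i    <- randomRIO (0, length succs - 1)
        rest <- go (n - 1) (succs !! i)
        return (x : rest)
```

Individuals generated this way follow the transition statistics of the training solutions while still being random, which is exactly the trade-off described above.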

This is not without drawbacks- it assumes we know something about the solution space and that the samples used accurately cover this space. It will improve convergence time but will not search as much of the solution space as a randomly generated population would. Of course, these things are always about trade-offs, and this seems like a nice tool to have in one's toolbox.

One last note- it seems like this could also be used to generate new values for a training set given an initial set. This might prevent some over-fitting, assuming we are able to evaluate the new samples accurately.