Tuesday, January 31, 2012

Turing's Cathedral

I've just watched a Google talk called Turing's Cathedral, and there was a part I thought was cool enough to record here. The speaker mentions the possibility of types of computation other than the sequential execution of instructions that perform actions.
The problem he mentioned with this approach is that the machines are so fast we need techniques like recursion just to provide enough instructions for them. I never thought about it that way.

Sunday, January 29, 2012

Evolutionary Algorithm Framework

I've been trying to come up with a satisfying framework/library for doing evolutionary algorithms, and possibly other types of stochastic optimization, for a while now. Part of the problem is that the space of possible algorithms is huge and diverse, and it's easy to see how any design decision restricts the ability to express certain types of EAs. Another problem is that I've not been very organized about this. To combat the second problem I've decided to record some of my ideas before I forget them.

First off is the structure of the program. Should it be a library, like a set of combinators with which one constructs the desired algorithm? This would be nice, but I think it would ultimately end up as a bunch of individual libraries of functions for building different types of algorithms. The problem is that there is so much diversity in the structures one might use that, to be truly general, we can't say much at all about the stochastic operators and what structure they require of the population or individuals.

This leads to the second possibility, which is some definition of EAs in particular as a series of constraints. This would mean a class of some sort describing the operations necessary to describe an EA. Part of the problem there is that it is hard to see what these operations really are, and how they would be used to build an EA. This is true for two reasons that I've seen, the main one being state. It's hard to say what state will be necessary, and every algorithm might need slightly more or less than any other. The other reason is that even within GAs it's hard to exactly specify a general concept like mutation, for example. It can mean several different things in different techniques, and the more general the definition the less it really says. This line of thinking always leads me to ideas that don't really help the user at all. They say the more abstract you go, the more work you need to do to get something concrete, and I agree.
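
For concreteness, here is one rough shape the "operations needed to describe an EA" idea could take in Haskell. This is only a sketch of the kind of design I'm unsure about- the names are mine, I'm using a record rather than a type class just to keep it plain Haskell, and threading a generator g by hand already exposes the state problem:

    -- A rough sketch of "an EA as a bundle of required operations".
    data EA g pop = EA
      { evaluate :: pop -> pop            -- attach or update fitness information
      , select   :: g -> pop -> (pop, g)  -- stochastic selection
      , vary     :: g -> pop -> (pop, g)  -- mutation, crossover, and so on
      , done     :: pop -> Bool           -- stopping condition
      }

    -- One generation is then just the operations composed in a fixed order:
    step :: EA g pop -> g -> pop -> (pop, g)
    step ea g p =
      let (p', g')   = select ea g (evaluate ea p)
          (p'', g'') = vary ea g' p'
      in (p'', g'')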

The other idea is to be more explicit about what I want, rather than getting caught up in generality. This would mean defining what I believe an EA really is, and then working on that problem instead, where hopefully the extra structure gives some inspiration. So what does an EA consist of? In my ponderings I have often thought that it is important to separate the parts out as much as possible to prevent unnecessary couplings between them. For example, the individuals should not "know" how to mutate themselves, in the sense that it shouldn't be part of their structure. One problem with that sort of coupling is when you go from, say, a regular GA to a GEP, where there are several mutation operators. Perhaps you could say mutation is the composition of the several operators, but this still doesn't help much when you want the same structure to mutate in different ways in different algorithms, or at different times in an algorithm.

Starting with the notion of a population, what sort of thing do we have? It would be easy to say that a population is a *set* of *lists* (or fixed-length vectors) of *symbols*, but there are situations in which all of these things do not necessarily hold. It would be nice to include linear or grid populations, individuals with multiple "domains", and things like numbers or arbitrary structures rather than just symbols. Instead we might say that populations and individuals are combinatorial species (which is very generic), that they satisfy some predicate (are members of some classes) describing "enough" structure to do EAs, or that they are members of some specific data type defined to cover as many structures as possible.

The last idea requires some single type, recursively defined, that covers enough ground to express the usual sorts of EAs (I don't expect to cover all possibilities- I'm not sure that's really possible in a single library without generalizing to the point of uselessness). This would be something like "data Structure = R Double | Integer Int | Str String | Linear [Structure] | Pair Structure Structure | None".
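
Written out, that sketch is already legal Haskell; the comments are just my guesses at what each constructor would be for:

    data Structure
      = R Double                   -- a real-valued component
      | Integer Int                -- an integer component
      | Str String                 -- a symbolic / string component
      | Linear [Structure]         -- a linear chromosome or vector
      | Pair Structure Structure   -- two "domains" side by side
      | None                       -- the empty structure
      deriving (Show, Eq)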

The idea of doing this with species is always intriguing, but I don't know enough about them to know if this is useful or approachable. This gets at the main problem here, which is that EAs are so general that we really need a general theory of container types to work within. The reason that this is a problem is that such a theory is a subject of study in its own right, and I don't know much about it.

Okay, I guess that's enough of that for now. Sorry about the rambling and aimless post, but I want this to go somewhere eventually, and I need to start researching and organizing my thoughts.

Some Combinatory Logic

I've been very much into combinatory logic recently- both the Illative Combinatory Logic iXi and SKI. This post just records some things I've been thinking about.

First off, the Church encoding of natural numbers in SKI. In the lambda calculus 0 = (\f.\x.x) and 1 = (\f.\x.fx). Translated to SKI these are 0 = KI and 1 = S(S(KS)(S(KK)I))(KI). It seems strange that 1 is so complex and 0 so simple, but even though I did the translation from lambda terms to SKI by hand, checking it with an online reducing program (http://pages.cpsc.ucalgary.ca/~nicholss/research/cl_web/cl_web.php) shows that it is correct. I remember from using this translation (the one given on the Wikipedia page on combinatory logic) that the resulting terms can get pretty long. In this case, let's find a term that does the same thing but is shorter.

To start off, let's look at the numbers as a specification, which is often how I start when trying to define a combinator. For 0 we have 0fx = x, which KI definitely fulfills, and for 1 we have 1fx = fx. It should be pretty clear that I is just as good as S(S(KS)(S(KK)I))(KI) as the term representing 1, since application associates to the left, so Ifx can be written (If)x and reduces to fx. I'm not sure this generalizes in any way (to give simple terms for each natural), but it is cool that the naturals start out so nicely with KI and I.
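
To convince myself, here is the same observation transliterated into Haskell, with S, K, and I as ordinary functions:

    s :: (a -> b -> c) -> (a -> b) -> a -> c
    s f g x = f x (g x)

    k :: a -> b -> a
    k x _ = x

    i :: a -> a
    i x = x

    -- Church numerals: zero = K I and one = I, as claimed above.
    zero :: (a -> a) -> a -> a
    zero = k i

    one :: (a -> a) -> a -> a
    one = i

    -- zero (+1) 0 == 0
    -- one  (+1) 0 == 1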

I've also been reading more of "Systems of Illative Combinatory Logic Complete for First-Order Propositional and Predicate Calculus" and I've understood several more pieces. First off, I can see how the G combinator, defined as (\x.\y.\z.Zx(Syz)) (I'm writing Z for the letter Xi), is a dependent product. It says that for types x and y and a term z, every thing "a" of type x gives an application za of the dependent type ya. In other words, it picks out the elements of a dependent product, where the type of the result depends on the given term.

I'm still struggling with a couple of things. One is exactly what types are and when they are inhabited, as the notion seems looser than that of a usual type theory. Another thing is the assertive interpretation of terms. For example, implication is given by (\x.\y.Z(Kx)(Ky)). This says that for all "a", if Kxa then Kya. Since Kxa reduces to x and Kya reduces to y, this says that if x then y. The problem I have is that it also seems to say that all true propositions are inhabited by all objects. I can understand this when the type is T (true), the unit type, but it seems odd for an arbitrary proposition. What I have to guess is that this is saying that propositions are truth-valued (possibly related to proof irrelevance?). Then all true statements are the same as the proposition T, which very well may be the case. Unless I'm completely lost, we should be able to take some inhabited type p and say Kp, asserting that it is true, since an inhabited type is a true proposition. This points out some of my confusion with this system, as my intuitions about what is what don't seem to apply.

I still haven't read through and understood the whole thing, but I'm actually making progress with every reread.

Saturday, January 28, 2012

Composition and Concatenative Languages

Two of the most basic things one might do with a function are to apply it to an argument and to compose it with another function. Application is a function app:(a->b)->a->b, while composition is compose:(a->b)->(b->c)->a->c.

In Haskell the whitespace character indicates application, so "f a" is f applied to a, and "f a b" is f applied to a, with the result applied to b. This works because functions are automatically curried, so if f has the type A->B->C with a:A and b:B, then f a b : C. In a concatenative language, whitespace indicates composition, and arguments such as constants are considered functions, so for example the number 1 is a function ()->N, a function of no arguments returning a single number. Then an expression "a f" composes a : ()->A with f : A->B, resulting in a function of type ()->B.

Interestingly, a concatenative expression can be translated to applicative form very easily- by translating it into a series of compositions of functions that take continuations. I imagine that this would make it pretty easy to typecheck them by translating them to lambda terms and type checking the resulting term. I'm not sure it works out nicely in all cases or in the presence of more complex expressions, so this is an investigation of the concept. The reason I'm skeptical is that I've seen some people discussing the difficulties in typing such languages, so I doubt that putting them in continuation passing style is enough to solve these problems.

I'm particularly interested in either pure functional concatenative languages or simple ones that can easily be used to generate code for, say, a microcontroller. The syntax makes it easy to do lexical analysis, the semantics are particularly easy to implement in a Forth-style runtime, and more complex runtime systems are possible.

A simple example of a program in a concatenative language would be "1 2 +", which evaluates to 3. Translating it to a lambda term (treating numbers and addition as primitives, or assuming they are themselves translated into lambda terms) we could get "(\f.f 1) o (\f.f 2) o (\f.\y.\x.f (x+y))" with "o" as function composition. Alpha-converting the bound "f"s in the second and third terms to f' and f'' to keep them distinct, this expands to "(\z.(\f.f 1)((\f'.f' 2)((\f''.\y.\x.f''(x+y)) z)))". We could have expanded it differently, but it would have reduced to this. Reducing further requires that we supply a function as the continuation of this new term, and the best choice is the term that adds the least new information. Since the identity function (\x.x) is a left identity for application, that is the term to apply, giving "(\z.(\f.f 1)((\f'.f' 2)((\f''.\y.\x.f''(x+y)) z)))(\x.x)" => "(\f.f 1)((\f'.f' 2)((\f''.\y.\x.f''(x+y))(\x.x)))" => "(\f.f 1)((\f'.f' 2)(\y.\x.(\x.x)(x+y)))" => "(\f.f 1)((\f'.f' 2)(\y.\x.x+y))" => "(\f.f 1)((\y.\x.x+y) 2)" => "(\f.f 1)(\x.x+2)" => "(\x.x+2) 1" => "1+2" => "3", which is the intended result.
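
Here is the same program written out in Haskell, which makes the continuation-passing reading concrete (lit and plus are my own names for the translations of constants and of +):

    -- Each word becomes a function of its continuation; a constant hands itself
    -- to the continuation, and '+' consumes the top two values and passes on the sum.
    lit :: a -> (a -> r) -> r
    lit x k = k x

    plus :: (Int -> r) -> Int -> Int -> r
    plus k y x = k (x + y)

    -- "1 2 +" is literally the composition of the three translated words;
    -- applying it to the identity continuation yields 3.
    program :: (Int -> r) -> r
    program = lit 1 . lit 2 . plus

    -- program id == 3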

This translation seems to deal nicely with functions that do not produce a single result. A term that is only partially applied, like "1 +", results in a function requiring an extra argument. Going the other way, a term like "1 2" results in something that requires a further function to reduce it, nicely handling what is usually implemented as a stack by instead demanding a function to consume the extra arguments. This only works out so nicely if we curry all functions, which we can do without loss of generality.

To flesh this out, let's define some of the usual terms in these types of languages- terms like dup, drop, swap, tuck, nip, rot. Okay- drop is (\f.\x.f), or K, dup is (\f.\x.(fx)x), swap is (\f.\x.\y.(fy)x), tuck is (\f.\x.\y.((fx)y)x), nip is (\f.\x.\y.fx), and rot is (\f.\x.\y.\z.((fz)x)y).
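
Transliterating a few of these into Haskell under the same convention (the continuation comes first, and stack items follow with the top of the stack as the first argument; the W suffix is just mine, to avoid clashing with the Prelude):

    dropW :: r -> a -> r
    dropW k _ = k                 -- ( a -- )

    dupW :: (a -> a -> r) -> a -> r
    dupW k x = k x x              -- ( a -- a a )

    swapW :: (b -> a -> r) -> a -> b -> r
    swapW k x y = k y x           -- ( a b -- b a )

    nipW :: (a -> r) -> a -> b -> r
    nipW k x _ = k x              -- ( a b -- b )

    tuckW :: (a -> b -> a -> r) -> a -> b -> r
    tuckW k x y = k x y x         -- ( a b -- b a b )

    -- With lit and plus from the sketch above, "1 2 dup + +" evaluates to 5:
    -- (lit 1 . lit 2 . dupW . plus . plus) id == 5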

I'm not sure that this works for some complex terms, but it does seem to deal correctly with things like "quoted" terms, usually written as "'+", indicating the function + as an object to be placed on the stack. Such terms are just like other constants; in this example the translation is (\g.g(\f.\y.\x.f(x+y))). This is exactly the "return", or unit, of the continuation monad, which takes an arbitrary term and turns it into a continuation-passing value of type (a->r)->r, where r is the result type of the continuation.
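
In the Haskell transliteration a quoted term is nothing new- quoting has the same shape as lit above (a sketch, reusing plus from earlier):

    -- Quoting wraps a term so it is simply handed on to whatever consumes it
    -- next; this is the 'return' of the continuation monad written directly.
    quote :: a -> (a -> r) -> r
    quote x k = k x

    -- "'+" then translates as: quote plus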

Maybe more on this later, especially if it turns out to not work in some situations. The application I'm thinking of here is to either translate such terms into lambda terms and allow Haskell to type check them, or to use them as an exercise in type inference in the nicely familiar setting of the (polymorphic) lambda calculus.

Wednesday, January 18, 2012

What is Math Really About? part 3- computation

Whether we are looking at a theory describing a simple phenomenon, or trying to found all of mathematics, we will want to know some things about our ability to investigate the formal theory we have chosen to study. It is natural to ask what things can be proven and disproven (questions of computability), and, if something can be proven, how long its proof will take (a question of computational complexity). This is the sort of question that began computer science as a distinguishable subject within mathematics, and all systems of computation (Turing machines, grammars, the lambda calculus and its type theories, mu-recursion, etc.) can be described as rewriting systems exactly because what they are intended to describe is rewriting in math. It is not possible to prove that these systems describe all things that can be computed, because they are the definition of computation (in other words, that is not something to which the notion of proof applies), but in practice the systems mentioned above are accepted for two reasons: a huge number of other systems have been shown to be weaker than, or at most as strong as, these systems, and stronger ones (systems of super-Turing computability) don't seem to capture the notion of computation at all.

In some sense computer science is the constructive part of math- the part concerned only with objects that can actually be constructed, where truth comes from an explicit proof. This is in contrast with classical math, where something can be "true" in some ultimate sense without our being able to give a proof (for example, it's possible to prove the existence of an object with some property without being able to describe any particular object with that property). This brings math down to earth- we can't talk about things too large to do computations on, and we can't talk about objects that we can't construct.

What is Math Really About? part 2- models

When we try to describe a situation in math, we come up with some collection of objects that gives all the relevant information needed to understand it. To describe a family tree, we might want a set of people (the family) and a set of triples (ordered groups of three things) where the first element of each triple is the mother, the second the father, and the third the child. The individuals in the triples are drawn from the set of people, of course. There are some other constraints, such as that a person can't be their own parent, but let's not go into too much detail. In the definition (the theory) we don't want a particular set of people, or a particular set of triples, but rather the notion that a family tree consists of people and parents. There are other ways to do this, but if we choose this way then a description might be "A family tree consists of a set F (called the family) and a set of triples P drawn from FxFxF, such that for all (a, b, c) in P, a, b, and c are distinct."

While studying these objects, we will look at their properties. This may include ways of ordering them, combining them, and describing parts of their structure. It turns out that one of the most important things to study is how to translate the objects into each other, especially when the transformations preserve some aspect of their structure. One of the reasons such transformations are important is that they can be used to describe properties (often by some single object that embodies the property in some way, together with the transformations to and from that object and the others). Because of this, a theory will generally have some objects it is concerned with, and some notion of transformation that is appropriate and interesting for those objects. Remember that in math we take situations we find interesting and encode them as theories, so there should be a theory that describes this situation of having objects, transformations, and combinations of these objects. This theory would be pretty fundamental and would have far-reaching consequences- recall that the theory of symbols is logic, and here we are talking about the models of these theories (the objects they describe). This theory is called Category Theory. A category is exactly what we just saw- some type of object, and some appropriate way to translate or transform between objects. In category theory we tend to look not at the internal structure of an object (which is based on definition or interpretation) but at an object's relation to other objects, so we often express properties as objects and some class of transformations between them. Category theory has a pretty clear place in math from this perspective- it generalizes the notion of model. This makes it less surprising (although I personally found this immensely surprising) that categories have something called an "internal language", which is the logic they describe, and that logics have an associated category.

I am not a mathematician, so take my generalizations about the importance of Category Theory with a grain of salt- I'm not sure it's the amazing paradise I've made it out to be. However, it does describe an important situation, it is extremely abstract, and it is a great deal of fun.

So that's all for part 2. The last thing I'm going to go over is the manipulation of symbols in math, which, as a computer scientist, is really my place in all this.

What is Math Really About?

I often find that people are confused about what math is and what it's about. This is understandable- I only came to understand this late in my college education, and I was very surprised by what I learned. Notice that I am not a mathematician- I'm a computer scientist. What I want to share is my experience and knowledge, having learned this myself over the last several years. I'm going to talk about what math is, what it does, and how it's used. I want to express that math is not one big unified system that explains the universe, that it's not all about numbers, that subjects like calculus are not "higher math", and that math is not even one thing.

So, what do we do in math? The basic situation is that we observe something like a relation between people (a family tree, for example), a physical phenomenon (gravity, electricity, and lots of other things), or something as simple as a quantity of some object. Having found something we want to understand, we construct some mathematical object that describes the situation. Instead of describing a single family tree, we would describe the general structure of all family trees. To do this, we have a large number of objects to combine, drawn from the usual practice of mathematics. Often we need many objects collected together, which jointly give the information needed to describe the thing we are interested in. The information we need gives the "theory" of the thing ("the theory of family trees"), and the particular objects give the models (your particular family tree).

This raises the question- where do all those objects we use come from? They in turn encode some intuition about the world- relations encode relationships between things, numbers encode quantity and order, lattices encode complex orderings, etc. In other words, we can combine these objects to describe a new situation, but how do we arrive at the "first" objects- how do we found our system? The answer is that there are many ways to do this, and they are *not* all the same. Found math in different ways, and you get different results. The reason that this is not as much of a problem as it might seem is that for the most part we work in some "ambient" system of logic with some notion of sets. While some of the more abstract and obscure (but still important) parts of math may change depending on the system, there is a large part that is generally (but not universally) agreed on. The usual system to found math on is called ZFC, and the usual logic is classical logic, but just as a writer doesn't wonder about how transistors work when they use a computer, a mathematician doesn't necessarily worry about how math is founded when doing their work.

Though there are many different foundations for math (category theory, set theory, logic, and more), there is some similarity among them. This brings us to the heart of math- writing down symbols that follow rules. That's really it. We write down symbols, rewrite them according to rules, and interpret them as having a meaning. I said that when we see something interesting, we try to encode it as a mathematical object. Therefore, we should be able to encode the notion of encoding notions itself as an object, which we call a theory. Logic is the study of these systems of symbols and their rules, so logic is math. Notice that there are many systems of logic, which vary hugely in what they can describe. They can be very strong and define all of what we think of as math, or very weak and able to describe almost nothing, or anything in between. These systems usually have some meaning to us as humans- they give a systematic way to write down and manipulate symbols, and therefore to write down and manipulate the concepts that the symbols are intended to stand for. Ideally, if two people agreed on some logic- the meanings of the symbols and what it means to combine them in different ways- they would agree on anything that could be defined using that logic.

This brings me to the end of the first part in this series of posts. I still want to talk about models in general, and about the mechanical process of writing and rewriting symbols, but the main point is here: math is logic, and logic is writing symbols (often on chalkboards). The meaning comes from the person doing the writing, and what they believe in. You can refound math any way you want, you can construct all sorts of exotic objects, you can go into the highest abstractions, but if there is no reason or justification for your work then it is not interesting. This is why there is no "problem" with the fact that there is no number 1/0, or that there are many sizes of infinity, or with the various things commonly called paradoxes. Even if you could come up with a construction that does not have these "problems" (it would be more accurate to call them properties), it would not necessarily be interesting or useful. If there is some application, or even if it just helps people understand something that is already well known, then it can become a part of the huge, distributed effort that we call math, even if it is a new mathematics based on different axioms (constructive math, for example). The activity of math is more a way of thinking than a single unified system, and if it describes the universe as we experience it, that's because it was designed to.

Wednesday, January 11, 2012

From p=0.5 to Arbitrary p

The blog A Neighborhood of Infinity- which is probably the best blog I've ever seen- has a new post discussing exactly what I was going to discuss here, except with a much better technique and with ties to some interesting math. Therefore, I defer to this awesomeness by providing a link: http://blog.sigfpe.com/2012/01/lossless-decompression-and-generation.html

Apparently there are ways of doing this in constant time, but the technique described in the above article draws a connection between probability distributions, Huffman encoding, algorithmic complexity, and entropy.

What these posts had been leading up to was two things. First, given a technique like the one mentioned above, we can generate a single number drawn from the distribution of the number of mutations in a population (from 0 to p*n, with p the size of the population and n the size of the individuals). Then we only need that many random numbers to pick out the places to mutate. This is much, much more efficient than the naive way of generating a random number per location (which I see used a lot).
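
As a rough sketch of that first point (this is just a naive inversion of the cumulative binomial, not the constant-time technique from the linked post, and all the names are mine):

    import System.Random (StdGen, randomR)

    -- Draw how many mutations occur across all 'total' positions, given a
    -- per-position mutation probability p, by inverting the cumulative
    -- binomial with a single uniform sample. (The exact Int arithmetic in
    -- 'choose' will overflow for large totals; this is only a sketch.)
    numMutations :: Double -> Int -> StdGen -> (Int, StdGen)
    numMutations p total g = (length (takeWhile (< u) cumulative), g')
      where
        (u, g')    = randomR (0, 1) g
        cumulative = scanl1 (+) [ choose total m * p ^ m * (1 - p) ^ (total - m)
                                | m <- [0 .. total] ]
        choose n m = fromIntegral (product [n - m + 1 .. n] `div` product [1 .. m])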

The other point I was leading to is that we can get a distribution of 1s other than 0.5 by using AND, OR, and XOR on evenly distributed bits. This means that if we want some distribution with probability p of a bit being a 1, and we start with samples from p=0.5, some fixed combination of samples will give us the desired distribution to within any desired accuracy. This can be seen either with numbers in [0, 1] (continuous) or with sets of outcomes, a subset of which are considered successes (discrete). The operators in the continuous case are: AND is pp' = p*p', OR is p+p' = p + p' - p*p', XOR is pxp' = p*(1-p') + p'*(1-p), and NOT is -p = 1-p. In the discrete case we have the size of a set, n, and the number of successes, k, with k <= n, so I'm going to write (n, k) for a probability of success of k/n. The operators are then: AND is (n, k)(n', k') = (n*n', k*k'), OR is (n, k)+(n', k') = (n*n', k*n' + k'*n - k*k'), XOR is (n, k)x(n', k') = (n*n', k*(n'-k') + k'*(n-k)), and NOT is -(n, k) = (n, n-k). There is presumably some way to determine the smallest number of operations starting with (2, 1) and ending with (n, k) such that |p - k/n| < e for any error e, although nothing comes to mind as to how to find it.
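
Here are the discrete operators written out as Haskell on (outcomes, successes) pairs, just to make the bookkeeping explicit (the names are mine):

    type Dist = (Integer, Integer)  -- (outcomes, successes), probability = successes / outcomes

    andD, orD, xorD :: Dist -> Dist -> Dist
    andD (n, k) (n', k') = (n * n', k * k')
    orD  (n, k) (n', k') = (n * n', k * n' + k' * n - k * k')
    xorD (n, k) (n', k') = (n * n', k * (n' - k') + k' * (n - k))

    notD :: Dist -> Dist
    notD (n, k) = (n, n - k)

    -- Starting from a fair coin (2,1):
    -- andD (2,1) (2,1) == (4,1)   -- probability 1/4
    -- orD  (2,1) (2,1) == (4,3)   -- probability 3/4
    -- xorD (2,1) (2,1) == (4,2)   -- probability 1/2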

Thursday, January 5, 2012

The Von Neumann Extractor

We could always do mutation efficiently in a Genetic Algorithm if we could generate a random stream of bits (machine words, really) in which each bit is 1 with probability equal to the probability of mutation, by xoring the stream with the individual. In other words, we want a sequence of Bernoulli trials for some arbitrary probability of success p, rather than the fixed coin flip a random number generator will give you. First let's look at the Von Neumann extractor, which does the opposite- given a stream of bits where the probability of a 1 for any given bit is some probability p, it generates a stream where each bit has a 50% chance of being a 1.
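
As a tiny sketch of that first sentence (assuming individuals are stored as lists of machine words; the names are mine):

    import Data.Bits (xor)
    import Data.Word (Word64)

    -- If we already have a stream of random words in which each bit is 1 with
    -- the mutation probability, mutating an individual is just an xor against
    -- that stream.
    mutate :: [Word64] -> [Word64] -> [Word64]
    mutate mask individual = zipWith xor mask individual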

This is a pretty simple technique. We take the stream of bits and pair them up. For the stream starting with 001011101001, we pair up as (00)(10)(11)(10)(10)(01). Then we drop all the (11)s and (00)s, and drop the second bit from each remaining pair. This turns the stream into the smaller stream 1110. In this example there happen to be more 1s than 0s, but the probability of a 1 is equal to the probability of a 0 given some fixed original probability (assuming each trial is independent).
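
A direct Haskell sketch of this procedure (my transliteration, with bits as Bools):

    -- Pair up the incoming bits, drop the 00 and 11 pairs, and keep the first
    -- bit of each remaining pair.
    vonNeumann :: [Bool] -> [Bool]
    vonNeumann (a : b : rest)
      | a /= b    = a : vonNeumann rest
      | otherwise = vonNeumann rest
    vonNeumann _ = []

    -- On the example stream above this yields 1110:
    -- vonNeumann (map (== '1') "001011101001") == map (== '1') "1110"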

To see that this is true, recall that for a pair of bits, with p the probability of a 1 and q = 1-p the probability of a 0, the probability that the pair is 10 is p*q. The probability of 01 is q*p, and by the commutativity of multiplication these are the same. This means that it must be just as likely to get 01 as 10, and since we drop the second bit, just as likely to get a 0 as a 1. Obviously we lose some of the stream in the process, but that's really not a big deal.

Okay- so we can take a stream of trials with probability of success p and turn it into one with probability of success 0.5. In the next post we will look into going the other way.

Wednesday, January 4, 2012

Illative Combinatory Logic

I've been reading the paper "Systems of Illative Combinatory Logic Complete for First-Order Propositional and Predicate Calculus" after looking over posts on an awesome blog: http://lukepalmer.wordpress.com/, and I have to say it's pretty cool. It took a while for me to understand even the simplest notions of these systems, but the paper is a good one, and the discussions on the blog show examples of inference and interpretation in this logic. For reference, I wanted to record some of the things I've learned, even though I haven't finished the paper.

Application can be thought of as "type of" so Xa means a has type X. This means that types are predicates (presumably allowing some form of set theory), yet are still terms, apparently collapsing the type hierarchy without a problem. Application can also be thought of in the usual way, as reduction works as in the lambda calculus (the terms of these logics are lambda terms).

The Xi combinator can be thought of as the subset combinator- which I'm going to write as Z here. ZXY can then be seen as saying that X is a subset of Y, or, thinking of predicates, that for all "a" such that Xa, Ya. This gives rise to implication in a way that I finally understood today. Given the combinator K, which as a lambda term is (\x.\y.x) with "\" standing for lambda, the term Z(KX)(KY) means X => Y. This is not a function from type X to type Y- that is instead written ZX(Y . f) to say f : X -> Y, with "." as composition (\f.\g.\x.f(g x)) (I may need to stop with the ascii art symbols and start putting in real ones), which says that for all x such that Xx (x in X), Y(fx), so fx is in Y. Back to the term Z(KX)(KY): notice that for all "a", KXa reduces to X and KYa reduces to Y, so the term is the statement that if X then Y, which is exactly what we wanted! My jaw literally dropped when I realized that this is what the interpretation is (I hope I'm right) and I had to just sit there for a second and chill before reading on.

To get a universal quantifier, we simply do a lambda abstraction over a term and make use of Z. To say that for all x in X, Px, we write ZX(\x.Px).

To get the Universe combinator U (the set of all things) we simply write K(True). This term reduces to True for all arguments, and therefore all terms are of this type. This means that we need some value for True, however (not that this is unexpected). First off, the system in question has a combinator H saying that something is a proposition, and so ZHH is True. As I understand it, this is saying that everything that is a proposition is a proposition, which is surely the case.

There is still more for me to say about this system. I still don't understand the product and coproduct construction given by Luke Palmer, and I haven't finished the paper, so I haven't seen how it interprets logic yet. This is cool stuff, and I find it all surprising for some reason. It seems to stretch my intuitions, which is always fun.

Natural Transformations

A quote on composing natural transformations from A Neighborhood of Infinity:

If you know how to convert a bowl of stuff into a box of stuff, and a box into a bag, and an urn into a crate, and a crate into a sack, then there are two equivalent ways to convert a bowl of urns into a bag of sacks: we can either construct methods going from bowl to bag and from urn to sack and combine them to go from bowl of urns to a bag of sacks, OR, we can go from a bowl of urns to a box of crates to a bag of sacks.

http://blog.sigfpe.com/2008/05/interchange-law.html