Friday, April 1, 2011

Learning Genetic Algorithms, cont

In this post I will introduce the biological metaphor of Genetic Algorithms, and try to develop a simple example problem. We will see how to encode solutions for this example problem and motivate the parts of a GA.

The sample problem will be trying to find a value, x, for the function f(x) = x^2- the function that takes a single input number, lets say a positive whole number, and returns its square. We want to know what input number between 0 and 1000 (exclusive, so 0-999 inclusive) will get us the highest number as a result. The solution space is the numbers from 0 to 999. It will be convenient to write all numbers with 3 digits, so 0 is written as 000 and 23 as 023, etc.

The answer is obviously 999. If this partially random search of the solution space is to be "intelligent" it should arrive at the same conclusion that we have come to.

Now that we have a problem to solve (the max output of f(x)=x^2), a solution space (the numbers 0-999), and a way of writing down possible solutions as a list of symbols (000-999, which is all possible three symbol words using the numbers 0-9), we are ready for the biological metaphor. The story is that the list of symbols is sort of like the genetic material for an organism. The number that it corresponds could be considered its phenotype- so 023 is the genetic material that gives rise to the number 23 as the phenotype. The fitness of the individual is the result of putting its phenotype (its "body") into the fitness function (here its our squaring function). The result of squaring its phenotype tells us how well it solved our problem because we are looking for the highest output of the function f.

This is a nice little relation between the components of the problem and the situation in nature, but there are still parts missing. In nature we may expect there to be many individuals- in a GA we have many lists of symbols, not just one. Also we expect that fitness isn't just a number- fitness should affect the survival of the individual. It is not enough for fitness to affect survival, as there must also be some sort of variation between generations (which implies some way of creating children). This situation gives us the inspiration for the rest of the system.

Lets outline the situation using these new facts- there is a population of lists of symbols which will be random, our example population will be 900, 002, 299, and 432; there is some selection of good solutions that kills off some bad solutions, there is some random variation corresponding to mutation in natural genetics, and there is some mechanism for creating child, a crossover similar to crossover in natural genetics. We hope that good solutions will be copied many times during selection, that mutation will improve solutions and not make them worse, and the crossover will take two goodish solutions and create really good ones by combining the parts of the solutions that made them good into an individual with the traits to be great.

Starting with the population of randomly generated individuals 900, 002, 219, and 432 we should turn them into their phenotype-900 becomes 900, 002 becomes 2, 219 becomes 219, and 432 becomes 432. Now we can put them into our fitness function so f(900)=81000, f(2)=4, f(219)=47961, f(432)=186624. Clearly 900 is a good (but not perfect solution), and 432 is pretty good too. Now lets select some of the good ones, allowing them to be selected many times so the new population may have several individuals that look exactly the same. I won't go into example how to do this in this post, but lets say we end up selecting a new population from the old (think of the individuals as reproducing) and we get 900, 900, 219, and 432. The 219 wasn't as good as 900 or 432, but the "best" individual doesn't always live in nature and the "worst" doesn't always fail to reproduce.

Now lets do some mutation. This means that we take our new population (900, 900, 219, 432) and just randomly change some digits to random numbers. After mutation the population may look like this: (900, 900, 299, 132) which changed exactly two digits in different numbers. It looks like the 432 that got mutated into 132 got much worse, but that happens sometimes. The 219 that was mutated into 299 certainly got better, which is nice.

We are almost to the end of the first generation! Now we just to crossover. This means that we pair up each individual, and sometimes the "happy couple" crosses their genetic material, and sometimes (again, randomly) they don't. Lets say they are paired like this: (900, 299) and (132, 900). Lets assume only the first pair (900, 299) chose, or was chosen, to be crossed. Now we must chose a (random) place in the genetic material to do the cross, so lets chose between the first and second digit. This splits the individuals into the pieces (9, 00) and (2, 99). We take the first part of the first individual (9) and the second part of the second individual (99) and put them together getting 999, and the first part of the second individual (2) and the second part of the first individual (00) which combine to 200. The new population is 999, 200, 132, and 900.

This might have all seemed pretty random and not very intelligent at all- the only thing that wasn't *completely* random was selection. Mutation and crossover could create terrible mutants that perform nowhere near as well as the originals just as it could possibly create good solutions. So why did we do this at all? Well, for one thing, in practice we would repeat the selection, mutation, and crossover many times on a much larger population so there would certainly be some chance or getting a perfect solution.

The other thing to notice is the first individual in the new population- its the perfect solution 999! It looks like we started with a small random population that didn't contain a perfect individual and randomly ended up with the correct solution! Its almost as if I had this in mind from the very beginning!

The most vital thing to notice here- really the point of this whole post- is that the original population didn't contain a perfect individual, and none of selection, mutation, or crossover by itself created the perfect solution. Selection ensured that the individual 900, which is a pretty good solution, appeared in the population more than once. Mutation introduced a pattern that was not in the original population (the 99 in 299, which we would write as #99 with # as a "don't care" symbol, which says that we are interested in the last two digits being 9s, and we don't care what the first digit is in this pattern). Crossover took two patterns that helped the fitness of the individuals they were in (#99 and 9## are the patterns that combined) and this produced the individual 999, which has both useful patterns.

This is the basic Genetic Algorithm. The three operations performed many times is the canonical example of an Evolutionary Algorithm and is considered the primal form of Genetic Algorithms. The problems can be much more complicated and useful, the genetic material can have many different symbols and different digits may be constrained to different values. The genetic material may be many numbers, like 934202 could be two inputs to a function f(x, y) = x*y where x=932 and y=202 in the example individual.

Hopefully Genetic Algorithms aren't so mysterious now, and we can move on to more interesting material. The basic Genetic Algorithm is surprisingly similar to my own work- Robust Gene Expression Programming- but we will have to introduce many new concepts to get there from here.

4 comments:

  1. These are really interesting posts. Keep it up.

    ReplyDelete
  2. Also- "Lets assume only the first pair (900, 299) chose, or was chosen, to be crossed" reminds me of Halflife 2. Kind of obscure reference, but I'm going to assume you'll get it.

    ReplyDelete
  3. Actually, that was exactly the reference I was making. Damn- we are amazing at referencing things.

    ReplyDelete