Monday, June 18, 2012

Accuracy Based Learning Classifier Systems

I have always found learning classifier systems to be extremely interesting. An LCS is an agent that is able to classify its environment (say, through sensor input, or given problem instances) and decide on an action based on a set of rules. The crazy thing about its decision making is that it arises from a collection of competing rules evolved by a Genetic Algorithm: a collective but singular mind. Even better than an LCS is an Accuracy Based Learning Classifier System (XCS), which attempts not just to find the actions that give the highest reward from its environment, but to find rules that accurately predict the actual reward. The nice thing here is that while in both systems you end up with a collection of rules that can be used as a classifier, with an XCS you end up with a collection of rules that map out the reward space. Either way, the result is the whole population (there are variations on this, but I'm not going to go into that), so we get a swarm of weak classifiers that collectively can solve problems. The individuals compete with each other, but they also share rewards.

I won't go into huge detail on describing these systems: there is no single part that is terribly complicated, but they are altogether more complex than some of the other systems. If nothing else, they contain a GA component, and so they are more complex than a GA by itself. I will give a brief overview of XCS (for me, the more interesting of the two), as much for myself as for the reader. The description follows the very good explanation in the paper "Tournament Selection in XCS".

An XCS has some input that it must learn to classify using its internal rules (individuals). Each individual consists of a classifier (some pattern matched against the input to determine whether the individual has something to say about it) and an action to perform if its classifier does match, as well as a predicted reward for that action along with an error in that reward prediction. Each individual also keeps a fitness value, which goes up and down depending on its ability to accurately predict high-reward actions.
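To make the bookkeeping concrete, here is a minimal sketch in Python of what one such individual might carry around. The field names are my own choices for illustration, not taken from the paper:

```python
from dataclasses import dataclass

@dataclass
class Classifier:
    """One rule in the XCS population (field names are my own, not the paper's)."""
    condition: str      # pattern over {0, 1, #}, e.g. "1#0#1"
    action: int         # the action this rule advocates
    prediction: float   # predicted reward for taking the action
    error: float        # estimate of the error in that prediction
    fitness: float      # accuracy-based fitness used by the GA
    experience: int = 0     # how many times the parameters have been updated
    numerosity: int = 1     # how many identical copies this rule represents
    last_ga_time: int = 0   # time step when this rule last took part in the GA
```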

A simple case of input is a fixed length bit vector, so that will be the example. The input is an element of {0,1}^L, bit vectors of length L, which are matched against classifiers in {0, 1, #}^L. If you are not familiar with this notation, the # is commonly used as a "don't care" symbol, matching anything it is compared to. So if we imagine that the system gets an input 10101, and an individual has a classifier like 10101, 10##1, 1####, ##101, or something like that, then the individual matches the input. We collect all individuals matching the input into the match set [M]. There will be some set of actions advocated by individuals in [M], call it A, and we choose the action with the greatest fitness-weighted average of predicted payoff among the actions in A. That gets us to the action set [A]: the set of individuals that advocate (have as their genetic material) the action with the highest average predicted payoff, weighted by the fitnesses of the individuals advocating it. Having now chosen an action, the action is taken and the reward is reaped from the environment (the action may be a classification of the input, and the reward an indication of how close the classification was to the correct one).
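Here is a rough sketch of that matching and action-selection step, assuming the Classifier record from above and a plain bit string as input. It always exploits the best-looking action; a real system would sometimes explore instead:

```python
def matches(condition: str, inp: str) -> bool:
    """A condition matches if every position is '#' or equals the input bit."""
    return all(c == '#' or c == b for c, b in zip(condition, inp))

def select_action(population: list[Classifier], inp: str):
    # [M]: every classifier whose condition matches the input
    match_set = [cl for cl in population if matches(cl.condition, inp)]

    # Fitness-weighted average prediction for each advocated action
    predictions = {}
    for action in {cl.action for cl in match_set}:
        advocates = [cl for cl in match_set if cl.action == action]
        total_fitness = sum(cl.fitness for cl in advocates)
        predictions[action] = (
            sum(cl.prediction * cl.fitness for cl in advocates) / total_fitness
            if total_fitness > 0 else 0.0
        )

    # Take the action with the highest weighted prediction (pure exploitation)
    best_action = max(predictions, key=predictions.get)

    # [A]: the members of [M] that advocate the chosen action
    action_set = [cl for cl in match_set if cl.action == best_action]
    return best_action, action_set
```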

Having received a reward, the classifiers that advocated the chosen action must be updated. In addition to merely being rewarded, they are adjusted according to how accurate their predictions of the reward were, and their error is modified if the reward was outside their prediction plus or minus their error and beyond some predetermined threshold for error tolerance. I will not describe these updates in detail; just look at the paper. A GA is then invoked on individuals that were in the action set and have not been part of the GA for a certain amount of time. This is one of the many parameters to this algorithm, and the time since an individual last took part in a GA is one of several extra values that are kept track of in individuals (also an experience and numerosity count; again, just read the paper). The individuals are selected from this subset of the action set by some selection mechanism (tournament selection being advocated in the paper), and mutated and crossed as usual. It seems like uniform crossover is favored here.
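For a flavor of what those updates look like, here is a sketch of the usual Widrow-Hoff style XCS updates as I understand them from the literature, not a faithful transcription of the paper. The constants (learning rate, error tolerance, and the accuracy fall-off parameters) are the standard knobs, with values picked only for illustration:

```python
BETA = 0.2        # learning rate
EPSILON_0 = 10.0  # error below which a rule counts as fully accurate
ALPHA = 0.1       # accuracy fall-off scale
NU = 5.0          # accuracy fall-off exponent

def update_action_set(action_set: list[Classifier], reward: float) -> None:
    """Adjust prediction, error, and accuracy-based fitness for the rules
    that advocated the chosen action (a sketch, not the paper's exact rules)."""
    for cl in action_set:
        cl.experience += 1
        # Move the error estimate toward the absolute prediction error,
        # then move the prediction toward the actual reward
        cl.error += BETA * (abs(reward - cl.prediction) - cl.error)
        cl.prediction += BETA * (reward - cl.prediction)

    # Accuracy: 1 if the error is within tolerance, otherwise decaying with error
    accuracies = []
    for cl in action_set:
        if cl.error < EPSILON_0:
            accuracies.append(1.0)
        else:
            accuracies.append(ALPHA * (cl.error / EPSILON_0) ** -NU)

    # Fitness moves toward each rule's accuracy relative to the whole action set
    total = sum(k * cl.numerosity for k, cl in zip(accuracies, action_set))
    for k, cl in zip(accuracies, action_set):
        relative = k * cl.numerosity / total
        cl.fitness += BETA * (relative - cl.fitness)
```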

Interestingly, after the GA a "subsumption" operator is applied to delete individuals that are completely subsumed by others in the action set, where the subsuming individual must be experienced enough (have had its parameters adjusted a certain number of times) and accurate enough for its more general classifier to be trusted in place of the more specific one.
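A sketch of that subsumption test, under the usual interpretation that a rule may subsume another when it advocates the same action, is sufficiently experienced, is sufficiently accurate, and is strictly more general (the experience threshold name and value are my own, and EPSILON_0 is the constant from the earlier sketch):

```python
SUBSUMPTION_EXPERIENCE = 20   # minimum experience before a rule may subsume

def is_more_general(general: str, specific: str) -> bool:
    """general covers every input that specific covers, and at least one more."""
    wider = False
    for g, s in zip(general, specific):
        if g != '#' and g != s:
            return False
        if g == '#' and s != '#':
            wider = True
    return wider

def does_subsume(subsumer: Classifier, other: Classifier) -> bool:
    return (subsumer.action == other.action
            and subsumer.experience > SUBSUMPTION_EXPERIENCE
            and subsumer.error < EPSILON_0
            and is_more_general(subsumer.condition, other.condition))
```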

So there you go! XCS! Very cool algorithm, somewhat complex, lots to keep track of, but conceptually simple enough and a very awesome idea. I have never programmed one, but given that I'm making lots of good progress on my evolutionary algorithm framework, I may give it a try.
