Thursday, November 10, 2011

Neural Networks- Training

A Neural Network with n input neurons, k neurons in the hidden layer, and o neurons in the output layer can be considered a pair of matrices- one n x k matrix and one k x o matrix. These matrices contain the weights on each neuron's inputs and are initially randomly generated. We multiply the input vector by the hidden layer's weight matrix, apply the activation function to each component, and then multiply the result by the output layer's weight matrix, again applying the activation function, to get the resulting vector.
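
To make this concrete, here is a minimal sketch of that forward pass in Python with numpy- the sigmoid activation, the layer sizes, and the names are just assumptions for illustration:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    n, k, o = 2, 4, 1                 # input, hidden, and output layer sizes (arbitrary)
    W_hidden = np.random.randn(n, k)  # n x k matrix of hidden layer weights
    W_output = np.random.randn(k, o)  # k x o matrix of output layer weights

    def forward(x):
        h = sigmoid(x @ W_hidden)     # multiply by hidden weights, apply activation
        return sigmoid(h @ W_output)  # multiply by output weights, apply activation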

So we have these random weights on the inputs of our neurons, and we can calculate an output from them. We got the inputs from some problem, and they describe some function we want to approximate, some system we want to control (inputs as measurements of the system, output describing some action to take), some decision we want to make (which can be done in several ways), etc. Given that the weights are random, they are pretty much guaranteed not to solve any particular problem we have (or at least, they won't give a very good solution). This means we need some way of adjusting the weights based on the output we got and the output we expected for each input. We know the correct output only for some inputs (the ones we have collected to train the network with), and we want a network that, given the inputs we have, gives outputs close to (or exactly matching) the ones we collected, and that, for inputs we don't have, gives outputs close to the ones we want.
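
For a concrete (and classic) example of such a training set, take XOR- a handful of collected inputs paired with the outputs we want for them, continuing the numpy sketch above:

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # collected inputs
    Y = np.array([[0],    [1],    [1],    [0]],    dtype=float)  # desired outputs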

One way to do this is a "gradient descent" algorithm, which basically moves the network in the direction of greatest improvement, like standing on the side of a hill and taking a step in the steepest direction (the direction that will change your height the most). This will not necessarily find the best network (in general it won't), but it is relatively simple. The basic idea (using the epoch approach) is to sum the error on each neuron over the whole training set, and to "propagate" this error backwards through the network (from the output neurons back to the hidden layer neurons). Each weight is adjusted by adding to its previous value the neuron's error (the difference between the desired and actual output), multiplied by the derivative of the activation function at the neuron's net input and by the input carried on that connection, scaled by a fixed value between 0 and 1 called the learning rate (set at the start of the run). In symbols, w(t+1) = w(t) + a*e*f'(net)*x, with w(t) the weight at time t, a the learning rate, e the error, f' the derivative of the activation function, net the weighted sum of the neuron's inputs, and x the input on that connection.
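
Continuing the sketch above, one epoch of this batch update might look like the following. It assumes the sigmoid activation, whose derivative can be computed as f'(x) = f(x)*(1 - f(x)); the learning rate and the helper name train_epoch are arbitrary:

    def train_epoch(X, Y, W_hidden, W_output, a=0.5):
        H = sigmoid(X @ W_hidden)               # hidden activations, one row per example
        O = sigmoid(H @ W_output)               # actual outputs
        delta_o = (Y - O) * O * (1 - O)         # output error times f'(net), via f' = f*(1-f)
        delta_h = (delta_o @ W_output.T) * H * (1 - H)  # error propagated back to hidden layer
        W_output += a * H.T @ delta_o           # updates summed over the whole training set
        W_hidden += a * X.T @ delta_h

Calling train_epoch in a loop adjusts the weights a little on each pass over the training set; on the XOR data above, a few thousand epochs will usually (though not always- gradient descent can get stuck) reach low error.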

There are many other ways to do training. One particular technique that I think is pretty cool is to use a Genetic Algorithm. The individuals are vectors of weights, one weight for each input of each neuron. The fitness of an individual is based on the sum of the squared differences between the actual and desired outputs (possibly summing up the errors of all output neurons to produce a single fitness value, or using some problem-specific way of determining error)- since lower error is better, fitness would be the negative or inverse of this sum. This technique has the advantages and disadvantages of Evolutionary Algorithms in general- it is very general, it makes few assumptions about the solution space, and it can potentially find the global optimum. On the other hand, it may take a long time.
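
A sketch of that idea in the same numpy setting- here with simple truncation selection and Gaussian mutation only (no crossover), and fitness taken as the negated summed squared error so that higher is better; the population size, generation count, and mutation scale are arbitrary choices:

    def unpack(genome):
        Wh = genome[:n * k].reshape(n, k)   # first chunk of the vector: hidden weights
        Wo = genome[n * k:].reshape(k, o)   # remainder: output weights
        return Wh, Wo

    def fitness(genome, X, Y):
        Wh, Wo = unpack(genome)
        O = sigmoid(sigmoid(X @ Wh) @ Wo)
        return -np.sum((Y - O) ** 2)        # lower error means higher fitness

    def evolve(X, Y, pop_size=50, generations=500, sigma=0.1):
        pop = [np.random.randn(n * k + k * o) for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=lambda g: fitness(g, X, Y), reverse=True)
            parents = pop[:pop_size // 2]                                # keep the best half
            children = [g + sigma * np.random.randn(g.size) for g in parents]  # mutate
            pop = parents + children
        return max(pop, key=lambda g: fitness(g, X, Y))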
