Wednesday, November 9, 2011

Neural Networks

The perceptron by itself can be useful. Perceptrons can be used for simple problems, as simple classifiers that produce features for more advanced ones, or as part of a boosting algorithm that combines a group of perceptrons (AdaBoost, for example). A really cool thing you can do with them is to connect them up with each other.

There are many ways to do this. The simplest is the feed-forward topology: the neurons are arranged in groups (layers), each neuron in one group is connected to every neuron in the next, and one group serves as the input and another as the output. Activation flows from the input to the output with no loops, which is why it's called feed-forward. (We could also connect everything to everything else in a mesh, but feed-forward is the place to start.) Each perceptron takes n inputs and produces 1 output.
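To make that concrete, here is a minimal sketch of a single perceptron-style unit in Python; the weights, inputs, and hard threshold are made up purely for illustration:

    # A single perceptron-style unit: n inputs, one output.
    # The hard threshold of 0.0 is just one possible (illustrative) choice.
    def perceptron(inputs, weights, threshold=0.0):
        total = sum(w * x for w, x in zip(weights, inputs))
        return 1 if total >= threshold else 0

    # Example with n = 3 inputs and arbitrary weights.
    print(perceptron([1.0, 0.5, -0.2], [0.4, 0.6, 1.0]))  # prints 1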

If we have m input values and place all m of them on the inputs of each of k neurons, we will get k outputs (one from each neuron). Connect each of these k outputs to n neurons (with n the number of outputs desired) and we have a feed-forward neural network with one so-called "hidden" layer. The hidden layer is very important: having one will allow us to solve almost all problems in principle, as long as we choose a good activation function (which I'll get to later). Having two hidden layers allows us to approximate any function, which is one of the crazy things about neural networks. We aren't in general going to be able to find the best network for a given problem, but it's nice to know that it exists somewhere in neural-network space.
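Here is a rough sketch of a forward pass through such a network using NumPy. The sizes m, k, n and the random weights are placeholders rather than anything tuned, and I'm using a sigmoid activation, which I'll talk about below:

    import numpy as np

    # m inputs -> k hidden neurons -> n outputs.
    m, k, n = 4, 3, 2
    W_hidden = np.random.randn(k, m)  # one row of weights per hidden neuron
    W_out = np.random.randn(n, k)     # one row of weights per output neuron

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(x):
        hidden = sigmoid(W_hidden @ x)  # the k hidden-layer outputs
        return sigmoid(W_out @ hidden)  # the n network outputs

    print(forward(np.ones(m)))          # n output values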

For this network to be able to approximate nicely, we need to add one other thing. The output of each neuron will be the sum of the products of each input and the weight for that input, fed into some function. This function can be any real-valued function you like, but certain ones work better than others. If we choose a non-linear function, we get the universal approximating behavior I mentioned earlier. Commonly a sigmoid function is chosen, as it behaves something like a neuron firing after a sufficiently high input. To get a complete on/off activation we can use what's called a step function, which jumps from, for example, 0 to 1 at a certain point (the activation point).
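As a sketch of those two choices (the activation point of 0 here is arbitrary):

    import math

    # Smooth sigmoid activation: ranges from 0 to 1, steepest near 0.
    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    # Step activation: jumps from 0 to 1 at the activation point.
    def step(z, activation_point=0.0):
        return 1.0 if z >= activation_point else 0.0

    # A neuron's output: the weighted sum of its inputs fed into the activation.
    def neuron_output(inputs, weights, activation=sigmoid):
        z = sum(w * x for w, x in zip(weights, inputs))
        return activation(z)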

With these new ideas, a topology of neurons and an activation function, we have a nice new classifier. But the classifier itself isn't enough: we must be able to train it on a data set. In the next post I will describe a simple training mechanism called back propagation, and hopefully get into some of the other ways to do training.
