Friday, January 11, 2013

Radial Basis Function Neural Networks

I've been working on my genetic algorithm library lately, and after getting RGEP to work again I've moved on to evolving neural networks. Since RGEP is about evolving tree structures, one easy test for it is to evolve equations to match data (symbolic regression). I wanted to see how a neural network would do on the same problem if we evolved its weights using a genetic algorithm, but the usual neural networks aren't terribly good at this kind of problem since their outputs are usually either in [0..1] or [-1..1]. This led me to Radial Basis Function (RBF) Neural Networks, which are a pretty straightforward modification to the usual feed-forward networks with, say, sigmoid activation functions. They have the same feed-forward architecture, but they are much easier to use for function finding.

The main thing to note about an RBF network is that the activation function is what's called a radial basis function, which I will describe below. Other things to note are that these networks usually don't have an activation function on the output neurons (in other words, the output activation is the identity function) and there are no weights on the connections from the inputs to the hidden layer (in other words, those weights are all one). This means that the network's output is a simple linear combination of the hidden neurons' outputs. I'm not sure there is a disadvantage to structuring them differently, except perhaps that it makes training harder (as there are more values to learn). Also, it seems like having more than two layers (deep neural networks) can be very advantageous, although it makes them harder to train (especially with back-propagation, which apparently has trouble propagating error through multiple layers).
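To make that structure concrete, here is a minimal sketch of the forward pass. I'm writing it in Python purely for illustration (the function and parameter names are made up for this example), and rbf stands in for the radial basis activation described below.

def rbf_network_output(inputs, centers, output_weights, rbf):
    # Each hidden neuron applies the radial basis activation directly to the
    # inputs; there are no weights on the input connections.
    hidden = [rbf(inputs, center) for center in centers]
    # The output neuron has the identity activation, so the result is just a
    # weighted sum (linear combination) of the hidden activations.
    return sum(w * h for w, h in zip(output_weights, hidden))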

So- radial basis functions. These are functions that have a "center", and whose value depends only on the distance from that center. Note that we could use any distance function we wanted, but Euclidean distance seems to be the most popular. The center for a neuron is a vector, which replaces the usual weights. This means that each hidden layer neuron contains an n-dimensional vector defining a point in n-dimensional space. Given the input vector to the neuron, we first compute the distance between the neuron's point and the point defined by the input. This gives a single scalar number, which is then plugged into some equation such as the Gaussian function e^(-beta * r^2), where r is the distance and beta is a parameter to the algorithm. Other functions are available.
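As a rough sketch of what one of these activations might look like (again in Python, with made-up names), here is the Gaussian version using Euclidean distance:

import math

def gaussian_rbf(inputs, center, beta=1.0):
    # Euclidean distance from the neuron's center to the input point.
    r = math.sqrt(sum((x - c) ** 2 for x, c in zip(inputs, center)))
    # Gaussian radial basis function; beta controls how quickly the
    # response falls off with distance.
    return math.exp(-beta * r ** 2)

Plugging gaussian_rbf into the earlier sketch, evolving one of these networks would presumably mean searching over the centers, the betas, and the output weights.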

I found this applet useful for getting some intuition about these networks: http://lcn.epfl.ch/tutorial/english/rbf/html/index.html. I like how the introduction of a different kind of activation function extends neural networks to a different type of problem, and I think these are a nice thing to have in one's toolkit.
