Tuesday, June 22, 2010

Modeling Substructures

The number of substructures in a tree is always <= the number of schema of its linear representation. In all but the most trivial case, it is much less.
This is interesting because since the original schema theory doesn't make so much sense for GEP, it also appears that GEP is sampling fewer schema. This is exacerbated by the fact that a substructure that appears in two forms or in different places could be considered the same structure (because we do not could the context of the structure as being part of it, only the structure itself).

There are a couple of things to note here though. One is that we are not searching the normal solution space, but rather the space of strings whose trees map to the space, so there are a lot of complications in relating the spaces directly and saying that we are searching one more or less efficiently than a GA might.

Also, the idea behind substructures is to sample them in different places and hope their effects are overall positive. In other words, while there are fewer of them, they can each have many effects in both their part of the tree and (if they are "open") the trees that complete them.

Overall they are really very different from the schemata of GAs. They have only the properties inherent in the idea of classifying parts of a string. They have different dynamics, meaning, and structure. They share other things, like the ideas of defining length, but I'm not sure if they should share notation or not- as I would like to be able to capture the actual vs smallest size of a substructure somehow.
An example of that btw is +-##1 could also be +1-## with no change in the resulting value. Are these the same? They have the same meaning (since + is commutative), but different defining lengths. And really the # could be of any size, and still match the 1 with the plus, like
+-####1 could be +-*23451. Maybe the # denote any subtree? This complicates the defining length, but would mean that I would probably define structures that look different as different, just so I could give them a length of some sort without resorting to a vague "minimal length when completed with minimal subtrees" or something.

All of this reminds me of my idea to introduce a "defining length reduction" operator. I'm still not sure if there is such a thing, but it might help, and hopefully one day we will have a model that would predict the effect of such a thing.

No comments:

Post a Comment