Sunday, February 27, 2011

Binary Decision Trees

I posted before about decision tree induction in GEP. Unfortunately there is not much data to compare GEP's ability to do DTs as the original author made some changes to GEP to do her experiments that I don't want to do to RGEP. Instead I'm trying to replicate results from a GA called GATree, which is pretty much a Genetic Programming technique, where the authors create Binary Decision Trees.
Binary Decision Trees (BDTs) are like normal decision trees in that each internal node is a decision and each leaf is one of the possible classes for the decision variable. The difference is that the internal nodes always have two children (hence the name). Rather than choosing the sub-child corresponding to the value of the attribute associated with the node, the node is associated with a single possible value for a single attribute. These values can be names or ranges. The left sub-child is taken if the current piece of data being decided on has the given value for the given attribute and the right sub-child is taken if not.
It is very easy to model this situation in Haskell. Here is the algebraic data type for the trees:
data BDT a = Leaf a | Node a (BDT a) (BDT a) deriving (Show, Eq, Functor, Foldable, Traversable)

and the evaluator:

evaltree _ (Leaf a) = a
evaltree attrs (Node a l r) = evaltree attrs $ if a `elem` attrs then l else r

Where attrs is a list of values of type a. The result is a value of type a that the tree has giving as the class of the list of as. Notice that all nodes and leafs hold the name type- since we are doing only nominal attributes even ranges of integers can be described by a string, which is what I'm actually using.
You have to use the following pragma to get all those beautiful derives:
{-# LANGUAGE DeriveTraversable, DeriveFoldable, DeriveFunctor #-}

The last important bit here is the ability to get the number of training cases correctly classified:

evalcases tree = sum . map (\(cas, expected) -> fromEnum $ evaltree cas tree == expected)

In case this is confusing, notice that it is in point free style- it evaluates to a function that expects a list of training cases which are pairs of attribute lists and their correct class. The fromEnum just turns a boolean value into an Int.

I'm also going to look into using functions as binary classifiers to test the results in the GEP book a little and see if I can include them in my thesis.

No comments:

Post a Comment