A Data Complexity Analysis of Comparative Advantages of Decision Forest Construction

01 June 2002

Using several measures that characterize the complexity of classification problems, we studied the comparative advantages of two methods for constructing decision forests: bootstrapping and random subspaces. We investigated a collection of 392 two-class problems from the UCI repository. We observed strong correlations between classifier accuracy and two of these measures: the length of the class boundaries and the thickness of the class manifolds. In addition, the bootstrapping method is better when the training samples are sparse, and the subspace method is better when the classes are compact and the boundaries are smooth.
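To make the contrast between the two construction methods concrete, here is a minimal sketch of how each one might vary the training data seen by an individual tree: bootstrapping resamples the training rows with replacement, while the random subspace method keeps every row but restricts each tree to a random subset of the features. The function names, the toy data, and the subspace size are illustrative assumptions, not taken from the paper, and the tree-training step itself is omitted.

```python
import random

def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement (bootstrap resampling)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]

def subspace_indices(n_features, k, rng):
    """Pick k distinct feature indices without replacement (random subspace)."""
    return sorted(rng.sample(range(n_features), k))

def project(rows, feature_idx):
    """Keep only the selected feature columns of every row."""
    return [[row[j] for j in feature_idx] for row in rows]

# Hypothetical toy data: 4 samples, 4 features.
X = [
    [0.1, 1.0, 5.0, 2.0],
    [0.4, 0.9, 4.0, 2.5],
    [0.8, 0.2, 1.0, 7.0],
    [0.9, 0.1, 0.5, 6.5],
]
rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Bootstrapping: each tree sees a resampled set of rows, all features.
rows_for_tree = [X[i] for i in bootstrap_indices(len(X), rng)]

# Random subspaces: each tree sees all rows, but only k of the features.
feats = subspace_indices(n_features=4, k=2, rng=rng)
X_sub = project(X, feats)
```

In a forest, one such draw would be made per tree, and the trees' votes combined; the sketch only shows where the two methods differ in what each tree is trained on.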