Random Forest Classification
Random Forest (RF) is a classification and regression tree technique
invented by Breiman [R-1]. RF repeatedly draws random samples of the data and
of the predictor variables to grow a large group, or forest, of classification
and regression trees. For classification, the RF output is the statistical
mode of the predictions of many decision trees, yielding a more robust model
than a single classification tree from a single model run (Breiman [R-1]). For
regression, the output is the average of all the regression trees, which are
grown in parallel without pruning. Three useful properties of RF are internal
error estimates, the ability to estimate variable importance, and the capacity
to handle weak explanatory variables. The iterative nature of RF gives it a
distinct advantage over the other methods: each tree is trained on a bootstrap
sample (a random subset of the training data), which reduces the correlation
between trees and makes the combined prediction more robust. Random subsets of
predictor variables allow variable importance measures to be derived and
prevent problems associated with correlated variables and overfitting
(Breiman [R-1]).
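The two sources of randomness described above (bootstrap samples of the rows, random subsets of the predictor variables) and the mode-of-trees output can be illustrated with a minimal sketch. This is not Breiman's full algorithm: for brevity each "tree" here is a one-level decision stump rather than an unpruned tree, and the split criterion is raw misclassification count rather than Gini impurity. All function names (`fit_stump`, `fit_forest`, `predict_forest`) are invented for this illustration.

```python
import random
from collections import Counter

def fit_stump(X, y, feats):
    """Pick the (feature, threshold) split that minimizes misclassifications."""
    best = None
    for f in feats:
        for t in {row[f] for row in X}:
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            err = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
            if best is None or err < best[0]:
                best = (err, f, t, l_lab, r_lab)
    if best is None:  # degenerate sample: all rows identical on the chosen features
        maj = Counter(y).most_common(1)[0][0]
        return (feats[0], float("inf"), maj, maj)
    return best[1:]

def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab

def fit_forest(X, y, n_trees=25, n_feats=1, seed=0):
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]      # bootstrap sample of the rows
        feats = rng.sample(range(len(X[0])), n_feats)   # random subset of predictors
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict_forest(forest, row):
    votes = Counter(predict_stump(s, row) for s in forest)
    return votes.most_common(1)[0][0]                   # mode of the tree outputs
```

Because each tree sees a different bootstrap sample and a different feature subset, individual trees can be weak or even wrong, yet the majority vote remains stable, which is the robustness property the text describes. In practice, libraries such as scikit-learn (`RandomForestClassifier`) implement the full method, including out-of-bag error estimates and variable importance.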
[R-1]
Breiman, L. 2001. Random forests. Machine Learning 45: 5-32.