Random Forest Classification

Supervised Classification

Random Forest (RF) is a classification and regression tree technique developed by Breiman[R-1]. A RF randomly and iteratively samples the data and variables to generate a large group, or forest, of classification and regression trees. The classification output from RF is the statistical mode (majority vote) of many decision trees, yielding a more robust model than a single classification tree produced by one model run (Breiman[R-1]). Regression output from RF is the average of all the regression trees, which are grown in parallel without pruning. Three useful properties of RF are internal error estimates, the ability to estimate variable importance, and the capacity to handle weak explanatory variables. The iterative nature of RF gives it a distinct advantage over single-tree methods: each tree is trained on a bootstrap sample of the training data (a random subset drawn with replacement), which yields more robust predictions and reduces correlation between trees. Drawing random subsets of predictor variables at each split allows variable importance measures to be derived and prevents problems associated with correlated variables and overfitting (Breiman[R-1]). A sketch of this scheme follows.
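
The references describe Breiman and Cutler's R implementation, but the same scheme is available in many libraries. The following is a minimal sketch using scikit-learn's RandomForestClassifier (an illustrative choice assumed here, not the implementation the references cite); it exercises the bootstrap sampling, random predictor subsets, internal (out-of-bag) error estimate, and variable importance measures described above.

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)  # any labeled training data works here

    # Each tree is grown on a bootstrap sample of the training data, and
    # only a random subset of predictors is considered at each split,
    # which decorrelates the trees.
    rf = RandomForestClassifier(
        n_estimators=500,      # size of the forest
        max_features="sqrt",   # random predictor subset per split
        oob_score=True,        # internal (out-of-bag) error estimate
        random_state=0,
    )
    rf.fit(X, y)

    # The forest aggregates its trees' predictions (the mode in Breiman's
    # formulation; scikit-learn averages class probabilities).
    print("OOB accuracy:", rf.oob_score_)          # internal error estimate
    print("Variable importance:", rf.feature_importances_)
    print("Predicted classes:", rf.predict(X[:5]))

Samples left out of a tree's bootstrap draw (the "out-of-bag" cases) serve as a built-in test set, which is why RF provides an internal error estimate without a separate validation split.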


https://en.wikipedia.org/wiki/Random_forest

[R-1]        Breiman, L. 2001. Random forests. Machine Learning 45: 5-32.

[R-2]        Breiman, L., Cutler, A., Liaw, A. and Wiener, M. 2006. Breiman and Cutler's Random Forests for Classification and Regression. R package randomForest, CRAN.