Wednesday, March 30, 2011

Fully Automated Prediction with Random Forests

To classify a new input vector, put it down each of the trees in the forest. Each tree gives a classification, and we say the tree "votes" for that class. The forest chooses the class having the most votes over all the trees in the forest.
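The voting step can be sketched in a few lines of Python; the lambda "trees" below are hypothetical stand-ins for real fitted decision trees:

```python
from collections import Counter

def forest_predict(trees, x):
    # Each tree casts one vote; the class with the most votes wins.
    votes = [tree(x) for tree in trees]
    return Counter(votes).most_common(1)[0][0]

# Three toy "trees" (hypothetical stand-ins for fitted trees):
trees = [lambda x: "A", lambda x: "B", lambda x: "A"]
print(forest_predict(trees, x=None))  # prints "A": two votes beat one
```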
Basically, Random Forests automatically generate many decision trees, each with only modest predictive power on its own, and gain strong accuracy by aggregating them. The algorithm can be sketched like this:
  1. for each tree, repeat:
    1. draw a bootstrap sample of the training data (rows sampled with replacement)
    2. grow a decision tree on it, at each node:
      1. considering only a small random subset of the features
      2. splitting on the feature and threshold that best separate the classes (e.g. by Gini impurity)
  2. combine all the trees' predictions (majority vote for classification, averaging for regression), voila!
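As a concrete illustration, the steps above can be written as a minimal from-scratch sketch in Python. This is toy code, not Breiman's reference implementation: bootstrap sampling, a random feature subset at each split, Gini impurity as the split criterion, and a final majority vote.

```python
import random
from collections import Counter

def gini(labels):
    # Gini impurity: 0 for a pure node, higher for mixed classes.
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, feat_idxs):
    # Among a random subset of features, find the (feature, threshold)
    # pair that minimizes the weighted Gini impurity of the two children.
    best = None
    for f in feat_idxs:
        for t in set(r[f] for r in rows):
            left = [l for r, l in zip(rows, labels) if r[f] < t]
            right = [l for r, l in zip(rows, labels) if r[f] >= t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(rows, labels, n_feats, depth=0, max_depth=5):
    if len(set(labels)) == 1 or depth == max_depth:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    feat_idxs = random.sample(range(len(rows[0])), n_feats)
    split = best_split(rows, labels, feat_idxs)
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    _, f, t = split
    li = [i for i, r in enumerate(rows) if r[f] < t]
    ri = [i for i, r in enumerate(rows) if r[f] >= t]
    return (f, t,
            build_tree([rows[i] for i in li], [labels[i] for i in li],
                       n_feats, depth + 1, max_depth),
            build_tree([rows[i] for i in ri], [labels[i] for i in ri],
                       n_feats, depth + 1, max_depth))

def tree_predict(node, x):
    # Internal nodes are (feature, threshold, left, right); leaves are labels.
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] < t else right
    return node

def grow_forest(rows, labels, n_trees=25, n_feats=1):
    trees = []
    for _ in range(n_trees):
        # Bootstrap: resample the rows with replacement for each tree.
        idx = [random.randrange(len(rows)) for _ in rows]
        trees.append(build_tree([rows[i] for i in idx],
                                [labels[i] for i in idx], n_feats))
    return trees

def forest_predict(trees, x):
    # Majority vote over all the trees.
    return Counter(tree_predict(t, x) for t in trees).most_common(1)[0][0]

# Toy data: label "A" below the line x + y = 1, "B" above it.
random.seed(0)
pts = [(random.random(), random.random()) for _ in range(100)]
lab = ["A" if x + y < 1 else "B" for x, y in pts]
trees = grow_forest(pts, lab)
print(forest_predict(trees, (0.1, 0.1)), forest_predict(trees, (0.9, 0.9)))
```

Because each tree sees a different bootstrap sample and a random feature at each split, the trees make different mistakes, and the vote averages those mistakes out.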

Result: strong predictive accuracy with essentially no parameter tuning!

http://stat-www.berkeley.edu/users/breiman/RandomForests/
