Motivation
As its name suggests, Random Forest (RF) is an ensemble learning method: the base learners are decision trees, and RF combines them with Bagging. The key difference lies in the word 'Random'. An ordinary decision tree picks the best attribute over the whole attribute set, while RF selects the split attribute for each node of each base learner in two steps:
- Randomly select k attributes from the attribute set A;
- Select the best attribute among those k. If k = 1, the selection is purely random; if k = |A|, it degenerates to an ordinary decision tree. The recommended value is k = log_2 |A|.
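The two-step selection above can be sketched in a few lines. This is a minimal illustration, not an implementation from any particular library: `choose_split_attribute` and its `gain` callback are hypothetical names, and `gain` stands in for whatever splitting criterion (e.g. information gain) the tree uses.

```python
import math
import random

def choose_split_attribute(attributes, gain, k=None):
    """Pick a split attribute the RF way: sample k candidates at
    random, then return the best of them by the gain function.
    (Illustrative sketch; names are assumptions, not a real API.)"""
    if k is None:
        # Recommended default: k = log2|A| (at least 1).
        k = max(1, int(math.log2(len(attributes))))
    candidates = random.sample(attributes, k)   # step 1: random subset
    return max(candidates, key=gain)            # step 2: best of the subset
```

With k = |A| this reduces to the ordinary decision-tree rule (the global best attribute always wins), and with k = 1 the choice is purely random, matching the two extremes described above.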
The intuition is to increase the diversity of the base learners. Plain Bagging relies only on sample perturbation of the training data; RF adds attribute perturbation on top of it, which helps the ensemble generalize better.
As you might expect, RF performs worse than Bagging early in training: since each split considers only a subset of the attributes, the individual base learners are weaker. But as the number of base learners grows, the ensemble gradually exploits the full attribute information and converges to a lower generalization error. As a bonus, RF is often faster to train than Bagging, because each split only evaluates a subset of the attributes.