For model selection, we typically build models with the train function from the caret package and then evaluate them.
Method 1:
library(caret)

set.seed(1234)
tr_control <- trainControl(method = 'cv', number = 5)
# Build a random forest model with 5-fold cross-validation
model_rf <- train(Class ~ ., data = traindata, trControl = tr_control, method = 'rf')
model_rf
Output:
mtry  Accuracy   Kappa
   2  0.9276465  0.8552977
  16  0.9314521  0.8628921
  30  0.9276627  0.8553120
Accuracy was used to select the optimal model using the largest value.
The final value used for the model was mtry = 16.
Method 2:
set.seed(1234)
# Same model, but choose the tuning parameter with the one-standard-error rule
model_rf <- train(Class ~ ., data = traindata, method = 'rf',
                  trControl = trainControl(method = 'cv', number = 5,
                                           selectionFunction = 'oneSE'))
model_rf
Output:
mtry  Accuracy   Kappa
   2  0.9276143  0.8552365
  16  0.9212771  0.8425685
  30  0.9250988  0.8502003
Accuracy was used to select the optimal model using the one SE rule.
The final value used for the model was mtry = 2.
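As you can see, the two methods select different models, and they apply different selection rules. Both use Accuracy as the performance metric, but method 1 picks the mtry with the largest Accuracy (caret's default, selectionFunction = 'best'), while method 2 applies the one-standard-error rule, which picks the simplest model whose Accuracy is within one standard error of the best one.
The difference comes from setting selectionFunction = 'oneSE' in trainControl in method 2.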
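To make the difference between the two rules concrete, here is a minimal base-R sketch of how each one could pick a row from a resampling summary. It is not caret's actual code: the helper names pick_best and pick_one_se are hypothetical, the numbers in results are made up for illustration (they are not the outputs above), and estimating the standard error as AccuracySD / sqrt(number of folds) is an assumption about how the rule is usually computed.

```r
# Hypothetical resampling summary, one row per candidate mtry, ordered from
# the simplest model to the most complex. All numbers are illustrative only.
results <- data.frame(
  mtry       = c(2, 16, 30),
  Accuracy   = c(0.925, 0.930, 0.928),
  AccuracySD = c(0.020, 0.020, 0.020)
)
num_folds <- 5  # matches number = 5 in trainControl

# "best" rule: take the row with the largest Accuracy.
pick_best <- function(res) {
  which.max(res$Accuracy)
}

# One-standard-error rule: find the best row, subtract one standard error
# (assumed here to be SD / sqrt(number of resamples)) from its Accuracy,
# and return the first (simplest) row that still reaches that threshold.
pick_one_se <- function(res, num) {
  best <- which.max(res$Accuracy)
  threshold <- res$Accuracy[best] - res$AccuracySD[best] / sqrt(num)
  min(which(res$Accuracy >= threshold))
}

best_row   <- pick_best(results)
one_se_row <- pick_one_se(results, num_folds)

cat("best rule picks mtry =", results$mtry[best_row], "\n")
cat("oneSE rule picks mtry =", results$mtry[one_se_row], "\n")
```

With these made-up numbers, the largest Accuracy belongs to mtry = 16, but mtry = 2 is within one standard error of it, so the one-SE rule prefers the simpler model. On a fitted caret model, the corresponding summary table should be available as model_rf$results.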