
Grid Search and Cross-Validation

I. Grid Search

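sklearn.model_selection.GridSearchCV(estimator, param_grid, scoring=None, fit_params=None, n_jobs=1, iid=True, refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', error_score='raise', return_train_score='warn')
(This is the signature from older scikit-learn releases; fit_params and iid were removed in later versions.)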

2. Common methods and attributes

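grid.fit(): runs the grid search
best_params_: the parameter combination that achieved the best result
best_score_: the best mean cross-validated score observed during the search
best_estimator_.feature_importances_: importance scores for all features (available only when the refit best estimator exposes them, as tree-based models do; GridSearchCV itself has no feature_importances_ attribute)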
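The attributes listed above can be seen together in a small end-to-end run. This is a minimal sketch, assuming synthetic data from make_classification in place of a real training set; the grid values are arbitrary and chosen only to keep the search fast.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data stands in for a real training set (assumption for illustration).
X, y = make_classification(n_samples=300, n_features=8, n_informative=4, random_state=0)

grid = GridSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [10, 30], "max_depth": [2, 4]},
    scoring="roc_auc",
    cv=3,
)
grid.fit(X, y)  # runs the search; with refit=True the best model is refit on all of X, y

print(grid.best_params_)  # dict with one chosen value per searched parameter
print(grid.best_score_)   # mean cross-validated ROC AUC of the best combination

# feature_importances_ lives on the refit forest, not on GridSearchCV itself.
importances = grid.best_estimator_.feature_importances_
print(importances)
```

Per-combination results (mean and standard deviation of the test scores, fit times, and so on) are also available in grid.cv_results_ if more than the single best score is needed.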

3. Example (using RandomForestClassifier; other classifiers can be tuned the same way)

1. First, find the best n_estimators for the random forest

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# x_train, y_train: the training features and labels
param_test1 = {'n_estimators': [50, 120, 160, 200, 250]}
gsearch1 = GridSearchCV(
    estimator=RandomForestClassifier(min_samples_split=100, min_samples_leaf=20,
                                     max_depth=8, max_features='sqrt', random_state=10),
    param_grid=param_test1, scoring='roc_auc', cv=5)
gsearch1.fit(x_train, y_train)
print(gsearch1.best_params_, gsearch1.best_score_)  # best n_estimators and its score

2. Next, find the best maximum tree depth, max_depth

# To search min_samples_split as well, add 'min_samples_split': [100, 120, 150, 180, 200, 300]
param_test2 = {'max_depth': [1, 2, 3, 5, 7, 9, 11, 13]}
gsearch2 = GridSearchCV(
    estimator=RandomForestClassifier(n_estimators=50, min_samples_split=100, min_samples_leaf=20,
                                     max_features='sqrt', oob_score=True, random_state=10),
    # iid=False from the original call is omitted: the parameter was removed in scikit-learn 0.24
    param_grid=param_test2, scoring='roc_auc', cv=5)
gsearch2.fit(x_train, y_train)
print(gsearch2.best_params_, gsearch2.best_score_)  # best max_depth and its score

3. For an RF classifier, we can check the model's out-of-bag (OOB) score

rf1 = RandomForestClassifier(n_estimators=50, max_depth=2, min_samples_split=100,
                             min_samples_leaf=20, max_features='sqrt', oob_score=True, random_state=10)
rf1.fit(x_train, y_train)
print(rf1.oob_score_)  # print the out-of-bag score
# Suppose the output is 0.984, versus 0.972 with the default parameters.
# The OOB score is higher than with the defaults, which suggests the model generalizes better.
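The comparison against the default configuration described above can be reproduced with two fits. This is a sketch on synthetic data (an assumption; the original dataset is not given), so the numbers it prints will not match the 0.984 and 0.972 quoted above, and the tuned model is not guaranteed to come out ahead.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for x_train, y_train (assumption for illustration).
X, y = make_classification(n_samples=500, n_features=10, random_state=10)

# Baseline: default hyperparameters, with only OOB scoring switched on.
rf_default = RandomForestClassifier(oob_score=True, random_state=10)
rf_default.fit(X, y)

# The tuned configuration from the steps above.
rf_tuned = RandomForestClassifier(n_estimators=50, max_depth=2, min_samples_split=100,
                                  min_samples_leaf=20, max_features="sqrt",
                                  oob_score=True, random_state=10)
rf_tuned.fit(X, y)

print("default OOB score:", rf_default.oob_score_)
print("tuned OOB score:  ", rf_tuned.oob_score_)
```

Note that oob_score_ reports accuracy for classifiers by default, so it is not directly comparable to the ROC AUC values returned by the grid searches.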

4. Repeat this cycle for the remaining parameters, one at a time, to arrive at the best parameter combination
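The one-parameter-at-a-time loop can be written as a small helper. The function tune_sequentially below is hypothetical (it is not part of scikit-learn), and the data and grid values are assumptions for illustration only.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV


def tune_sequentially(estimator, param_grids, X, y, scoring="roc_auc", cv=3):
    """Hypothetical helper: search one grid at a time, keeping earlier winners fixed."""
    best = {}
    score = None
    for grid in param_grids:
        # Start each round from a fresh copy with all previously chosen values applied.
        candidate = clone(estimator).set_params(**best)
        search = GridSearchCV(candidate, grid, scoring=scoring, cv=cv)
        search.fit(X, y)
        best.update(search.best_params_)
        score = search.best_score_
    return best, score


# Synthetic stand-in for x_train, y_train (assumption for illustration).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
best_params, best_score = tune_sequentially(
    RandomForestClassifier(random_state=10),
    [{"n_estimators": [10, 30]}, {"max_depth": [2, 4, 6]}, {"min_samples_leaf": [1, 10]}],
    X, y,
)
print(best_params, best_score)
```

Tuning greedily like this is cheaper than one joint grid over all parameters, but it is not guaranteed to find the jointly best combination, because each parameter is chosen with the others held at their current values.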

II. Cross-Validation

Example

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# X_train, y_train, X_test, y_test: a train/test split prepared beforehand
k_range = [1, 5, 9, 15]
cv_scores = []
for k in k_range:
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, X_train, y_train, cv=5)  # 5-fold CV on the training set
    cv_score = np.mean(scores)
    print('k={}, validation accuracy={:.3f}'.format(k, cv_score))
    cv_scores.append(cv_score)
# k=1, validation accuracy=0.947
# k=5, validation accuracy=0.955
# k=9, validation accuracy=0.964
# k=15, validation accuracy=0.964

# Take the k with the best cross-validated score, refit on the full training set, then score on the test set
best_k = k_range[np.argmax(cv_scores)]
best_knn = KNeighborsClassifier(n_neighbors=best_k)
best_knn.fit(X_train, y_train)
print('Test accuracy:', best_knn.score(X_test, y_test))
# Test accuracy: 0.9736842105263158


Original post: https://www.cnblogs.com/Alexisbusyblog/p/12403381.html