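import numpy as np
from sklearn import tree
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

clf = tree.DecisionTreeClassifier()
svm = SVC()  # assumption: the original snippet does not show how svm is created

# Use GridSearchCV to search for the best parameters.
# Every step before the last must implement transform(). Old scikit-learn releases gave
# DecisionTreeClassifier such a method; recent releases do not and reject this pipeline.
pipeline = Pipeline([('tree', clf), ("svm", svm)])

param_test = dict(
    tree__min_samples_leaf=range(5, 16, 2),
    tree__criterion=["gini", "entropy"],
    svm__C=[0.1, 1, 10],
)
gsearch2 = GridSearchCV(pipeline, param_grid=param_test, scoring="accuracy", n_jobs=2, cv=5)
# x_train and y_train are the training features and labels, prepared elsewhere.
gsearch2.fit(np.array(x_train), np.array(y_train))
print(gsearch2.best_estimator_)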
A Pipeline chains several estimators together so that a single fit call trains all of them, which simplifies the code.
Naming rule:
pipeline = Pipeline([('tree', clf), ("svm", svm)])
param_test = dict(tree__min_samples_leaf=range(5, 16, 2), tree__criterion=["gini","entropy"],svm__C=[0.1, 1, 10])
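'tree' (a step name you choose yourself) is joined by "__" (two underscores) to the name of the estimator's parameter (min_samples_leaf); the range gives the candidate values to search over.
For example, min_samples_leaf is one of the decision tree's own parameters, the same one you would set with tree.DecisionTreeClassifier(min_samples_leaf=?).
How a pipeline processes data internally is not covered here.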
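The naming rule can be checked directly on a pipeline object: get_params() lists every addressable name, and set_params() accepts the same "step__parameter" keys that GridSearchCV uses. The following is a minimal sketch rather than the original code; it assumes a single-step pipeline (the step name 'tree' is arbitrary) so that it runs on any recent scikit-learn release.

```python
from sklearn import tree
from sklearn.pipeline import Pipeline

# Single-step pipeline; 'tree' is just a label chosen for this sketch.
pipeline = Pipeline([("tree", tree.DecisionTreeClassifier())])

# Every tunable parameter is exposed as "<step name>__<parameter name>".
param_names = sorted(pipeline.get_params().keys())
tree_param_names = [n for n in param_names if n.startswith("tree__")]
print(tree_param_names)

# set_params understands the same keys, which is what GridSearchCV relies on.
pipeline.set_params(tree__min_samples_leaf=7, tree__criterion="entropy")
leaf = pipeline.named_steps["tree"].min_samples_leaf
criterion = pipeline.named_steps["tree"].criterion
print(leaf, criterion)
```

If a key in the parameter grid does not appear in the get_params() listing, it is usually a sign that the step name or the parameter name is misspelled.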
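On recent scikit-learn releases every pipeline step except the last must be a transformer, and DecisionTreeClassifier is not one. One possible way to keep the same idea is to wrap the tree in SelectFromModel, so that it selects features for the SVM. This is a swapped-in technique and only a sketch, not the original author's code: the iris dataset stands in for x_train and y_train, and the names X, y, param_grid and search are chosen for illustration. Note how the wrapper adds one more level to the parameter names.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in data for this sketch (not the original x_train / y_train).
X, y = load_iris(return_X_y=True)

pipeline = Pipeline([
    # SelectFromModel turns the tree into a transformer that keeps
    # the features the fitted tree considers important.
    ("tree", SelectFromModel(DecisionTreeClassifier(random_state=0))),
    ("svm", SVC()),
])

# Naming now has three parts: step name -> 'estimator' -> tree parameter.
param_grid = {
    "tree__estimator__min_samples_leaf": list(range(5, 16, 2)),
    "tree__estimator__criterion": ["gini", "entropy"],
    "svm__C": [0.1, 1, 10],
}

search = GridSearchCV(pipeline, param_grid=param_grid, scoring="accuracy", cv=5)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
print(best_params)
print(best_score)
```

The best combination is available as search.best_params_, and search.best_estimator_ is the whole pipeline refitted with those settings.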