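Sklearn

sklearn (scikit-learn) is a machine learning library built on NumPy and SciPy. Its design is elegant: every algorithm is called through the same interface. This article starts with how sklearn's modules are organized and how its algorithm classes are designed at the top level.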
Using sklearn

In sklearn, different machine learning algorithms are driven through exactly the same interface. The typical workflow is:
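1. Load and preprocess the data.
2. Define a classifier (or regressor, and so on).
3. Train the model on the training set by calling its fit method.
4. Use the trained model to make predictions.
5. Evaluate the model's performance.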
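The five steps above can be sketched in one short script. This is a minimal illustration rather than the article's own code: the names `X`, `y`, `models`, and `scores`, the use of `train_test_split`, and the `random_state=0` seeds are assumptions made here so the example is self-contained and repeatable.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Step 1: load (here: generate) the data and hold out 10% as a test set.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0)

# Step 2: define several estimators. They all expose fit() and predict().
models = {
    "GaussianNB": GaussianNB(),
    "SVC": SVC(),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                  # Step 3: train
    pred = model.predict(X_test)                 # Step 4: predict
    scores[name] = accuracy_score(y_test, pred)  # Step 5: evaluate
    print(name, scores[name])
```

Because the loop body never mentions a concrete algorithm, swapping in another classifier only means adding an entry to `models`.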
Loading the data

```python
from sklearn import datasets
from sklearn import metrics

# Synthetic binary classification data: 1000 samples, 10 features.
dataset = datasets.make_classification(n_samples=1000, n_features=10,
                                       n_informative=2, n_redundant=2,
                                       n_repeated=0, n_classes=2)
print(dataset[0])  # feature matrix
print(dataset[1])  # class labels
print('\n')
```
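To make the structure of the returned value concrete, the sketch below unpacks it into a feature matrix and a label vector and inspects their shapes. The names `X` and `y` and the `random_state=0` seed are assumptions added here for repeatability; they are not part of the original code.

```python
from sklearn.datasets import make_classification

# make_classification returns a (features, labels) tuple.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=2,
                           n_redundant=2, n_repeated=0, n_classes=2,
                           random_state=0)

print(X.shape)                  # one row per sample, one column per feature
print(y.shape)                  # one label per sample
print(sorted(set(y.tolist())))  # the distinct class labels
```

In the article's code, `dataset[0]` plays the role of `X` and `dataset[1]` the role of `y`.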
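Splitting the data

```python
from sklearn.model_selection import KFold

# sklearn.cross_validation was removed in scikit-learn 0.20;
# KFold now lives in sklearn.model_selection and is iterated via split().
kf = KFold(n_splits=10, shuffle=True)
for train_index, test_index in kf.split(dataset[0]):
    x_train, y_train = dataset[0][train_index], dataset[1][train_index]
    x_test, y_test = dataset[0][test_index], dataset[1][test_index]
    print(x_train, '\n')
    print(y_train, '\n')
    print(x_test, '\n')
    print(y_test, '\n')
# After the loop, x_train/y_train/x_test/y_test hold the last fold's split.
```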
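The loop above leaves only the last fold's split in scope. If a score for every fold is wanted, one option is `cross_val_score`, which runs the fit/predict/score cycle once per fold. The sketch below is an illustration under assumptions, not the article's method: the choice of `GaussianNB` as the estimator, the names `X`, `y`, and `fold_scores`, and the `random_state=0` seeds are all introduced here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=10, n_informative=2,
                           n_redundant=2, n_repeated=0, n_classes=2,
                           random_state=0)

# Same 10-fold shuffled split as in the article, with a fixed seed.
kf = KFold(n_splits=10, shuffle=True, random_state=0)

# One accuracy value per fold; each fold serves as the test set exactly once.
fold_scores = cross_val_score(GaussianNB(), X, y, cv=kf, scoring="accuracy")
print(fold_scores)
print("mean accuracy:", fold_scores.mean())
```

Averaging over folds gives a less noisy estimate than a single 100-sample test fold.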
GaussianNB: training, prediction, and evaluation

```python
from sklearn.naive_bayes import GaussianNB

print("GaussianNB")  # header line that appears in the results
clf = GaussianNB()
clf.fit(x_train, y_train)
pred = clf.predict(x_test)
print(pred)    # predicted labels
print(y_test)  # true labels

acc = metrics.accuracy_score(y_test, pred)
print("Accuracy:", acc)
f1 = metrics.f1_score(y_test, pred)
print("F1-score:", f1)
auc = metrics.roc_auc_score(y_test, pred)
print("AUC ROC:", auc)
print("\n")
```
Results

```
GaussianNB
[1 1 1 1 0 0 0 1 1 0 1 1 0 0 1 0 1 1 0 1 1 1 0 0 1 1 1 0 0 1 0 0 1 1 1 1 1 1 0 1 0 1 0 1 0 1 1 0 1 1 1 0 1 0 0 1 1 0 1 0 1 1 1 1 1 1 1 0 0 1 0 1 1 1 1 0 1 0 0 0 1 1 0 1 1 0 0 0 0 1 0 1 0 0 1 1 1 1 1 0]
[1 1 0 0 0 0 0 0 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 0 1 0 0 1 0 0 1 1 1 0 0 1 0 1 0 1 1 1 0 0 1 0 1 1 1 0 1 1 0 1 1 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 1 0 1 0 1 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0]
Accuracy: 0.84
F1-score: 0.8518518518518519
AUC ROC: 0.8445512820512822
```
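The AUC above is computed from hard 0/1 predictions. `roc_auc_score` also accepts continuous scores, which lets the ROC curve use every threshold instead of a single one. The sketch below compares the two on freshly generated data; it assumes a `train_test_split` hold-out and `random_state=0` seeds, and the names `auc_from_labels` and `auc_from_scores` are introduced here for illustration. Its numbers are not comparable to the results printed above.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0)

clf = GaussianNB().fit(X_train, y_train)

# AUC from hard labels, as in the article's code.
auc_from_labels = roc_auc_score(y_test, clf.predict(X_test))

# AUC from the predicted probability of the positive class (column 1).
proba = clf.predict_proba(X_test)[:, 1]
auc_from_scores = roc_auc_score(y_test, proba)

print("AUC from labels:", auc_from_labels)
print("AUC from probabilities:", auc_from_scores)
```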
SVC: training, prediction, and evaluation

```python
from sklearn.svm import SVC

print("SVC")
# Try several values of the regularization parameter C.
C_values = [1e-02, 1e-01, 1e00, 1e01, 1e02]
for Cs in C_values:
    clf = SVC(C=Cs, kernel='rbf', gamma=0.1)
    clf.fit(x_train, y_train)
    pred = clf.predict(x_test)
    print(pred)    # predicted labels
    print(y_test)  # true labels
    acc = metrics.accuracy_score(y_test, pred)
    print("Accuracy:", acc)
    f1 = metrics.f1_score(y_test, pred)
    print("F1-score:", f1)
    auc = metrics.roc_auc_score(y_test, pred)
    print("AUC ROC:", auc)
    print("\n")
```
Results

Each group below corresponds to one value of C, in the order of C_values (0.01, 0.1, 1, 10, 100). Within a group, the first array is the prediction and the second is y_test.

```
SVC
[1 1 1 1 0 0 0 1 1 0 1 1 0 0 1 0 1 1 0 1 1 1 0 0 1 1 1 0 0 1 0 0 1 1 1 1 1 1 1 1 0 1 0 1 0 1 1 0 1 1 1 0 1 0 0 1 1 0 1 0 1 1 1 1 1 1 1 0 0 1 0 1 1 1 1 0 1 0 0 0 1 1 0 1 1 0 0 0 0 1 0 1 1 0 1 1 1 1 1 0]
[1 1 0 0 0 0 0 0 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 0 1 0 0 1 0 0 1 1 1 0 0 1 0 1 0 1 1 1 0 0 1 0 1 1 1 0 1 1 0 1 1 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 1 0 1 0 1 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0]
Accuracy: 0.82
F1-score: 0.8363636363636364
AUC ROC: 0.8253205128205129

[1 1 1 1 0 0 0 1 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 1 1 0 0 1 0 0 1 1 1 1 1 1 0 1 0 1 0 1 0 1 1 0 1 1 1 0 1 0 0 1 1 0 1 0 1 1 1 1 1 1 1 0 0 1 0 1 1 1 1 0 1 0 0 0 1 1 0 1 1 0 0 0 0 1 0 1 1 0 1 1 1 1 1 0]
[1 1 0 0 0 0 0 0 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 0 1 0 0 1 0 0 1 1 1 0 0 1 0 1 0 1 1 1 0 0 1 0 1 1 1 0 1 1 0 1 1 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 1 0 1 0 1 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0]
Accuracy: 0.84
F1-score: 0.8518518518518519
AUC ROC: 0.8445512820512822

[1 1 1 1 0 0 0 1 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 1 1 0 0 1 0 0 1 1 1 1 1 1 0 1 0 1 0 1 0 1 1 0 1 1 1 0 1 0 0 1 1 0 1 0 1 1 1 1 1 1 1 0 0 1 0 1 1 1 1 0 1 0 0 0 1 1 0 1 1 0 0 0 0 1 0 1 0 0 1 1 1 1 1 0]
[1 1 0 0 0 0 0 0 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 0 1 0 0 1 0 0 1 1 1 0 0 1 0 1 0 1 1 1 0 0 1 0 1 1 1 0 1 1 0 1 1 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 1 0 1 0 1 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0]
Accuracy: 0.85
F1-score: 0.8598130841121496
AUC ROC: 0.8541666666666666

[1 1 0 1 0 0 0 1 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 1 1 0 0 1 0 0 1 1 0 0 1 1 0 1 0 0 0 1 0 1 1 0 1 1 1 0 1 0 0 1 1 0 1 0 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 0 1 0 0 0 1 1 0 1 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0]
[1 1 0 0 0 0 0 0 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 0 1 0 0 1 0 0 1 1 1 0 0 1 0 1 0 1 1 1 0 0 1 0 1 1 1 0 1 1 0 1 1 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 1 0 1 0 1 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0]
Accuracy: 0.85
F1-score: 0.854368932038835
AUC ROC: 0.8525641025641024

[1 1 0 0 0 0 0 1 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 1 1 0 0 1 0 0 1 1 0 0 1 0 0 1 0 0 1 1 0 1 1 0 1 1 1 0 1 0 0 1 1 0 1 0 1 1 1 1 1 1 1 1 0 0 0 1 0 1 1 0 1 0 0 0 1 1 0 1 0 0 0 0 0 0 0 1 1 0 1 1 1 1 1 0]
[1 1 0 0 0 0 0 0 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 0 1 0 0 1 0 0 1 1 1 0 0 1 0 1 0 1 1 1 0 0 1 0 1 1 1 0 1 1 0 1 1 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 1 0 1 0 1 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0]
Accuracy: 0.84
F1-score: 0.8400000000000001
AUC ROC: 0.8413461538461539
```
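The manual loop over C can also be expressed with `GridSearchCV`, which cross-validates each candidate value and records the best one. This is a sketch of an alternative, not what the article ran: the smaller dataset (300 samples), the 5-fold setting, and the names `param_grid` and `search` are assumptions chosen here to keep the example quick.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Same candidate values for C as in the article's loop.
param_grid = {"C": [1e-2, 1e-1, 1e0, 1e1, 1e2]}

# Each candidate is scored by 5-fold cross-validated accuracy.
search = GridSearchCV(SVC(kernel="rbf", gamma=0.1), param_grid,
                      cv=5, scoring="accuracy")
search.fit(X, y)

print("best C:", search.best_params_["C"])
print("best mean CV accuracy:", search.best_score_)
```

Selecting C by cross-validation avoids tuning the parameter against the same test fold that is later used to report performance.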
Random Forest: training, prediction, and evaluation

```python
from sklearn.ensemble import RandomForestClassifier

print("Random Forest")
# Try forests with an increasing number of trees.
n_estimators = [10, 100, 1000]
for value in n_estimators:
    clf = RandomForestClassifier(n_estimators=value)
    clf.fit(x_train, y_train)
    pred = clf.predict(x_test)
    print(pred)    # predicted labels
    print(y_test)  # true labels
    acc = metrics.accuracy_score(y_test, pred)
    print("Accuracy:", acc)
    f1 = metrics.f1_score(y_test, pred)
    print("F1-score:", f1)
    auc = metrics.roc_auc_score(y_test, pred)
    print("AUC ROC:", auc)
    print("\n")
```
Results

Each group below corresponds to one value of n_estimators, in order (10, 100, 1000). Within a group, the first array is the prediction and the second is y_test.

```
Random Forest
[1 1 1 0 0 0 0 0 1 0 1 1 0 0 1 1 1 1 0 1 0 1 0 0 1 1 1 0 0 1 0 0 1 1 1 0 0 0 0 1 0 1 1 1 0 1 1 0 1 1 1 0 1 0 0 1 1 0 1 0 1 1 1 1 1 1 1 0 0 0 0 1 1 0 1 0 1 0 0 0 1 1 0 1 1 0 0 0 0 0 0 1 0 0 1 0 1 1 1 0]
[1 1 0 0 0 0 0 0 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 0 1 0 0 1 0 0 1 1 1 0 0 1 0 1 0 1 1 1 0 0 1 0 1 1 1 0 1 1 0 1 1 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 1 0 1 0 1 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0]
Accuracy: 0.9
F1-score: 0.9
AUC ROC: 0.9014423076923077

[1 1 1 1 0 0 0 0 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 1 1 0 0 1 0 0 1 1 1 0 1 1 0 1 0 0 1 1 0 1 1 0 1 1 1 0 1 0 0 1 1 0 1 0 1 1 1 1 1 1 1 0 0 0 0 1 1 0 1 0 1 0 0 0 1 1 0 1 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0]
[1 1 0 0 0 0 0 0 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 0 1 0 0 1 0 0 1 1 1 0 0 1 0 1 0 1 1 1 0 0 1 0 1 1 1 0 1 1 0 1 1 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 1 0 1 0 1 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0]
Accuracy: 0.9
F1-score: 0.9019607843137256
AUC ROC: 0.9022435897435898

[1 1 1 1 0 0 0 0 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 1 1 0 0 1 0 0 1 1 1 0 1 1 0 1 0 0 1 1 0 1 1 0 1 1 1 0 1 0 0 1 1 0 1 0 1 1 1 1 1 1 1 0 0 0 0 1 1 0 1 0 1 0 0 0 1 1 0 1 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0]
[1 1 0 0 0 0 0 0 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 0 1 0 0 1 0 0 1 1 1 0 0 1 0 1 0 1 1 1 0 0 1 0 1 1 1 0 1 1 0 1 1 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 1 0 1 0 1 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0]
Accuracy: 0.9
F1-score: 0.9019607843137256
AUC ROC: 0.9022435897435898
```
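A random forest is a randomized algorithm, and the article's code does not fix a seed, so rerunning it may give slightly different predictions. The sketch below shows one way to make runs repeatable by passing `random_state`. The helper `fit_predict`, the dataset size, and the seed values are hypothetical choices made for this illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)


def fit_predict(seed):
    """Train a 50-tree forest with the given seed and predict the test set."""
    clf = RandomForestClassifier(n_estimators=50, random_state=seed)
    return clf.fit(X_train, y_train).predict(X_test)


pred_a = fit_predict(0)
pred_b = fit_predict(0)

# With the same seed and the same data, both runs build identical forests.
same = np.array_equal(pred_a, pred_b)
print("identical predictions with the same seed:", same)
```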