K折交叉验证(K-fold cross-validation): 将样本分成K份,每份数量大致相等,然后用其他的某一份作为测试,其他样本作为训练集,得到一个模型和一组预测值及模型评估值;循环这个过程K次,得到K组模型评估值,对其取平均值即得到最终的评估结果
from sklearn.model_selection import cross_val_score
clf = svm.SVC(kernel='linear', C=1)
scores = cross_val_score(clf, iris.data, iris.target, cv=5)
scores
>>>
#5次交叉验证的得分
array([ 0.96..., 1. ..., 0.96..., 0.96..., 1. ])
#这种数据切分方式可以打乱顺序
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import ShuffleSplit
cv = ShuffleSplit(n_splits=3, test_size=0.3, random_state=0)
cross_val_score(clf, iris.data, iris.target, cv=cv)
通过交叉验证获取预测
from sklearn import metrics
rom sklearn.model_selection import cross_val_predict
predicted = cross_val_predict(clf, iris.data, iris.target, cv=10)
metrics.accuracy_score(iris.target, predicted)
>>>
0.966