Cross-Validation

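For linear regression:
Method 1: the train/test split. This used to live in the cross_validation module and is now in sklearn.model_selection. train_test_split randomly partitions the data into training and test sets; by default, 25 percent of the data is assigned to the test set. This approach only evaluates a single split, so the resulting estimate is not reliable.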

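score returns the R-squared coefficient. For regressors such as LinearRegression, both score and cross_val_score use R-squared by default (cross_val_score falls back to the estimator's own score method when no scoring argument is given).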
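To make the metric concrete, here is a minimal pure-Python sketch of R-squared, computed as 1 minus the ratio of the residual sum of squares to the total sum of squares. The function name `r_squared` and the sample values are illustrative assumptions, not part of scikit-learn.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of always predicting the mean of y_true.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
r2 = r_squared(y_true, y_pred)
print(round(r2, 4))  # 0.9486
```

A perfect prediction scores 1.0, and always predicting the mean scores 0.0, so a negative value means the model is worse than the mean baseline.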

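// from sklearn.linear_model import LinearRegression
// from sklearn.model_selection import train_test_split
// X_train,X_test,y_train,y_test=train_test_split(X,y)
// model=LinearRegression()
// model.fit(X_train,y_train)  # fit on the training set only, not on all of X,y
// model.score(X_test,y_test)  # R-squared on the held-out test set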
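The following sketch illustrates why a single split is an unreliable estimate: it repeats the split with different random seeds and compares the resulting scores. The synthetic data and the coefficients are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): a noisy linear target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=1.0, size=100)

scores = []
for seed in range(5):
    # Same data, different random partition each time.
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=seed)
    model = LinearRegression().fit(X_train, y_train)
    scores.append(model.score(X_test, y_test))

print([round(s, 3) for s in scores])
print("spread:", round(max(scores) - min(scores), 3))
```

The spread between the best and worst score shows how much the evaluation depends on which samples happened to land in the test set.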

Method 2: use cross_val_score from model_selection
// from sklearn.linear_model import LinearRegression
// from sklearn.model_selection import cross_val_score
// model=LinearRegression()
// scores=cross_val_score(model,X,y,cv=5)  # array of 5 scores, one per fold

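cv=5 tells cross_val_score to use k-fold cross validation with k=5: the data is split into 5 folds, and the model is trained and scored 5 times, each time holding out a different fold as the test set.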
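As a rough sketch of what k-fold splitting does, the pure-Python generator below produces consecutive, unshuffled folds in which every sample lands in the test set exactly once. The helper `k_fold_indices` is hypothetical and written only for illustration. In scikit-learn, an integer cv gives unshuffled KFold for regressors and stratified folds for classifiers.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for k consecutive, unshuffled folds."""
    # The first n_samples % k folds get one extra sample,
    # so fold sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(10, 5))
for train_idx, test_idx in folds:
    print("test:", test_idx, "train size:", len(train_idx))
```

With 10 samples and k=5, each fold tests on 2 samples and trains on the other 8.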

In fact, cross_val_score accepts many splitting strategies through the cv argument, such as KFold, LeaveOneOut and ShuffleSplit. For example:

// from sklearn.model_selection import ShuffleSplit
// cv=ShuffleSplit(n_splits=3,test_size=0.3,random_state=0)
// cross_val_score(model,X,y,cv=cv)
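The strategies differ mainly in how many splits they produce and whether test sets can overlap. The sketch below compares the three splitters named above on 10 placeholder samples; the array contents are an assumption and do not matter here, only the number of rows does.

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut, ShuffleSplit

X = np.arange(20).reshape(10, 2)  # 10 placeholder samples

splitters = {
    "KFold": KFold(n_splits=5),
    "LeaveOneOut": LeaveOneOut(),
    "ShuffleSplit": ShuffleSplit(n_splits=3, test_size=0.3, random_state=0),
}
# Number of train/test rounds each strategy will run.
n_splits = {name: cv.get_n_splits(X) for name, cv in splitters.items()}
print(n_splits)  # {'KFold': 5, 'LeaveOneOut': 10, 'ShuffleSplit': 3}

# ShuffleSplit draws each split independently, so test sets may overlap.
for train_idx, test_idx in splitters["ShuffleSplit"].split(X):
    print("train:", train_idx, "test:", test_idx)
```

LeaveOneOut runs one round per sample, which gets expensive on large datasets. ShuffleSplit lets you choose the number of rounds and the test fraction independently.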

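For logistic regression:
Logistic regression handles classification problems. The metric used for linear regression measures how far predictions are from the target (a distance), which is clearly not suitable for classification: what matters there is whether the predicted class is correct, not how far a sample was from the decision boundary.
The most common metrics are accuracy, precision, recall and the F1 measure, all computed from the counts of true positives, true negatives, false positives and false negatives.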
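1. Compute the confusion matrix
A confusion matrix consists of the counts of true positives, true negatives, false positives and false negatives.
// from sklearn.metrics import confusion_matrix
// cm=confusion_matrix(y_test,y_pred)  # use a new name so the function is not shadowed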
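For intuition, the four counts can be tallied by hand. The sketch below is pure Python, with a hypothetical helper `confusion_counts` and made-up labels. It arranges the result as [[TN, FP], [FN, TP]], the layout scikit-learn uses for binary 0/1 labels (rows are true classes, columns are predicted classes).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tn, fp, fn, tp) for binary labels."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if t == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tn, fp, fn, tp


# Illustrative labels only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
tn, fp, fn, tp = confusion_counts(y_true, y_pred)
matrix = [[tn, fp], [fn, tp]]
print(matrix)  # [[3, 1], [1, 3]]
```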
2. Accuracy: measures the fraction of the classifier's predictions that are correct.
// from sklearn.metrics import accuracy_score
// accuracy_score(y_true,y_pred)
LogisticRegression.score() uses accuracy by default.
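Accuracy can look good while the classifier is useless, which is why the other metrics are worth computing. The sketch below uses hypothetical imbalanced data (95 negatives, 5 positives) and a "classifier" that always predicts the negative class.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical imbalanced data: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
always_negative = [0] * 100

acc = accuracy(y_true, always_negative)
# Positives that were actually detected.
found = sum(1 for t, p in zip(y_true, always_negative) if t == 1 and p == 1)
print(acc)    # 0.95
print(found)  # 0
```

The accuracy is 0.95 even though not a single positive case was found.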
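3. Precision: for example, of the patients we predicted to have cancer, the fraction who actually have it.
// from sklearn.linear_model import LogisticRegression
// from sklearn.model_selection import cross_val_score
// classifier=LogisticRegression()
// classifier.fit(X_train,y_train)
// precisions=cross_val_score(classifier,X_train,y_train,cv=5,scoring='precision')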
4. Recall: for example, of the patients who actually have cancer, the fraction we predicted correctly.
// recalls=cross_val_score(classifier,X_train,y_train,cv=5,scoring='recall')
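Precision and recall share the same numerator and differ only in the denominator: precision is TP / (TP + FP) and recall is TP / (TP + FN). A minimal pure-Python sketch follows; the helper name and labels are illustrative, and returning 0.0 when a denominator is zero is a choice made for this sketch.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    # Sketch-level choice: report 0.0 when the denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Illustrative labels: 4 actual positives, 3 predicted positives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
print(round(precision, 3), recall)  # 0.667 0.5
```

Here 2 of the 3 positive predictions are right (precision 0.667), but only 2 of the 4 actual positives were found (recall 0.5).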
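5. Precision and recall trade off against each other. The F1 score, the harmonic mean of precision and recall, summarizes both in one number; the higher the better.
// f1s=cross_val_score(classifier,X_train,y_train,cv=5,scoring='f1')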
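Because F1 is a harmonic mean, it penalizes a classifier whose precision and recall are far apart. The sketch below compares two hypothetical classifiers with the same arithmetic mean of precision and recall (0.6); the function name `f1_from` is an illustrative assumption.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


balanced = f1_from(0.6, 0.6)   # precision and recall equal
lopsided = f1_from(1.0, 0.2)   # perfect precision, poor recall
print(round(balanced, 3))  # 0.6
print(round(lopsided, 3))  # 0.333
```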
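6. The ROC curve and the AUC value
The ROC curve plots the false positive rate (FPR) on the x-axis against the true positive rate (TPR) on the y-axis.
The AUC is the area under the ROC curve.
// from sklearn.linear_model import LogisticRegression
// from sklearn.metrics import roc_curve, auc
// classifier=LogisticRegression()
// classifier.fit(X_train,y_train)
// predictions=classifier.predict_proba(X_test)
// false_positive_rate,recall,thresholds=roc_curve(y_test,predictions[:,1])  # column 1 is the positive-class probability
// roc_auc=auc(false_positive_rate,recall)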
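To show where the curve comes from, the sketch below builds the ROC points by sweeping the decision threshold from high to low and then integrates with the trapezoid rule. It is a simplified pure-Python illustration, not scikit-learn's implementation. The helper names are hypothetical, and it assumes binary 0/1 labels with at least one sample of each class.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) lists, one point per distinct score threshold."""
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    # Highest scores first: lowering the threshold admits them in this order.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample sharing this score (handles ties).
        if i == len(ranked) - 1 or ranked[i + 1][0] != score:
            fpr.append(fp / n_neg)
            tpr.append(tp / n_pos)
    return fpr, tpr


def trapezoid_auc(fpr, tpr):
    """Area under the piecewise-linear curve through the ROC points."""
    return sum(
        (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
        for i in range(1, len(fpr))
    )


# Illustrative labels and predicted positive-class scores.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(y_true, scores)
roc_auc = trapezoid_auc(fpr, tpr)
print(list(zip(fpr, tpr)))
print(roc_auc)  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a ranking that places every positive above every negative.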

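Author: dechuan
Link: https://www.jianshu.com/p/a4e94e72a46d
Source: Jianshu
Copyright belongs to the author. For commercial reprints, contact the author for authorization; for non-commercial reprints, credit the source.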

Original article: https://www.cnblogs.com/zyy-/p/8530546.html
