1. Clustering models
from sklearn.cluster import KMeans
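A minimal sketch of how KMeans might be used. The toy points and the values of n_clusters, n_init and random_state are assumptions chosen for illustration, not part of the original notes.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated groups of 2-D points (made-up toy data).
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]])

# n_clusters, n_init and random_state are choices made for this sketch.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(X)

print(labels)               # cluster index assigned to each point
print(km.cluster_centers_)  # one centroid per cluster
```

The cluster indices themselves are arbitrary; only the grouping is meaningful.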
2. Datasets
from sklearn.datasets import load_iris
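Standard sklearn data layout
data = [[feature1, feature2, feature3], ...]   # one row per sample, shape (n_samples, n_features)
target = [0, 2, 1, 2, 1, 2, 0, ...]            # one label per sample, shape (n_samples,)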
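The layout above can be checked on the iris dataset. This is a small sketch; the variable names X and y are just a common convention.

```python
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

print(X.shape)             # (150, 4): 150 samples, 4 features each
print(y.shape)             # (150,): one class label per sample
print(iris.feature_names)  # names of the 4 feature columns
print(iris.target_names)   # names of the 3 classes
```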
3. Feature selection (used to filter features)
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
fs = SelectKBest(chi2, k=10)   # keep the 10 features with the highest chi2 scores
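A short sketch of the selector applied to real data. It uses iris, which has only 4 features, so k=2 is used here instead of the k=10 above (an assumption for this example). Note that chi2 expects non-negative feature values.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)

# k=2 is chosen for this sketch because iris has only 4 features.
fs = SelectKBest(chi2, k=2)
X_new = fs.fit_transform(X, y)

print(X.shape, "->", X_new.shape)  # (150, 4) -> (150, 2)
print(fs.scores_)                  # chi2 score of each original feature
print(fs.get_support())            # boolean mask of the kept features
```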
4. Preprocessing
from sklearn.preprocessing import LabelEncoder, LabelBinarizer, Binarizer
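A small sketch of what the three preprocessing classes do. The color labels and the threshold of 0.5 are made up for illustration.

```python
from sklearn.preprocessing import LabelEncoder, LabelBinarizer, Binarizer

colors = ["red", "green", "blue", "green"]  # hypothetical string labels

# LabelEncoder maps each class to an integer (classes are sorted first).
le = LabelEncoder()
codes = le.fit_transform(colors)
print(le.classes_)  # ['blue' 'green' 'red']
print(codes)        # [2 1 0 1]

# LabelBinarizer produces one indicator column per class.
lb = LabelBinarizer()
onehot = lb.fit_transform(colors)
print(onehot)       # first row is [0 0 1], i.e. 'red'

# Binarizer thresholds numeric features: values > threshold become 1.
binarized = Binarizer(threshold=0.5).fit_transform([[0.2, 0.7, 0.5]])
print(binarized)    # [[0. 1. 0.]]
```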
5. Model evaluation and selection
from sklearn.model_selection import KFold
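A minimal sketch of how KFold splits indices into train and test folds. The 6-sample array and n_splits=3 are assumptions for this example.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(12).reshape(6, 2)  # 6 toy samples, 2 features each

# Without shuffling, the folds are consecutive blocks of indices.
kf = KFold(n_splits=3)
splits = list(kf.split(X))

for fold, (train_idx, test_idx) in enumerate(splits):
    print(fold, train_idx, test_idx)
# 0 [2 3 4 5] [0 1]
# 1 [0 1 4 5] [2 3]
# 2 [0 1 2 3] [4 5]
```

Each sample appears in exactly one test fold, so every sample is used for evaluation once.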
6. Model evaluation metrics
from sklearn import metrics
y_pred = [0,2,1,3]
y_true = [0,1,2,3]
metrics.accuracy_score(y_true, y_pred)
0.5
metrics.accuracy_score(y_true, y_pred, normalize=False)
# normalize=False returns the number of correct predictions (2 here) rather than the fraction
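To make the two results concrete, accuracy can be reproduced by hand with NumPy. This is only a sketch of the definition, not how sklearn implements it internally.

```python
import numpy as np
from sklearn import metrics

y_pred = np.array([0, 2, 1, 3])
y_true = np.array([0, 1, 2, 3])

matches = (y_true == y_pred)       # [True, False, False, True]
manual_fraction = matches.mean()   # 2 of 4 correct -> 0.5
manual_count = int(matches.sum())  # 2 correct predictions

print(manual_fraction, metrics.accuracy_score(y_true, y_pred))
print(manual_count, metrics.accuracy_score(y_true, y_pred, normalize=False))
```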
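roc_auc_score: ROC stands for Receiver Operating Characteristic. The ROC curve shows how the True Positive Rate and the False Positive Rate change as the decision threshold varies.
AUC is the area under that curve. The higher the value, the better the classifier (1.0 is a perfect ranking, 0.5 is no better than random guessing).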
https://zhuanlan.zhihu.com/p/100059009
https://www.zhihu.com/question/39840928
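A small sketch of computing the ROC curve and its AUC. The labels and scores are toy values chosen for illustration; y_score stands for the classifier's predicted score for the positive class.

```python
from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 1, 1]             # toy ground-truth labels
y_score = [0.1, 0.4, 0.35, 0.8]   # toy scores for the positive class

# One (FPR, TPR) point per candidate threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.2f}  FPR={f:.2f}  TPR={t:.2f}")

auc = roc_auc_score(y_true, y_score)
print(auc)  # 0.75
```

Here 3 of the 4 (positive, negative) pairs are ranked correctly, which is where 0.75 comes from.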
7. Naive Bayes
sklearn.naive_bayes
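A minimal sketch using GaussianNB, one of the classifiers in this module. The choice of GaussianNB, the iris data and the 70/30 split are assumptions for this example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
# test_size and random_state are arbitrary choices for this sketch.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

nb = GaussianNB().fit(X_train, y_train)
nb_accuracy = nb.score(X_test, y_test)  # mean accuracy on held-out data
print(nb_accuracy)
```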
8. Nearest neighbors
sklearn.neighbors
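A minimal sketch with KNeighborsClassifier on one-dimensional toy data. The data and n_neighbors=3 are assumptions for this example.

```python
from sklearn.neighbors import KNeighborsClassifier

X = [[0], [1], [2], [3]]  # toy 1-D samples
y = [0, 0, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# The 3 nearest neighbors of 1.1 are 1, 2 and 0, with labels 0, 1, 0.
knn_pred = knn.predict([[1.1]])
print(knn_pred)                    # [0]
print(knn.predict_proba([[1.1]]))  # share of neighbors in each class
```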
9. sklearn.svm: support vector machines
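A minimal sketch with SVC on linearly separable toy points. The data and the linear kernel are assumptions for this example.

```python
from sklearn.svm import SVC

# Two toy clusters: class 0 near the origin, class 1 near (3, 3).
X = [[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]]
y = [0, 0, 0, 1, 1, 1]

svc = SVC(kernel="linear").fit(X, y)

svc_pred = svc.predict([[0.5, 0.5], [3.5, 3.5]])
print(svc_pred)              # [0 1]
print(svc.support_vectors_)  # training points that define the margin
```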
10. sklearn.tree: decision trees
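A minimal sketch with DecisionTreeClassifier on iris. The max_depth=2 and random_state=0 settings are assumptions that keep the printed tree small.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree so the learned rules stay readable.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)  # human-readable if/else rules of the fitted tree

tree_accuracy = tree.score(iris.data, iris.target)  # accuracy on training data
print(tree_accuracy)
```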