  • KNN Classification

    1. Introduction to KNN

        K-Nearest Neighbor, abbreviated KNN, can be used both as a classification algorithm and as a regression algorithm. In my experience, KNN is particularly effective for classification problems.

    2. The Idea Behind KNN

        In the sample space, we treat the distance between two instances in feature space as a measure of their similarity: the smaller the distance, the more similar they are. Given a new instance, we find the training instances closest to it and use their labels to infer its label (for classification this is usually done by majority vote). A minimal sketch of this voting step follows below.
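        The following is a minimal sketch of that idea using plain NumPy on hypothetical toy data (function name and data are illustrative only, not part of the original post):

    import numpy as np
    from collections import Counter

    def knn_predict(X_train, y_train, x_new, k=3):
        # Euclidean distance from the new point to every training point
        dists = np.linalg.norm(X_train - x_new, axis=1)
        # Indices of the k closest training points
        nearest = np.argsort(dists)[:k]
        # Majority vote over the labels of those neighbours
        return Counter(y_train[nearest]).most_common(1)[0][0]

    # Hypothetical toy data: two points per class
    X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
    y_train = np.array([0, 0, 1, 1])
    print(knn_predict(X_train, y_train, np.array([0.95, 1.0]), k=3))  # -> 1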

    3. Implementing KNN

    # Import packages
    import pandas as pd
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import GridSearchCV
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score, classification_report
    import joblib
    
    # Load the data
    fpath = r"..文件训练数据2.csv"
    df = pd.read_csv(fpath)
    print(df.head())
    
    
    # Split the data: 70% training, 30% test
    x_train, x_test = train_test_split(df, train_size=0.7)
    
    # Training set
    train_x = x_train.loc[:, "nAcid":"Zagreb"]
    train_y = x_train["CYP3A4"]
    
    # Test set
    test_x = x_test.loc[:, "nAcid":"Zagreb"]
    test_y = x_test["CYP3A4"]
    
    # Train the KNN model
    knn = KNeighborsClassifier(n_neighbors=5, weights='uniform', algorithm='auto')
    knn.fit(train_x, train_y)
    joblib.dump(knn, "knn2.pkl")  # save the trained model to disk
    
    scores = knn.score(train_x, train_y)  # accuracy on the training data
    print("KNN training score:", scores)
    
    # Evaluate on the test set
    label_pred = knn.predict(test_x)
    acc = accuracy_score(test_y, label_pred)
    print("KNN test score:", acc)
    
    print(classification_report(test_y, label_pred))
    
    
    # Hyperparameter tuning with grid search
    gsCv = GridSearchCV(knn,
                        param_grid={
                         'n_neighbors':list(range(1, 40, 1))
                         }, cv=10)
    gsCv.fit(train_x, train_y)
    
    print("参数训练结束")
    print("参数训练结束")
    print("最好的得分:", gsCv.best_score_, "最好的参数:", gsCv.best_params_)
    
  • Original article: https://www.cnblogs.com/mysterygust/p/15426602.html