PCA, KNN, and GridSearchCV

    PCA

    PCA is mainly used for dimensionality reduction: it maps high-dimensional features onto a lower-dimensional space. The underlying theory comes from linear algebra.

    Here we use the PCA implementation from sklearn.

    from sklearn.decomposition import PCA
    import numpy as np

    X = np.array([[-1, -1, 1, -3], [-2, -1, 1, -3], [-3, -2, 1, -3], [1, 1, 1, -3], [2, 1, 1, -3], [3, 2, -1, -3]])
    pca = PCA(n_components=4)
    pca.fit(X)
    print(pca.explained_variance_ratio_)  # fraction of variance explained by each component
    print(pca.explained_variance_)        # variance explained by each component

    pca = PCA(n_components=1)   # reduce from 4 dimensions down to 1
    XX = pca.fit_transform(X)
    print(XX)

    Result:

    [0.94789175 0.04522847 0.00687978 0.        ]
    [8.21506183 0.39198011 0.05962472 0.        ]
    [[-1.42149543]
     [-2.2448796 ]
     [-3.60382274]
     [ 1.29639085]
     [ 2.11977502]
     [ 3.85403189]]

    Actually, you can see this just by looking at the data: the last column never changes, so one component ends up with a ratio of 0, and the second-to-last column changes only slightly, so another component's ratio is tiny. (Strictly speaking, the ratios belong to the principal components rather than the original columns, but on this toy data they line up.) Such directions contribute almost nothing and can be dropped when reducing dimensionality.
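    Instead of reading the ratios off by hand, sklearn's PCA also accepts a float between 0 and 1 for n_components, in which case it keeps just enough components to explain that fraction of the variance. A minimal sketch, reusing the X above:

    pca = PCA(n_components=0.95)   # keep enough components to explain 95% of the variance
    XX = pca.fit_transform(X)
    print(pca.n_components_)       # the number of components actually kept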

    KNN

    "K nearest neighbors" means exactly that: the k closest neighbors, the idea being that every sample can be represented by its k nearest neighbors.
    The core idea of kNN is that if the majority of a sample's k nearest neighbors in feature space belong to one class, then the sample belongs to that class too and shares the characteristics of that class.
    The procedure is as follows (a from-scratch sketch appears after the list):
    1. Compute the distance between every training sample and the point to be classified;
    2. Pick the K training samples closest to that point;
    3. Find the class to which the majority of those K samples belong;
    4. That majority class is the class of the point being classified.
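    As an illustration of these four steps, here is a minimal NumPy sketch (knn_predict is a hypothetical helper written for this post, not the sklearn implementation):

    import numpy as np
    from collections import Counter

    def knn_predict(X_train, y_train, x, k=5):
        dists = np.linalg.norm(X_train - x, axis=1)    # 1. distance to every training sample
        nearest = np.argsort(dists)[:k]                # 2. the K samples closest to x
        votes = Counter(np.asarray(y_train)[nearest])  # 3. count the classes among those K
        return votes.most_common(1)[0][0]              # 4. the majority class is the prediction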
    In sklearn the classifier is constructed like this:

    from sklearn.neighbors import KNeighborsClassifier

    classifier = KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                metric_params=None, n_jobs=1, n_neighbors=10, p=2,
                weights='uniform')

    The hyperparameters have to be found by trial and error.
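    For example, n_neighbors can be swept by hand with cross-validation. A self-contained sketch on the iris dataset (not the author's data):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    for k in (1, 3, 5, 10, 20):
        clf = KNeighborsClassifier(n_neighbors=k)
        print(k, cross_val_score(clf, X, y, cv=5).mean())   # mean accuracy over 5 folds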

    Other classifiers

    For comparison, other sklearn classifiers can be dropped into the same pipeline; the numbers in the comments are the scores the author observed:

    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.svm import SVC
    from sklearn.naive_bayes import GaussianNB
    from sklearn import metrics
    
    classifier = LogisticRegression(random_state = 0)  #0.78
    #classifier = KNeighborsClassifier(algorithm='kd_tree',n_neighbors = 5, metric = 'minkowski', p = 2, weights='uniform')  #0.839
    #classifier = SVC(kernel = 'linear', random_state = 0)  #0.81
    #classifier = SVC(kernel = 'rbf', random_state = 0)  #0.77
    #classifier = GaussianNB()   #0.77
    #classifier = DecisionTreeClassifier(criterion = 'entropy', random_state = 0)  #0.64
    #classifier = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0)  #0.83
    # X_stard/Y_stard: training features/labels; X_pred/Y_pred: held-out data,
    # both prepared earlier in the original post.
    classifier.fit(X_stard, Y_stard)
    YY_pred = classifier.predict(X_pred)
    result_NMI = metrics.normalized_mutual_info_score(YY_pred, Y_pred)
    print("result_NMI:", result_NMI)  #3,1,minkowski   3,1,manhattan

    Finding hyperparameters with GridSearchCV

    sklearn provides a tuning tool, GridSearchCV, whose whole purpose is automatic parameter search: feed in the candidate values and it evaluates every combination with cross-validation and reports the best one.

    ### KNN
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import GridSearchCV
    
    clf = KNeighborsClassifier()
    n_neighbors = list(range(1,10))
    weights = ['uniform','distance']
    algorithm_options = ['auto','ball_tree','kd_tree','brute']
    leaf_range = list(range(1,10))
    p = list(range(1,10))
    param_grid = [{'n_neighbors': n_neighbors, 'weights': weights, 'algorithm': algorithm_options, 'leaf_size': leaf_range, 'p':p}]
    grid_search = GridSearchCV(clf, param_grid=param_grid, cv=10)
    grid_search.fit(X_pred, Y_pred)
    print((grid_search.best_score_, grid_search.best_estimator_, grid_search.best_params_))

    Result:

    (0.9675572519083969,
     KNeighborsClassifier(algorithm='auto', leaf_size=1, metric='minkowski',
                          metric_params=None, n_jobs=None, n_neighbors=7, p=2,
                          weights='uniform'),
     {'algorithm': 'auto',
      'leaf_size': 1,
      'n_neighbors': 7,
      'p': 2,
      'weights': 'uniform'})
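    Because GridSearchCV refits the best model on the full data by default (refit=True), the fitted search object can predict directly, or the best model can be pulled out. Assuming the same X_pred as above:

    best_clf = grid_search.best_estimator_   # the refit best model
    y_hat = grid_search.predict(X_pred)      # equivalent: delegates to the refit best estimator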

