  • <Machine Learning in Action> reading notes: the k-nearest neighbors algorithm (KNN)

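    Pseudocode for the k-nearest neighbors algorithm:

      For each point in the dataset whose class is unknown, perform the following steps in turn:

      (1) Compute the distance between the current point and every point in the dataset whose classes are known;

      (2) Sort those points by increasing distance;

      (3) Select the k points closest to the current point;

      (4) Count how often each class occurs among those k points;

      (5) Return the class that occurs most often among the k points as the predicted class of the current point.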
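    The five steps can be sketched in plain Python, one comment per step. This is a minimal illustration, not the book's code: the function name `knn_predict` and the four-point toy dataset are hypothetical, and the distance is assumed to be Euclidean, d(x, y) = sqrt(sum_i (x_i - y_i)^2), which is the distance `classify0` computes.

```python
import math
from collections import Counter


def knn_predict(query, points, labels, k):
    """Hypothetical helper: classify `query` by majority vote of its k nearest points.

    query:  sequence of floats
    points: list of sequences with the same length as `query`
    labels: class labels aligned with `points`
    k:      number of neighbours that vote
    """
    # (1) Euclidean distance from the query to every labelled point
    distances = [
        math.sqrt(sum((q - p) ** 2 for q, p in zip(query, point)))
        for point in points
    ]
    # (2) indices of the labelled points, ordered by increasing distance
    order = sorted(range(len(points)), key=lambda i: distances[i])
    # (3) keep only the k closest points
    nearest = order[:k]
    # (4) count how often each class occurs among those k points
    votes = Counter(labels[i] for i in nearest)
    # (5) return the most frequent class
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two points near (1, 1) labelled "A", two near the origin labelled "B"
points = [[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]]
labels = ["A", "A", "B", "B"]
print(knn_predict([0.0, 0.0], points, labels, k=3))  # prints B
```

    Because `Counter.most_common` lists equal counts in the order they were first seen, and the votes here are counted nearest first, a tie goes to the class whose first member is closest. With two classes, an odd k avoids ties entirely.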

    Python implementation

    '''
    Created on Sep 16, 2010
    kNN: k Nearest Neighbors

    Input:      inX: vector to compare to existing dataset (1xN)
                dataSet: size m data set of known vectors (MxN)
                labels: data set labels (1xM vector)
                k: number of neighbors to use for comparison (should be an odd number)

    Output:     the most popular class label

    @author: pbharrin
    '''

    import operator

    from numpy import tile

    def classify0(inX, dataSet, labels, k):
        dataSetSize = dataSet.shape[0]  # number of rows (training samples) in dataSet
        # repeat inX once per row so it has the same shape as dataSet, then subtract
        diffMat = tile(inX, (dataSetSize, 1)) - dataSet
        sqDiffMat = diffMat**2  # square each element-wise difference
        sqDistances = sqDiffMat.sum(axis=1)  # sum the squares along each row
        distances = sqDistances**0.5  # square root gives the Euclidean distance
        sortedDistIndicies = distances.argsort()  # row indices by increasing distance
        classCount = {}
        for i in range(k):
            # each of the k nearest points votes for its own label
            voteIlabel = labels[sortedDistIndicies[i]]
            classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
        # sort classes by vote count, highest first
        # (Python 3 uses items(); iteritems() existed only in Python 2)
        sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
        return sortedClassCount[0][0]
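    As a design alternative, NumPy broadcasting can replace the `tile` copy and `collections.Counter` can replace the hand-written vote dictionary. The sketch below assumes Python 3 and NumPy; the name `knn_classify` and the toy data are hypothetical, not taken from the book.

```python
from collections import Counter

import numpy as np


def knn_classify(in_x, data_set, labels, k):
    """Hypothetical vectorised variant of classify0.

    in_x:     array-like of shape (n_features,)
    data_set: array-like of shape (n_samples, n_features)
    labels:   sequence of length n_samples
    """
    in_x = np.asarray(in_x, dtype=float)
    data_set = np.asarray(data_set, dtype=float)
    # Broadcasting subtracts in_x from every row, so no tiled copy is built.
    distances = np.sqrt(((data_set - in_x) ** 2).sum(axis=1))
    # Indices of the k smallest distances, nearest first; a stable sort keeps
    # points at equal distance in their original dataset order.
    nearest = np.argsort(distances, kind="stable")[:k]
    # Each neighbour votes for its label; most_common(1) returns the winner.
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data
group = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ["A", "A", "B", "B"]
print(knn_classify([0.0, 0.0], group, labels, 3))  # prints B
```

    Broadcasting avoids allocating the repeated copy of `inX`, and `kind="stable"` makes the ordering of equally distant points deterministic, which the default sort kind does not guarantee.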
  • Original article: https://www.cnblogs.com/davidwang456/p/9729676.html