  • A Rookie's Road: Machine Learning, My Understanding of the KNN Algorithm and a Python Implementation

    KNN (K-Nearest Neighbors)

    As usual, a few key formulas first.

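    Distance: usually the Euclidean distance, $E(x,y)=\sqrt{\sum_{i}(x_i-y_i)^2}$. The name sounds fancy, but it is just the distance between two points that we learned in middle school.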

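             There are other ways to measure distance as well: cosine similarity (cos), correlation, and Manhattan distance. For KNN I think Euclidean distance is still the best choice, because it is the most intuitive.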
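To make the measures named above concrete, here is a minimal sketch of three of them using only the standard library. The function names `euclidean`, `manhattan`, and `cosine_similarity` are chosen for illustration and do not come from the original post.

```python
import math

def euclidean(x, y):
    """Straight-line distance: sqrt(sum((xi - yi)^2))."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """Sum of the absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def cosine_similarity(x, y):
    """Cosine of the angle between x and y (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

p, q = [0.0, 0.0], [3.0, 4.0]
print(euclidean(p, q))                             # 5.0
print(manhattan(p, q))                             # 7.0
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))   # 1.0
```

Note that cosine similarity is a similarity, not a distance: larger means closer, so it would have to be inverted (for example `1 - cosine_similarity`) before being used to rank neighbors.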

    Then select the K nearest points and classify by majority vote among them.
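The two steps just described (find the K nearest points, then vote) can be sketched in a few lines. This is only an illustration on made-up toy points: the name `knn_predict` and the data are assumptions, and `math.dist` needs Python 3.8 or later.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """train is a list of (features, label) pairs.
    Returns the majority label among the k points closest to query."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy data: two points near (1, 1) labeled "A", two near (5, 5) labeled "B".
train = [([1.0, 1.0], "A"), ([1.2, 0.8], "A"),
         ([5.0, 5.0], "B"), ([5.5, 4.5], "B")]
print(knn_predict(train, [1.1, 1.0], k=3))  # A
```

With k=3 the three nearest labels are A, A, B, so A wins the vote. An odd K is a common choice for two-class problems because it avoids ties.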

    First, run the KNN implementation that ships with sklearn on its bundled iris dataset:

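    from sklearn import neighbors     # the KNN algorithm lives in the neighbors module
    from sklearn import datasets      # sample datasets commonly used in machine learning

    knn = neighbors.KNeighborsClassifier()   # create the KNN classifier (n_neighbors defaults to 5)

    iris = datasets.load_iris()              # load the iris flower dataset
    # print(iris)  # a dict-like object with keys such as data, target and target_names; too long to print here

    # KNN is a lazy learner: fit() mainly stores the training data so that
    # predict() can search it later. fit() returns the classifier itself,
    # so printing the result shows the model and its settings.
    print(knn.fit(iris.data, iris.target))

    predictedLabel = knn.predict([[0.1, 0.2, 0.3, 0.4]])  # predict the class of one new sample
    print(predictedLabel)
    print("predictedName:", iris.target_names[predictedLabel[0]])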

    Next, write the KNN algorithm by hand:

     1 import csv
     2 import random
     3 import math
     4 import operator
     5 
     6 # Load the data and split it into a training set and a test set
     7 def LoadDataset(filename,split):  # split is in [0,1]: the probability that each row goes to the training set
     8     trainingSet=[]
     9     testSet=[]
    10     with open(filename,'rt') as csvfile:
    11         lines=csv.reader(csvfile)
    12         dataset=[row for row in lines if row]  # drop blank lines, such as one at the end of the file
    13         for x in range(len(dataset)):
    14             for y in range(4):
    15                 dataset[x][y]=float(dataset[x][y])  # the first four columns are numeric features
    16             if random.random()<split:      # random.random() returns a random float in [0,1)
    17                 trainingSet.append(dataset[x])
    18             else:
    19                 testSet.append(dataset[x])
    20     return [trainingSet,testSet]
    21 
    22 # Compute the Euclidean distance between two points
    23 def euclideanDistance(instance1,instance2,length):  # length is the number of dimensions to compare
    24     distance=0
    25     for x in range(length):
    26         distance+=pow((instance1[x]-instance2[x]),2)
    27     return math.sqrt(distance)
    28 
    29 # Select the k instances in trainingSet that are closest to testInstance
    30 def getNeighbors(trainingSet,testInstance,k):
    31     distances=[]
    32     length=len(testInstance)-1  # the last column is the class label, not a feature
    33     for x in range(len(trainingSet)):
    34         dist=euclideanDistance(testInstance,trainingSet[x],length)
    35         distances.append([trainingSet[x],dist])
    36     distances.sort(key=operator.itemgetter(1))  # operator.itemgetter(1) does not fetch a value; it returns a function that picks item 1 of its argument.
    37                                                 # The key argument of sort says which part of each element to sort by,
    38                                                 # so this line sorts distances by their second field, the distance.
    39     neighbors=[]
    40     for x in range(k):
    41         neighbors.append(distances[x][0])
    42     return neighbors
    43 
    44 # Classify by majority vote among the k nearest instances
    45 def getResponse(neighbors):
    46     classVotes={}
    47     for x in range(len(neighbors)):
    48         response=neighbors[x][-1]
    49         if response in classVotes:
    50             classVotes[response]+=1
    51         else:
    52             classVotes[response] = 1
    53     sortedVotes=sorted(classVotes.items(),key=operator.itemgetter(1),reverse=True)
    54     return sortedVotes[0][0]
    55 
    56 # Compute the accuracy (in percent) from the predictions and the true labels
    57 def getAccuracy(testSet,predictions):
    58     correct=0
    59     for x in range(len(testSet)):
    60         if testSet[x][-1] ==predictions[x]:
    61             correct+=1
    62     return (correct/float(len(testSet)))*100
    63 
    64 def main():
    65     split=0.67   # each row has a 67% chance of going to the training set
    66     [trainingSet,testSet]=LoadDataset('irisdata.txt',split)
    67     print("trainingSet:",len(trainingSet),trainingSet)
    68     print("testSet",len(testSet),testSet)
    69 
    70     predictions=[]
    71     k=3  # use the three nearest instances
    72     # classify every instance in the test set
    73     for x in range(len(testSet)):
    74         neighbors=getNeighbors(trainingSet,testSet[x],k)
    75         result=getResponse(neighbors)
    76         predictions.append(result)
    77         print(">predicted",result,",actual=",testSet[x][-1])
    78 
    79     accuracy=getAccuracy(testSet,predictions)
    80     print("Accuracy:",accuracy,r"%")
    81 
    82 if __name__ == '__main__':
    83     main()


    The comments in the code record my understanding of it.

    The output of one run:

    trainingSet: 110 [[4.9, 3.0, 1.4, 0.2, 'Iris-setosa'], [4.7, 3.2, 1.3, 0.2, 'Iris-setosa'], [5.0, 3.6, 1.4, 0.2, 'Iris-setosa'], [5.4, 3.9, 1.7, 0.4, 'Iris-setosa'], [4.6, 3.4, 1.4, 0.3, 'Iris-setosa'], [4.4, 2.9, 1.4, 0.2, 'Iris-setosa'], [4.9, 3.1, 1.5, 0.1, 'Iris-setosa'], [5.4, 3.7, 1.5, 0.2, 'Iris-setosa'], [4.8, 3.4, 1.6, 0.2, 'Iris-setosa'], [4.3, 3.0, 1.1, 0.1, 'Iris-setosa'], [5.8, 4.0, 1.2, 0.2, 'Iris-setosa'], [5.7, 4.4, 1.5, 0.4, 'Iris-setosa'], [5.4, 3.9, 1.3, 0.4, 'Iris-setosa'], [5.7, 3.8, 1.7, 0.3, 'Iris-setosa'], [5.4, 3.4, 1.7, 0.2, 'Iris-setosa'], [4.6, 3.6, 1.0, 0.2, 'Iris-setosa'], [4.8, 3.4, 1.9, 0.2, 'Iris-setosa'], [5.0, 3.0, 1.6, 0.2, 'Iris-setosa'], [5.0, 3.4, 1.6, 0.4, 'Iris-setosa'], [5.2, 3.5, 1.5, 0.2, 'Iris-setosa'], [4.7, 3.2, 1.6, 0.2, 'Iris-setosa'], [4.8, 3.1, 1.6, 0.2, 'Iris-setosa'], [5.4, 3.4, 1.5, 0.4, 'Iris-setosa'], [5.2, 4.1, 1.5, 0.1, 'Iris-setosa'], [4.9, 3.1, 1.5, 0.1, 'Iris-setosa'], [5.0, 3.2, 1.2, 0.2, 'Iris-setosa'], [5.5, 3.5, 1.3, 0.2, 'Iris-setosa'], [4.4, 3.0, 1.3, 0.2, 'Iris-setosa'], [5.0, 3.5, 1.3, 0.3, 'Iris-setosa'], [4.5, 2.3, 1.3, 0.3, 'Iris-setosa'], [4.4, 3.2, 1.3, 0.2, 'Iris-setosa'], [5.1, 3.8, 1.9, 0.4, 'Iris-setosa'], [4.8, 3.0, 1.4, 0.3, 'Iris-setosa'], [5.1, 3.8, 1.6, 0.2, 'Iris-setosa'], [4.6, 3.2, 1.4, 0.2, 'Iris-setosa'], [5.3, 3.7, 1.5, 0.2, 'Iris-setosa'], [7.0, 3.2, 4.7, 1.4, 'Iris-versicolor'], [6.4, 3.2, 4.5, 1.5, 'Iris-versicolor'], [5.5, 2.3, 4.0, 1.3, 'Iris-versicolor'], [6.5, 2.8, 4.6, 1.5, 'Iris-versicolor'], [5.7, 2.8, 4.5, 1.3, 'Iris-versicolor'], [4.9, 2.4, 3.3, 1.0, 'Iris-versicolor'], [6.6, 2.9, 4.6, 1.3, 'Iris-versicolor'], [5.0, 2.0, 3.5, 1.0, 'Iris-versicolor'], [5.9, 3.0, 4.2, 1.5, 'Iris-versicolor'], [6.0, 2.2, 4.0, 1.0, 'Iris-versicolor'], [5.6, 2.9, 3.6, 1.3, 'Iris-versicolor'], [6.7, 3.1, 4.4, 1.4, 'Iris-versicolor'], [5.6, 3.0, 4.5, 1.5, 'Iris-versicolor'], [5.8, 2.7, 4.1, 1.0, 'Iris-versicolor'], [5.6, 2.5, 3.9, 1.1, 'Iris-versicolor'], [5.9, 3.2, 4.8, 1.8, 
'Iris-versicolor'], [6.3, 2.5, 4.9, 1.5, 'Iris-versicolor'], [6.4, 2.9, 4.3, 1.3, 'Iris-versicolor'], [6.8, 2.8, 4.8, 1.4, 'Iris-versicolor'], [6.7, 3.0, 5.0, 1.7, 'Iris-versicolor'], [6.0, 2.9, 4.5, 1.5, 'Iris-versicolor'], [5.7, 2.6, 3.5, 1.0, 'Iris-versicolor'], [5.5, 2.4, 3.8, 1.1, 'Iris-versicolor'], [5.8, 2.7, 3.9, 1.2, 'Iris-versicolor'], [6.0, 2.7, 5.1, 1.6, 'Iris-versicolor'], [5.4, 3.0, 4.5, 1.5, 'Iris-versicolor'], [6.0, 3.4, 4.5, 1.6, 'Iris-versicolor'], [6.3, 2.3, 4.4, 1.3, 'Iris-versicolor'], [5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'], [5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'], [6.1, 3.0, 4.6, 1.4, 'Iris-versicolor'], [5.8, 2.6, 4.0, 1.2, 'Iris-versicolor'], [5.0, 2.3, 3.3, 1.0, 'Iris-versicolor'], [5.6, 2.7, 4.2, 1.3, 'Iris-versicolor'], [5.7, 3.0, 4.2, 1.2, 'Iris-versicolor'], [5.7, 2.9, 4.2, 1.3, 'Iris-versicolor'], [6.2, 2.9, 4.3, 1.3, 'Iris-versicolor'], [5.1, 2.5, 3.0, 1.1, 'Iris-versicolor'], [5.7, 2.8, 4.1, 1.3, 'Iris-versicolor'], [6.3, 3.3, 6.0, 2.5, 'Iris-virginica'], [5.8, 2.7, 5.1, 1.9, 'Iris-virginica'], [7.1, 3.0, 5.9, 2.1, 'Iris-virginica'], [6.5, 3.0, 5.8, 2.2, 'Iris-virginica'], [7.6, 3.0, 6.6, 2.1, 'Iris-virginica'], [4.9, 2.5, 4.5, 1.7, 'Iris-virginica'], [6.5, 3.2, 5.1, 2.0, 'Iris-virginica'], [6.4, 2.7, 5.3, 1.9, 'Iris-virginica'], [5.8, 2.8, 5.1, 2.4, 'Iris-virginica'], [6.4, 3.2, 5.3, 2.3, 'Iris-virginica'], [6.5, 3.0, 5.5, 1.8, 'Iris-virginica'], [7.7, 2.6, 6.9, 2.3, 'Iris-virginica'], [6.0, 2.2, 5.0, 1.5, 'Iris-virginica'], [6.9, 3.2, 5.7, 2.3, 'Iris-virginica'], [7.7, 2.8, 6.7, 2.0, 'Iris-virginica'], [6.3, 2.7, 4.9, 1.8, 'Iris-virginica'], [7.2, 3.2, 6.0, 1.8, 'Iris-virginica'], [6.2, 2.8, 4.8, 1.8, 'Iris-virginica'], [6.1, 3.0, 4.9, 1.8, 'Iris-virginica'], [6.4, 2.8, 5.6, 2.1, 'Iris-virginica'], [7.4, 2.8, 6.1, 1.9, 'Iris-virginica'], [6.4, 2.8, 5.6, 2.2, 'Iris-virginica'], [6.1, 2.6, 5.6, 1.4, 'Iris-virginica'], [7.7, 3.0, 6.1, 2.3, 'Iris-virginica'], [6.3, 3.4, 5.6, 2.4, 'Iris-virginica'], [6.4, 3.1, 5.5, 1.8, 
'Iris-virginica'], [6.9, 3.1, 5.4, 2.1, 'Iris-virginica'], [6.7, 3.1, 5.6, 2.4, 'Iris-virginica'], [6.9, 3.1, 5.1, 2.3, 'Iris-virginica'], [5.8, 2.7, 5.1, 1.9, 'Iris-virginica'], [6.8, 3.2, 5.9, 2.3, 'Iris-virginica'], [6.7, 3.0, 5.2, 2.3, 'Iris-virginica'], [6.3, 2.5, 5.0, 1.9, 'Iris-virginica'], [6.5, 3.0, 5.2, 2.0, 'Iris-virginica'], [6.2, 3.4, 5.4, 2.3, 'Iris-virginica']]
    testSet 40 [[5.1, 3.5, 1.4, 0.2, 'Iris-setosa'], [4.6, 3.1, 1.5, 0.2, 'Iris-setosa'], [5.0, 3.4, 1.5, 0.2, 'Iris-setosa'], [4.8, 3.0, 1.4, 0.1, 'Iris-setosa'], [5.1, 3.5, 1.4, 0.3, 'Iris-setosa'], [5.1, 3.8, 1.5, 0.3, 'Iris-setosa'], [5.1, 3.7, 1.5, 0.4, 'Iris-setosa'], [5.1, 3.3, 1.7, 0.5, 'Iris-setosa'], [5.2, 3.4, 1.4, 0.2, 'Iris-setosa'], [5.5, 4.2, 1.4, 0.2, 'Iris-setosa'], [4.9, 3.1, 1.5, 0.1, 'Iris-setosa'], [5.1, 3.4, 1.5, 0.2, 'Iris-setosa'], [5.0, 3.5, 1.6, 0.6, 'Iris-setosa'], [5.0, 3.3, 1.4, 0.2, 'Iris-setosa'], [6.9, 3.1, 4.9, 1.5, 'Iris-versicolor'], [6.3, 3.3, 4.7, 1.6, 'Iris-versicolor'], [5.2, 2.7, 3.9, 1.4, 'Iris-versicolor'], [6.1, 2.9, 4.7, 1.4, 'Iris-versicolor'], [6.2, 2.2, 4.5, 1.5, 'Iris-versicolor'], [6.1, 2.8, 4.0, 1.3, 'Iris-versicolor'], [6.1, 2.8, 4.7, 1.2, 'Iris-versicolor'], [6.6, 3.0, 4.4, 1.4, 'Iris-versicolor'], [5.5, 2.4, 3.7, 1.0, 'Iris-versicolor'], [6.7, 3.1, 4.7, 1.5, 'Iris-versicolor'], [5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'], [6.3, 2.9, 5.6, 1.8, 'Iris-virginica'], [7.3, 2.9, 6.3, 1.8, 'Iris-virginica'], [6.7, 2.5, 5.8, 1.8, 'Iris-virginica'], [7.2, 3.6, 6.1, 2.5, 'Iris-virginica'], [6.8, 3.0, 5.5, 2.1, 'Iris-virginica'], [5.7, 2.5, 5.0, 2.0, 'Iris-virginica'], [7.7, 3.8, 6.7, 2.2, 'Iris-virginica'], [5.6, 2.8, 4.9, 2.0, 'Iris-virginica'], [6.7, 3.3, 5.7, 2.1, 'Iris-virginica'], [7.2, 3.0, 5.8, 1.6, 'Iris-virginica'], [7.9, 3.8, 6.4, 2.0, 'Iris-virginica'], [6.3, 2.8, 5.1, 1.5, 'Iris-virginica'], [6.0, 3.0, 4.8, 1.8, 'Iris-virginica'], [6.7, 3.3, 5.7, 2.5, 'Iris-virginica'], [5.9, 3.0, 5.1, 1.8, 'Iris-virginica']]
    >predicted Iris-setosa ,actual= Iris-setosa
    >predicted Iris-setosa ,actual= Iris-setosa
    >predicted Iris-setosa ,actual= Iris-setosa
    >predicted Iris-setosa ,actual= Iris-setosa
    >predicted Iris-setosa ,actual= Iris-setosa
    >predicted Iris-setosa ,actual= Iris-setosa
    >predicted Iris-setosa ,actual= Iris-setosa
    >predicted Iris-setosa ,actual= Iris-setosa
    >predicted Iris-setosa ,actual= Iris-setosa
    >predicted Iris-setosa ,actual= Iris-setosa
    >predicted Iris-setosa ,actual= Iris-setosa
    >predicted Iris-setosa ,actual= Iris-setosa
    >predicted Iris-setosa ,actual= Iris-setosa
    >predicted Iris-setosa ,actual= Iris-setosa
    >predicted Iris-versicolor ,actual= Iris-versicolor
    >predicted Iris-versicolor ,actual= Iris-versicolor
    >predicted Iris-versicolor ,actual= Iris-versicolor
    >predicted Iris-versicolor ,actual= Iris-versicolor
    >predicted Iris-versicolor ,actual= Iris-versicolor
    >predicted Iris-versicolor ,actual= Iris-versicolor
    >predicted Iris-versicolor ,actual= Iris-versicolor
    >predicted Iris-versicolor ,actual= Iris-versicolor
    >predicted Iris-versicolor ,actual= Iris-versicolor
    >predicted Iris-versicolor ,actual= Iris-versicolor
    >predicted Iris-versicolor ,actual= Iris-versicolor
    >predicted Iris-virginica ,actual= Iris-virginica
    >predicted Iris-virginica ,actual= Iris-virginica
    >predicted Iris-virginica ,actual= Iris-virginica
    >predicted Iris-virginica ,actual= Iris-virginica
    >predicted Iris-virginica ,actual= Iris-virginica
    >predicted Iris-virginica ,actual= Iris-virginica
    >predicted Iris-virginica ,actual= Iris-virginica
    >predicted Iris-virginica ,actual= Iris-virginica
    >predicted Iris-virginica ,actual= Iris-virginica
    >predicted Iris-virginica ,actual= Iris-virginica
    >predicted Iris-virginica ,actual= Iris-virginica
    >predicted Iris-versicolor ,actual= Iris-virginica
    >predicted Iris-virginica ,actual= Iris-virginica
    >predicted Iris-virginica ,actual= Iris-virginica
    >predicted Iris-virginica ,actual= Iris-virginica
    Accuracy: 97.5 %
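Note that this run produced 110 training rows and 40 test rows rather than about 100 and 50. That is expected: each row is assigned independently with probability split, so the sizes change from run to run, and so does the accuracy (here 39 of 40 correct, which is 97.5%). If a fixed-size, repeatable split is wanted, one option is to shuffle and then cut. The sketch below is an alternative to the loader above, not what the original program does; the name `shuffle_split` and the `seed` parameter are assumptions.

```python
import random

def shuffle_split(rows, split, seed=None):
    """Shuffle a copy of rows, then cut it so that about `split` of them
    go to the training set and the rest to the test set."""
    rng = random.Random(seed)        # private generator; a seed makes the shuffle repeatable
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * split)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(150))              # stand-in for the 150 iris rows
train, test = shuffle_split(rows, 0.67, seed=42)
print(len(train), len(test))         # 100 50
```

Calling it again with the same seed returns the same split, which makes accuracy figures comparable across runs.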

    A few extra notes follow.

    1. Some uses of the random library

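    random.randint(1,10)        # a random integer from 1 to 10, both ends included
    random.random()             # a random float in [0, 1)
    random.uniform(1.1,5.4)     # a random float between 1.1 and 5.4; the bounds need not be integers
    random.choice('tomorrow')   # one randomly chosen element of a sequence
    random.randrange(1,100,2)   # a random integer from 1, 3, 5, ..., 99 (start 1, step 2, 100 excluded)
    random.shuffle(a)           # shuffle the list a in place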
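The boundaries of these functions are easy to mix up, so here is a small check that draws many values and tests the ranges. The sample size of 1000 and the seed value are arbitrary choices for this sketch.

```python
import random

random.seed(0)  # fix the seed so repeated runs draw the same numbers

ints = [random.randint(1, 10) for _ in range(1000)]
floats = [random.random() for _ in range(1000)]
odds = [random.randrange(1, 100, 2) for _ in range(1000)]

print(min(ints) >= 1 and max(ints) <= 10)          # True: both ends are allowed
print(all(0.0 <= f < 1.0 for f in floats))         # True: 1.0 is never returned
print(all(n % 2 == 1 and n <= 99 for n in odds))   # True: only odd numbers up to 99

a = [1, 2, 3, 4, 5]
random.shuffle(a)        # shuffles in place and returns None
print(sorted(a))         # [1, 2, 3, 4, 5]: same elements, only the order changed
```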

    2. Sorting functions

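    sorted(example, key=None, reverse=False)

    example.sort(key=None, reverse=False)

         example is the sequence to be sorted.

         cmp was a Python 2 parameter: a function (or lambda) that compares two elements. Python 3 removed it; wrap such a function with functools.cmp_to_key and pass it as key instead.

         key is a function that picks which part of each element to sort by.

         reverse takes a boolean; True sorts in descending order. The default is False (ascending).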

    Line 53 of the program, sortedVotes=sorted(classVotes.items(),key=operator.itemgetter(1),reverse=True), sorts the (label, count) pairs of classVotes by their second field, the vote count, in descending order.

    key=operator.itemgetter(n) sorts by the (n+1)-th field, because indices start at 0.
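A short runnable illustration of the sorting call, using a made-up vote count. It also shows `functools.cmp_to_key`, which lets a Python 2 style comparison function be used in Python 3, where sorted no longer accepts cmp. The helper name `by_count_desc` is hypothetical.

```python
import operator
from functools import cmp_to_key

classVotes = {"Iris-setosa": 1, "Iris-virginica": 2}

# Sort the (label, count) pairs by count, largest first.
sortedVotes = sorted(classVotes.items(), key=operator.itemgetter(1), reverse=True)
print(sortedVotes)        # [('Iris-virginica', 2), ('Iris-setosa', 1)]
print(sortedVotes[0][0])  # Iris-virginica

# itemgetter(1) is just a function that returns item 1 of its argument.
get_second = operator.itemgetter(1)
print(get_second(("Iris-setosa", 1)))  # 1

# A comparison function returns a negative, zero, or positive number.
def by_count_desc(left, right):
    return right[1] - left[1]

print(sorted(classVotes.items(), key=cmp_to_key(by_count_desc)))
# [('Iris-virginica', 2), ('Iris-setosa', 1)]
```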

    All done, and the library is about to close. Studying feels great. Next up is the famous SVM algorithm.

  • Original post: https://www.cnblogs.com/albert-yzp/p/9519066.html