《机器学习实战》菜鸟学习笔记（一）

zoukankan html css js c++ java

《机器学习实战》菜鸟学习笔记（一）
《机器学习实战》终于到手了，开始学习了。由于本人python学的比较挫，所以学习笔记里会有许多python的内容。

1、 python及其各种插件的安装

由于我使用了win8.1 64位系统（正版的哦），所以像numpy 和 matploblib这种常用的插件不太好装，解决方案就是Anaconda-2.0.1-Windows-x86_64.exe 一次性搞定。

2、kNN代码
1 #-*-coding:utf-8-*- 2 from numpy import * 3 import operator 4 5 def createDataSet(): 6 group = array([[1.0,1.1],[1.0,1.0],[0,0],[0,0.1]]) 7 labels = ['A','A','B','B'] 8 return group,labels 9 10 def classify0(inX,dataSet,labels,k): 11 dataSetSize = dataSet.shape[0] 12 diffMat = tile(inX,(dataSetSize,1))-dataSet 13 sqDiffMat = diffMat ** 2 14 sqDistances = sqDiffMat.sum(axis = 1) 15 distances = sqDistances ** 0.5 16 sortedDistIndicies = distances.argsort() #indices 17 classCount = {} 18 for i in range(k): 19 voteIlabel = labels[sortedDistIndicies[i]] 20 classCount[voteIlabel] = classCount.get(voteIlabel,0)+1 21 #找出最大的那个 22 sortedClassCount = sorted(classCount.iteritems(), 23 key = operator.itemgetter(1),reverse = True) 24 return sortedClassCount[0][0]
这里的疑惑主要出现在：

（1）array与list有什么区别

array 是numpy里面定义的。为了方便计算，比如
1 array([1,2])+array([3,4]) 2 [1,2]+[3,4]
执行以下就可以知道他们的差别了

（2）shape[0]返回的是哪一维度的大小（不要嘲笑我小白，我真的不知道）

找到文档看了一下就开朗了。ndarray.shape “the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the rank, or number of dimensions, ndim.”

（3）tile函数

tile函数是经常使用的函数，用于扩充array。举例：
1 >>> b = np.array([[1, 2], [3, 4]]) 2 >>> np.tile(b, 2) 3 array([[1, 2, 1, 2], 4 [3, 4, 3, 4]]) 5 >>> np.tile(b, (2, 1)) 6 array([[1, 2], 7 [3, 4], 8 [1, 2], 9 [3, 4]])
这下就懂了吧。为什么要用这个函数呢？因为后面两个array要做差，这样做就可以不用使用循环了，典型的空间换时间。那么为什么要做差呢？好吧，因为这是knn算法。

（4）array的sum函数

写到这里，我决定要好好读读numpy文档了。
numpy.sum(a, axis=None, dtype=None, out=None, keepdims=False)
一个sum函数还是挺麻烦的呢
>>> np.sum([[0, 1], [0, 5]], axis=0) array([0, 6]) >>> np.sum([[0, 1], [0, 5]], axis=1) array([1, 5])
这样大家都清楚了

（5）最后一行，return了什么？

表面看起来像是二维数组的第一个元素，但是sortedClassCount是二维数组吗？

写了一个小的验证程序，发现sortedClassCount是一个list，元素是tuple。
L = {1:12,3:4} sortedL = sorted(L.iteritems(),key=operator.itemgetter(1)) print sortedL #结果 [(3, 4), (1, 12)]
查看全文

相关阅读:
【转】常用 Java 静态代码分析工具的分析与比较
 转-SQL Server系列-我感觉自己不用写了，很清晰很有条理
 转-SQL Server Alwayson概念总结
 超级快的python vibora.io框架
 [转]做超炫图表，数据可视化的优雅实现方案 (硬核科普)
统计资料下载论坛
 [FW]Windows7 Python-3.6 安装PyCrypto(pycrypto 2.6.1)出现错误以及解决方法
 国内外常用公共NTP网络时间服务器
 HTTP Error 500.24
step7 microwin v4.0 sp9 win7 64位首次安装s7-200的软件，首次使用西门子软件，每次提示这个信息，怎么解决，网上有说，倒入注册表信息，按照下图进行操作还是不行，求各位高手指点一下，谢谢！

原文地址：https://www.cnblogs.com/shyustc/p/4000064.html