关于手写识别问题,用KNN分类效果也不错,但是KNN需要保留所有的训练样本。而对于支持向量机只需保留边界的支持向量。
因为有些函数代码与上一篇博文(支持向量机-在复杂数据上应用核函数)的一致。建议只需将本节代码放入上篇代码之后即可。
1. 将图像转换成向量
def img2vector(filename): returnVect = zeros((1, 1024)) fr = open(filename) for i in range(32): lineStr = fr.readline() for j in range(32): returnVect[0, 32 * i + j] = int(lineStr[j]) return returnVect
2. 加载图像
def loadImages(dirName): from os import listdir hwLabels = [] print(dirName) trainingFileList = listdir(dirName) # load the training set m = len(trainingFileList) trainingMat = zeros((m, 1024)) for i in range(m): fileNameStr = trainingFileList[i] fileStr = fileNameStr.split('.')[0] # take off .txt classNumStr = int(fileStr.split('_')[0]) if classNumStr == 9: hwLabels.append(-1) else: hwLabels.append(1) trainingMat[i, :] = img2vector('%s/%s' % (dirName, fileNameStr)) return trainingMat, hwLabels
3. 测试数字分类
def testDigits(kTup=('rbf', 10)): # 1. 导入训练数据 dataArr, labelArr = loadImages('F:/迅雷下载/machinelearninginaction/Ch06/trainingDigits/') b, alphas = smoP(dataArr, labelArr, 200, 0.0001, 10000, kTup) datMat = mat(dataArr) labelMat = mat(labelArr).transpose() svInd = nonzero(alphas.A > 0)[0] sVs = datMat[svInd] labelSV = labelMat[svInd] print("there are %d Support Vectors" % shape(sVs)[0]) m, n = shape(datMat) errorCount = 0 for i in range(m): kernelEval = kernelTrans(sVs, datMat[i, :], kTup) # 1*m * m*1 = 1*1 单个预测结果 predict = kernelEval.T * multiply(labelSV, alphas[svInd]) + b if sign(predict) != sign(labelArr[i]): errorCount += 1 print("the training error rate is: %f" % (float(errorCount) / m)) # 2. 导入测试数据 dataArr, labelArr = loadImages('F:/迅雷下载/machinelearninginaction/Ch06/testDigits/') errorCount = 0 datMat = mat(dataArr) labelMat = mat(labelArr).transpose() m, n = shape(datMat) for i in range(m): kernelEval = kernelTrans(sVs, datMat[i, :], kTup) predict = kernelEval.T * multiply(labelSV, alphas[svInd]) + b if sign(predict) != sign(labelArr[i]): errorCount += 1 print("the test error rate is: %f" % (float(errorCount) / m))
支持向量机是一个二分类器,分类结果不是+1就是-1。我们这里1的标签为+1,9的标签为-1,其他的数字都被去掉了。
testDigits(('rbf', 20))
F:/迅雷下载/machinelearninginaction/Ch06/trainingDigits/ iteration number: 1 iteration number: 2 iteration number: 3 iteration number: 4 iteration number: 5 iteration number: 6 iteration number: 7 iteration number: 8 there are 67 Support Vectors the training error rate is: 0.000000 F:/迅雷下载/machinelearninginaction/Ch06/testDigits/ the test error rate is: 0.010753
testDigits(('rbf', 50))
F:/迅雷下载/machinelearninginaction/Ch06/trainingDigits/ iteration number: 1 iteration number: 2 iteration number: 3 iteration number: 4 iteration number: 5 iteration number: 6 there are 34 Support Vectors the training error rate is: 0.014925 F:/迅雷下载/machinelearninginaction/Ch06/testDigits/ the test error rate is: 0.010753
testDigits(('rbf', 10))
F:/迅雷下载/machinelearninginaction/Ch06/trainingDigits/ iteration number: 1 iteration number: 2 iteration number: 3 iteration number: 4 iteration number: 5 iteration number: 6 there are 118 Support Vectors the training error rate is: 0.000000 F:/迅雷下载/machinelearninginaction/Ch06/testDigits/ the test error rate is: 0.010753
可以尝试,当径向基核函数的参数σ取10左右,可以得到最小的测试错误