  • Logistic Regression for predicting whether a horse is sick

    1. The main idea of classification with logistic regression

    Build a regression formula for the class decision boundary from the existing data, that is, find the best-fit parameter set, and then use it to classify new samples.
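    As a quick sketch of the model (my notation, not from the original post): for a feature vector x and weight vector w, the prediction is the sigmoid of the weighted sum, and the class is obtained by thresholding it at 0.5:

      \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad z = w^{\top}x, \qquad
      \hat{y} = \begin{cases} 1 & \text{if } \sigma(w^{\top}x) > 0.5 \\ 0 & \text{otherwise} \end{cases}

    The decision boundary is the set of points where w^{\top}x = 0.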

    2. Use gradient descent to find the best-fit parameters
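    The code in section 3 uses the stochastic (per-sample) form of gradient descent: for a randomly chosen training sample (x_i, y_i) the weights move against the prediction error,

      w \leftarrow w - \alpha \, \bigl( \sigma(w^{\top}x_i) - y_i \bigr) \, x_i

    where the step size \alpha decays as 4/(1 + j + i) + 0.01 over the outer iteration j and the inner index i, so early updates are large and later ones settle down.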

    3. Implementation

    # -*- coding: utf-8 -*-
    """
    Created on Tue Mar 28 21:35:25 2017

    @author: MyHome
    """
    import numpy as np
    from random import uniform

    '''Define the sigmoid function'''
    def sigmoid(inX):
        return 1.0 / (1.0 + np.exp(-inX))

    '''Update the weights with stochastic gradient descent and return the final values'''
    def StocGradientDescent(dataMatrix, classLabels, numIter=600):
        m, n = dataMatrix.shape
        weights = np.ones(n)
        for j in range(numIter):
            dataIndex = list(range(m))  # samples not yet visited in this pass
            for i in range(m):
                # step size decays with the iteration counters but never reaches 0
                alpha = 4 / (1.0 + j + i) + 0.01
                # pick one of the remaining samples at random (without replacement)
                randIndex = int(uniform(0, len(dataIndex)))
                sample = dataIndex[randIndex]
                h = sigmoid(sum(dataMatrix[sample] * weights))
                gradient = (h - classLabels[sample]) * dataMatrix[sample]
                weights = weights - alpha * gradient
                del(dataIndex[randIndex])
        return weights

    '''Build the classifier: threshold the sigmoid output at 0.5'''
    def classifyVector(inX, weights):
        prob = sigmoid(sum(inX * weights))
        if prob > 0.5:
            return 1.0
        else:
            return 0.0

    '''Test: train on the training file and report the error rate on the test file'''
    def Test():
        frTrain = open("horseColicTraining.txt")
        frTest = open("horseColicTest.txt")
        trainingSet = []
        trainingLabel = []
        for line in frTrain.readlines():
            currLine = line.strip().split("\t")
            lineArr = []
            for i in range(21):
                lineArr.append(float(currLine[i]))
            trainingSet.append(lineArr)
            trainingLabel.append(float(currLine[21]))
        trainWeights = StocGradientDescent(np.array(trainingSet), trainingLabel)
        errorCount = 0.0
        numTestVec = 0.0
        for line in frTest.readlines():
            numTestVec += 1.0
            currLine = line.strip().split("\t")
            lineArr = []
            for i in range(21):
                lineArr.append(float(currLine[i]))
            if int(classifyVector(np.array(lineArr), trainWeights)) != int(currLine[21]):
                errorCount += 1
        errorRate = float(errorCount) / numTestVec
        print("the error rate of this test is:%f" % errorRate)
        return errorRate

    '''Call Test() 10 times and average the error rate'''
    def multiTest():
        numTest = 10
        errorSum = 0.0
        for k in range(numTest):
            errorSum += Test()
        print("after %d iterations the average error rate is:%f" % (numTest, errorSum / float(numTest)))

    if __name__ == "__main__":
        multiTest()
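    To reproduce the run below, the script expects horseColicTraining.txt and horseColicTest.txt in the working directory, each line holding 21 tab-separated feature values followed by the class label in column 22.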

    Results:

    the error rate of this test is:0.522388
    the error rate of this test is:0.328358
    the error rate of this test is:0.313433
    the error rate of this test is:0.358209
    the error rate of this test is:0.298507
    the error rate of this test is:0.343284
    the error rate of this test is:0.283582
    the error rate of this test is:0.313433
    the error rate of this test is:0.343284
    the error rate of this test is:0.358209
    after 10 iterations the average error rate is:0.346269

    4. Summary

    Logistic regression finds best-fit parameters for a nonlinear function called the sigmoid.
    Methods of optimization can be used to find the best-fit parameters. Among the
    optimization algorithms, one of the most common is gradient descent. Gradient
    descent can be simplified with stochastic gradient descent.
    Stochastic gradient descent can do as well as gradient descent using far fewer computing
    resources. In addition, stochastic gradient descent is an online algorithm; it can
    update what it has learned as new data comes in rather than reloading all of the data
    as in batch processing.
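    As a rough illustration of that online behaviour (my own sketch, reusing the sigmoid function and a trained weights vector from the code above; newSample and newLabel are hypothetical), a single fresh observation can refine the model with one update instead of retraining on the whole file:

      import numpy as np

      def online_update(weights, newSample, newLabel, alpha=0.01):
          '''One stochastic-gradient step on a single new (sample, label) pair.'''
          h = sigmoid(np.sum(newSample * weights))   # prediction for the new sample
          gradient = (h - newLabel) * newSample      # same gradient as in StocGradientDescent
          return weights - alpha * gradient          # move the weights against the error

      # e.g. trainWeights = online_update(trainWeights, np.array(lineArr), 1.0)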
    One major problem in machine learning is how to deal with missing values in the
    data. There’s no blanket answer to this question. It really depends on what you’re
    doing with the data. There are a number of solutions, and each solution has its own
    advantages and disadvantages.
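    One simple option (a sketch of my own, not something the original post does, and it assumes missing entries are stored as NaN) is to replace missing feature values with 0 before training: a zero feature contributes nothing to the weighted sum w·x, and the stochastic gradient update leaves the corresponding weight untouched, so the missing entry neither helps nor hurts that coefficient.

      import numpy as np

      def fill_missing_with_zero(dataMatrix):
          '''Replace NaN entries with 0 so they neither shift w·x nor move any weight.'''
          data = np.array(dataMatrix, dtype=float)
          data[np.isnan(data)] = 0.0
          return data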

  • Original article: https://www.cnblogs.com/lpworkstudyspace1992/p/6639120.html