[Machine Learning Algorithms in Python] Implementing AdaBoost (1): The Decision Stump

(Please credit the source when reposting: http://blog.csdn.net/buptgshengod)
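1. Background
The previous section covered support vector machines, and I found the formulas hard to follow, to the point of giving me a headache. AdaBoost, the subject of this chapter, is much easier by comparison.
AdaBoost classifies using the idea of a meta-algorithm.
What is the meta-algorithm idea? It splits the data set according to how much each feature contributes to the final decision. In practice we build a decision tree on each feature, give each tree its own weight, and then combine the weighted results.

For example, suppose we decide a person's sex from two features: whether they have a beard, and how tall they are. Having a beard is clearly likely to be a more accurate indicator than height, so we give that feature the larger weight when making the decision, say 0.8 : 0.2.
That is more accurate than weighting the two features 0.5 : 0.5.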
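
The weighting idea can be sketched in a few lines. This is only an illustration, not code from the original post: it assumes each feature casts a +1 or -1 vote, and the function name `weighted_vote` and the tie-breaking rule are made up for the example.

```python
def weighted_vote(votes, weights):
    """Combine +1/-1 votes from several features into one decision.

    votes   -- list of +1.0 / -1.0 predictions, one per feature
    weights -- list of non-negative weights, one per feature
    """
    score = sum(w * v for v, w in zip(votes, weights))
    # Assumption for this sketch: a score of exactly 0 (a tie) falls to -1.
    return 1.0 if score > 0 else -1.0


# Hypothetical case: the beard feature votes +1, the height feature votes -1.
votes = [1.0, -1.0]

unequal = weighted_vote(votes, [0.8, 0.2])  # the beard vote dominates
equal = weighted_vote(votes, [0.5, 0.5])    # the two votes cancel out
print(unequal, equal)
```

With equal weights the two disagreeing features cancel and the decision comes down to the arbitrary tie-break, while the 0.8 : 0.2 weighting lets the more reliable feature decide.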

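2. Building the decision tree
Next we build the decision tree. It has two main jobs: the first is to find the feature that has the greatest influence on the result, and the second is to find the threshold for that feature. The threshold works like this: if the threshold is d, the result is 1 when the feature value is greater than d and 0 when it is smaller than d. (The code below labels the second class -1 rather than 0.)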

First, the data set: a matrix with two features per row.

from numpy import matrix, mat, ones, zeros, shape, inf

def loadSimpData():
    datMat = matrix([[1. , 2.1],
                     [2. , 1.1],
                     [1.3, 1. ],
                     [1. , 1. ],
                     [2. , 1. ]])
    classLabels = [1.0, 1.0, -1.0, -1.0, 1.0]
    return datMat, classLabels

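Next is the tree's classification function. It is called inside the loop further down, and its job is simple: compare the values in one feature column against the threshold and return the resulting predictions. Its four parameters are (input matrix, column index, threshold, 'lt' or 'gt').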

def stumpClassify(dataMatrix, dimen, threshVal, threshIneq):  # just classify the data
    retArray = ones((shape(dataMatrix)[0], 1))
    if threshIneq == 'lt':
        retArray[dataMatrix[:, dimen] <= threshVal] = -1.0
    else:
        retArray[dataMatrix[:, dimen] > threshVal] = -1.0
    return retArray
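
To see what the two inequality modes do, here is a small sketch on the sample data. It assumes plain NumPy arrays rather than `matrix`, and the snake_case name `stump_classify` and the threshold 1.5 are chosen only for this illustration.

```python
import numpy as np


def stump_classify(data, dimen, thresh_val, thresh_ineq):
    """Label each row +1.0 or -1.0 by thresholding one feature column."""
    predictions = np.ones(data.shape[0])
    column = data[:, dimen]
    if thresh_ineq == 'lt':
        # rows at or below the threshold get -1
        predictions[column <= thresh_val] = -1.0
    else:
        # rows above the threshold get -1
        predictions[column > thresh_val] = -1.0
    return predictions


data = np.array([[1.0, 2.1], [2.0, 1.1], [1.3, 1.0], [1.0, 1.0], [2.0, 1.0]])

lt_pred = stump_classify(data, 0, 1.5, 'lt')
gt_pred = stump_classify(data, 0, 1.5, 'gt')
print(lt_pred)
print(gt_pred)
```

For the same column and threshold, 'lt' and 'gt' produce exactly opposite labelings, which is why the search has to try both.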

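Finally, the function that builds the tree. It loops over every feature, every candidate threshold and both inequality directions, and keeps the combination with the lowest weighted error. D is the column vector of weights, one per row of the data.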

def buildStump(dataArr, classLabels, D):
    dataMatrix = mat(dataArr)
    labelMat = mat(classLabels).T
    m, n = shape(dataMatrix)
    numSteps = 10.0
    bestStump = {}
    bestClasEst = mat(zeros((m, 1)))
    minError = inf  # init error sum, to +infinity
    for i in range(n):  # loop over all dimensions
        rangeMin = dataMatrix[:, i].min()
        rangeMax = dataMatrix[:, i].max()
        stepSize = (rangeMax - rangeMin) / numSteps
        for j in range(-1, int(numSteps) + 1):  # loop over all range in current dimension
            for inequal in ['lt', 'gt']:  # go over less than and greater than
                threshVal = (rangeMin + float(j) * stepSize)
                predictedVals = stumpClassify(dataMatrix, i, threshVal, inequal)  # call stump classify with i, j, lessThan
                errArr = mat(ones((m, 1)))
                errArr[predictedVals == labelMat] = 0
                weightedError = D.T * errArr  # calc total error multiplied by D
                # print "split: dim %d, thresh %.2f, thresh ineqal: %s, the weighted error is %.3f" % (i, threshVal, inequal, weightedError)
                if weightedError < minError:
                    minError = weightedError
                    bestClasEst = predictedVals.copy()
                    bestStump['dim'] = i
                    bestStump['thresh'] = threshVal
                    bestStump['ineq'] = inequal
    return bestStump, minError, bestClasEst
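
The grid of candidate thresholds that the inner loop walks through can be listed on its own. The sketch below takes the first feature column of the sample data and the same step count of 10; the variable names are illustrative rather than taken from the original code.

```python
import numpy as np

column = np.array([1.0, 2.0, 1.3, 1.0, 2.0])  # first feature of the sample data
num_steps = 10

range_min, range_max = column.min(), column.max()
step_size = (range_max - range_min) / num_steps

# j runs from -1 to num_steps inclusive, so the grid starts one step
# below the column minimum and ends at the column maximum.
thresholds = [range_min + j * step_size for j in range(-1, num_steps + 1)]

# Every threshold is tried with both 'lt' and 'gt', for each of the 2 features.
n_candidates = 2 * len(thresholds) * 2
print(len(thresholds), n_candidates)
```

Starting at j = -1 puts one threshold below every data point, so the search also covers the stump that assigns all rows to a single class.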

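3. Results

When the initial weights are all equal (there are five rows of data, so each weight is 0.2), we get the following.

The best stump: the first feature (dim 0) is the most decisive one, with a threshold of 1.3.
{'dim': 0, 'ineq': 'lt', 'thresh': 1.3}

The error rate: 0.2, that is, one wrong out of five.
[[ 0.2]]

The predictions: the first sample is misclassified.
[[-1.]
[ 1.]
[-1.]
[-1.]
[ 1.]]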
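
The result above can be reproduced with a compact sketch along the following lines. It is a rewrite that assumes plain NumPy arrays instead of `matrix`, so the names `stump_classify` and `build_stump` and the 1-D shapes differ from the original functions; treat it as an illustration of the search rather than the original code.

```python
import numpy as np


def stump_classify(data, dimen, thresh, ineq):
    """Return +1/-1 predictions from thresholding column `dimen`."""
    pred = np.ones(data.shape[0])
    col = data[:, dimen]
    pred[(col <= thresh) if ineq == 'lt' else (col > thresh)] = -1.0
    return pred


def build_stump(data, labels, D, num_steps=10):
    """Search all (feature, threshold, direction) triples for the lowest weighted error."""
    best, min_error, best_pred = {}, np.inf, None
    for i in range(data.shape[1]):
        lo, hi = data[:, i].min(), data[:, i].max()
        step = (hi - lo) / num_steps
        for j in range(-1, num_steps + 1):
            for ineq in ('lt', 'gt'):
                thresh = lo + j * step
                pred = stump_classify(data, i, thresh, ineq)
                errors = (pred != labels).astype(float)  # 1.0 where misclassified
                weighted_error = float(np.dot(D, errors))
                if weighted_error < min_error:  # strict: the first best candidate wins
                    min_error, best_pred = weighted_error, pred
                    best = {'dim': i, 'thresh': float(thresh), 'ineq': ineq}
    return best, min_error, best_pred


data = np.array([[1.0, 2.1], [2.0, 1.1], [1.3, 1.0], [1.0, 1.0], [2.0, 1.0]])
labels = np.array([1.0, 1.0, -1.0, -1.0, 1.0])
D = np.ones(5) / 5  # equal initial weights, 0.2 per row

best, min_error, best_pred = build_stump(data, labels, D)
print(best, min_error, best_pred)
```

Because the comparison is a strict `<`, later candidates that tie on weighted error do not replace the first one found.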

4. Code download

Download link (Decision Stump)

References:
[1] Peter Harrington, Machine Learning in Action

Original article: https://www.cnblogs.com/mengfanrong/p/5240947.html
