  • Decision Trees: Splitting the Data Set

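    This post builds on the information-entropy calculation. We start with the data-splitting step, beginning with a simple split on a single feature value: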

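    def splitDataSet(dataSet, axis, value):
        """Return the rows whose feature at index `axis` equals `value`, with that column removed."""
        retDataSet = []
        for featVec in dataSet:
            if featVec[axis] == value:
                # Copy everything before and after the split column.
                reducedFeatVec = featVec[:axis]
                reducedFeatVec.extend(featVec[axis+1:])
                retDataSet.append(reducedFeatVec)
        return retDataSet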

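    This is the simplest kind of split: create a new data set and append to it every row that matches the given value, with the split column removed.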
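
    As a quick illustration of what the split returns, here is a minimal sketch on a made-up five-row data set. The rows and the yes/no labels are assumptions chosen for the example, and splitDataSet is repeated with a final return so that the snippet runs on its own.

```python
def splitDataSet(dataSet, axis, value):
    """Rows whose feature at index `axis` equals `value`, with that column removed."""
    retDataSet = []
    for featVec in dataSet:
        if featVec[axis] == value:
            reducedFeatVec = featVec[:axis]
            reducedFeatVec.extend(featVec[axis + 1:])
            retDataSet.append(reducedFeatVec)
    return retDataSet

# Hypothetical toy data: two binary features followed by a class label.
data = [
    [1, 1, 'yes'],
    [1, 1, 'yes'],
    [1, 0, 'no'],
    [0, 1, 'no'],
    [0, 1, 'no'],
]

ones = splitDataSet(data, 0, 1)   # rows where feature 0 == 1
zeros = splitDataSet(data, 0, 0)  # rows where feature 0 == 0
print(ones)   # [[1, 'yes'], [1, 'yes'], [0, 'no']]
print(zeros)  # [[1, 'no'], [1, 'no']]
```

    The input rows are left untouched: each matching row is copied by slicing before the split column is dropped.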

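    Next, choosing the best way to split the data set:

    1. Collect the distinct values that appear in the feature column

    2. Compute the information entropy of each split

    Repeat these two steps for every feature column and keep the feature whose split gives the largest information gain.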
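
    In formula form, the quantity compared across features can be written as below. This is the standard information-gain definition; the base-2 logarithm and the symbols D, A, D_v and p_k are notation assumed here, not taken from the post.

```latex
H(D) = -\sum_{k} p_k \log_2 p_k
\qquad
\mathrm{Gain}(D, A) = H(D) - \sum_{v \in \mathrm{Values}(A)} \frac{|D_v|}{|D|}\, H(D_v)
```

    Here D is the data set, p_k is the fraction of rows in class k, and D_v is the subset of rows in which feature A takes the value v. In the code that follows, H(D) corresponds to baseEntropy, |D_v|/|D| to prob, and the weighted sum to newEntropy.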

    The code for this step:

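    def chooseBestFeatureToSplit(dataSet):
        """Return the index of the feature with the largest information gain, or -1 if no split gives a gain."""
        numberFeatures = len(dataSet[0]) - 1  # the last column is not a feature
        baseEntropy = calcShannonEnt(dataSet)  # entropy before any split
        bestInfoGain = 0.0
        bestFeature = -1
        for i in range(numberFeatures):
            # Step 1: distinct values taken by feature i
            featList = [example[i] for example in dataSet]
            print(featList)  # debug output
            uniqueVals = set(featList)
            print(uniqueVals)  # debug output
            # Step 2: weighted entropy of the subsets produced by splitting on feature i
            newEntropy = 0.0
            for value in uniqueVals:
                subDataSet = splitDataSet(dataSet, i, value)
                prob = len(subDataSet) / float(len(dataSet))
                newEntropy += prob * calcShannonEnt(subDataSet)
            infoGain = baseEntropy - newEntropy
            if infoGain > bestInfoGain:
                bestInfoGain = infoGain
                bestFeature = i
        return bestFeature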
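
    To see which numbers the loop above compares, the sketch below computes the information gain of every feature on a made-up data set. It is an independent cross-check rather than the post's code: calcShannonEnt is not shown in this post, so the version here is an assumed implementation (base-2 entropy of the labels in the last column), and infoGain and the toy rows are hypothetical names and data introduced for the example.

```python
from collections import Counter
from math import log2

def calcShannonEnt(dataSet):
    """Assumed helper: base-2 Shannon entropy of the labels in the last column."""
    counts = Counter(row[-1] for row in dataSet)
    total = len(dataSet)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def infoGain(dataSet, axis):
    """Hypothetical helper: information gain of splitting on column `axis`."""
    # Group the rows by the value they take in the chosen column.
    groups = {}
    for row in dataSet:
        groups.setdefault(row[axis], []).append(row)
    total = len(dataSet)
    # Entropy after the split, weighted by the size of each subset.
    # (Dropping the split column, as splitDataSet does, would not change
    # these entropies because only the label column is counted.)
    weighted = sum(len(g) / total * calcShannonEnt(g) for g in groups.values())
    return calcShannonEnt(dataSet) - weighted

# Hypothetical toy data: two binary features followed by a class label.
data = [
    [1, 1, 'yes'],
    [1, 1, 'yes'],
    [1, 0, 'no'],
    [0, 1, 'no'],
    [0, 1, 'no'],
]

gains = [infoGain(data, i) for i in range(len(data[0]) - 1)]
best = max(range(len(gains)), key=gains.__getitem__)
print([round(g, 3) for g in gains])  # [0.42, 0.171]
print(best)                          # 0
```

    On this data the first feature has the larger gain, so index 0 is the feature to split on.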

    Screenshot of the run result: [image not included]

  • Original article: https://www.cnblogs.com/javawebsoa/p/3165902.html