  • 3.9 Errata for "Machine Learning in Action" (《机器学习实战》)

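    ## 4.5 Classifying text with Python: code errors and fixes

      The denominator of the conditional probabilities in the original Listing 4-2 is wrong: it should count the documents in each class, not the words. For example, P(cute=1|ci=0) should be 1/3 (before add-one smoothing is applied).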

    from numpy import ones

    def trainNB0(trainMatrix, trainCategory):
        numTrainDocs = len(trainMatrix)
        numWords = len(trainMatrix[0])
        pAbusive = sum(trainCategory)/float(numTrainDocs)  # class prior P(c=1)
        # Smoothing: word counts start at 1, document counts at 2
        p0Num = ones(numWords)
        p1Num = ones(numWords)
        p0Denom = 2.0
        p1Denom = 2.0
        for i in range(numTrainDocs):
            if trainCategory[i] == 1:
                p1Num += trainMatrix[i]
                p1Denom += 1  # fix: denominator counts documents in the class
            else:
                p0Num += trainMatrix[i]
                p0Denom += 1  # fix: denominator counts documents in the class
        p1Vect = p1Num/p1Denom  # the log is taken later, in classifyNB
        p0Vect = p0Num/p0Denom  # the log is taken later, in classifyNB
        return p0Vect, p1Vect, pAbusive
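
To make the 1/3 figure concrete, here is a minimal sketch on a hypothetical toy corpus (the documents, labels, and variable names are invented for illustration and are not the book's data set). It compares a per-document denominator with a per-word-count denominator, which is assumed here to be what the original listing accumulated, and with the smoothed estimate that `trainNB0` produces.

```python
import numpy as np

# Hypothetical toy corpus: three class-0 documents and two class-1 documents.
docs = [
    ["my", "dog", "is", "cute"],         # class 0
    ["stupid", "dog"],                   # class 1
    ["my", "dog", "has", "fleas"],       # class 0
    ["stupid", "worthless", "garbage"],  # class 1
    ["love", "my", "dog"],               # class 0
]
labels = np.array([0, 1, 0, 1, 0])

vocab = sorted({w for d in docs for w in d})
# Set-of-words matrix: entry is 1 if the word occurs in the document, else 0.
X = np.array([[1 if w in d else 0 for w in vocab] for d in docs])

cute = vocab.index("cute")
class0 = X[labels == 0]
cute_count = class0[:, cute].sum()

# Denominator = number of class-0 documents (the corrected estimate).
p_doc = cute_count / len(class0)
# Denominator = total word occurrences in class 0 (assumed original behaviour).
p_word = cute_count / class0.sum()
# Smoothed estimate in the style of trainNB0: (count + 1) / (documents + 2).
p_smooth = (cute_count + 1) / (len(class0) + 2.0)

print(p_doc, p_word, p_smooth)
```

On this toy data the per-document estimate is 1/3, the per-word estimate is 1/11, and the smoothed value is 2/5. Only the first and third are estimates of the Bernoulli probability P(wi=1|ci).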

      In the original Listing 4-3, p1 and p0 are computed from the P(wi=1|ci) terms only, and the P(wi=0|ci) terms are ignored, where P(wi=0|ci) = 1-P(wi=1|ci).

    from numpy import ones, log

    def classifyNB(vec2Classify, p0Vect, p1Vect, pClass1):
        oneVect = ones(len(p0Vect))   # all-ones vector of the same dimension
        p1VectInv = oneVect - p1Vect  # P(wi=0|ci=1) vector
        p0VectInv = oneVect - p0Vect  # P(wi=0|ci=0) vector
        # Take the log of everything
        p1Vect = log(p1Vect); p0Vect = log(p0Vect)
        p1VectInv = log(p1VectInv); p0VectInv = log(p0VectInv)
        # 1 where the word is absent: selects the P(wi=0|ci) terms
        vec2ClassifyInv = oneVect - vec2Classify
        p1 = sum(vec2Classify*p1Vect) + sum(vec2ClassifyInv*p1VectInv) + log(pClass1)
        p0 = sum(vec2Classify*p0Vect) + sum(vec2ClassifyInv*p0VectInv) + log(1.0-pClass1)
        if p1 > p0:
            return 1
        else:
            return 0
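
The effect of the missing P(wi=0|ci) terms can be seen in a small sketch. The probability vectors, the prior, and the helper names `presence_only_score` and `bernoulli_score` below are hypothetical and chosen only for illustration; the sketch scores an all-zero document both ways.

```python
import numpy as np

def presence_only_score(x, p_vect, prior):
    """Log score using only the P(wi=1|c) terms of the words that are present."""
    x = np.asarray(x, dtype=float)
    return np.sum(x * np.log(p_vect)) + np.log(prior)

def bernoulli_score(x, p_vect, prior):
    """Log score using P(wi=1|c) for present words and 1-P(wi=1|c) for absent ones."""
    x = np.asarray(x, dtype=float)
    return (np.sum(x * np.log(p_vect))
            + np.sum((1.0 - x) * np.log(1.0 - p_vect))
            + np.log(prior))

# Hypothetical smoothed estimates for a three-word vocabulary.
p0_vect = np.array([0.2, 0.6, 0.4])  # P(wi=1|c=0)
p1_vect = np.array([0.8, 0.4, 0.4])  # P(wi=1|c=1)
p_class1 = 0.6

x = [0, 0, 0]  # a document containing none of the vocabulary words

# Presence-only: every word term vanishes, so the prior alone decides.
label_presence = int(presence_only_score(x, p1_vect, p_class1)
                     > presence_only_score(x, p0_vect, 1.0 - p_class1))

# Full Bernoulli model: the absence of words typical of class 1 counts against it.
label_bernoulli = int(bernoulli_score(x, p1_vect, p_class1)
                      > bernoulli_score(x, p0_vect, 1.0 - p_class1))

print(label_presence, label_bernoulli)
```

With these numbers the presence-only score picks class 1 because of the prior, while the full Bernoulli score picks class 0 (0.6 × 0.072 against 0.4 × 0.192), so the two rules can disagree even on very simple inputs.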

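    ## Listing 5-5: the regression coefficients drop the w0 term; a column X0=1.0 should be added to both the training set and the test set.

    Note: this book works even better when read alongside the CMU course: http://www.cs.cmu.edu/~ninamf/courses/601sp15/index.html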
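
A minimal sketch of that fix, assuming the features are held in a NumPy array with one row per sample; the helper name `add_intercept` and the sample values are hypothetical and do not come from the book.

```python
import numpy as np

def add_intercept(features):
    """Prepend a constant X0 = 1.0 column so the first weight acts as w0."""
    features = np.asarray(features, dtype=float)
    ones_col = np.ones((features.shape[0], 1))
    return np.hstack([ones_col, features])

# Hypothetical training and test features (two samples, two features each).
train = np.array([[2.0, 1.0], [0.5, 3.0]])
test = np.array([[1.5, 2.5], [4.0, 0.0]])

# The same transformation must be applied to both sets.
train_aug = add_intercept(train)
test_aug = add_intercept(test)

# With weights [w0, w1, w2], the linear score becomes w0 + w1*x1 + w2*x2.
weights = np.array([0.5, 1.0, -1.0])
scores = test_aug @ weights
print(train_aug.shape, scores)
```

Because the constant column multiplies the first weight by 1.0, that weight plays the role of w0 without any change to the training loop itself.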

  • Original post: https://www.cnblogs.com/dhfly/p/14713592.html