  • Naive Bayes: a log-likelihood implementation in Python with NumPy

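    Source: Machine Learning in Action

    Multiplying many small probabilities in a row can underflow: the product of too many tiny factors rounds to 0, and the classifier can no longer tell the classes apart. To prevent this, the implementation works with log-likelihoods.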
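
    The underflow is easy to reproduce. The sketch below uses made-up probabilities (1000 factors of 0.01, chosen only for illustration) and compares the direct product with the sum of the logarithms.

```python
import numpy as np

# Hypothetical data: 1000 conditional probabilities of 0.01 each.
probs = np.full(1000, 0.01)

# Direct product: mathematically 1e-2000, far below the smallest positive
# float64 (about 5e-324), so the result underflows to 0.0.
direct_product = np.prod(probs)

# Sum of logs: 1000 * log(0.01), a perfectly ordinary negative number.
log_sum = np.sum(np.log(probs))

print(direct_product)
print(log_sum)
```

    The direct product carries no information once it hits 0.0, while the log-sum stays finite and can still be compared between classes.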

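    Training:

    import numpy as np

    def trainNB0(trainMatrix, trainCategory):
        # trainMatrix: one word-count vector per document; trainCategory: 0/1 labels
        trainMatrix = np.asarray(trainMatrix)
        numTrainDocs = len(trainMatrix)
        numWords = len(trainMatrix[0])
        pAbusive = sum(trainCategory) / float(numTrainDocs)   # prior P(class = 1)
        p0Num = np.ones(numWords); p1Num = np.ones(numWords)  # word counts start at 1
        p0Denom = 2.0; p1Denom = 2.0                          # i.e. Laplace smoothing
        for i in range(numTrainDocs):
            if trainCategory[i] == 1:
                p1Num += trainMatrix[i]
                p1Denom += trainMatrix[i].sum()
            else:
                p0Num += trainMatrix[i]
                p0Denom += trainMatrix[i].sum()
        p1Vect = np.log(p1Num / p1Denom)  # NOTE: log of the conditional probabilities
        p0Vect = np.log(p0Num / p0Denom)  # NOTE: log of the conditional probabilities
        # Return the per-class log conditional probability vectors
        # and the prior probability of class 1.
        return p0Vect, p1Vect, pAbusive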
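
    The training loop above can also be written without an explicit Python loop. The following is only a sketch: `train_nb_vectorized` and the toy data are hypothetical names and values introduced here, and the smoothing constants (counts start at 1, denominators at 2.0) are copied from trainNB0.

```python
import numpy as np

def train_nb_vectorized(train_matrix, train_category):
    """Hypothetical vectorized counterpart of trainNB0 (same smoothing)."""
    X = np.asarray(train_matrix, dtype=float)
    y = np.asarray(train_category)
    p_class1 = y.mean()  # prior P(class = 1)
    # Boolean masks select the documents of each class; summing over
    # axis 0 gives the per-word counts for that class.
    p1_num = 1.0 + X[y == 1].sum(axis=0)
    p0_num = 1.0 + X[y == 0].sum(axis=0)
    p1_denom = 2.0 + X[y == 1].sum()
    p0_denom = 2.0 + X[y == 0].sum()
    return np.log(p0_num / p0_denom), np.log(p1_num / p1_denom), p_class1

# Toy data (made up): 4 documents over a 3-word vocabulary.
toy_matrix = [[1, 0, 2], [0, 1, 0], [2, 1, 0], [0, 0, 1]]
toy_labels = [0, 1, 0, 1]
p0_vec, p1_vec, p_abusive = train_nb_vectorized(toy_matrix, toy_labels)
print(np.exp(p0_vec), np.exp(p1_vec), p_abusive)
```

    Exponentiating the returned vectors recovers the smoothed conditional probabilities, which is a convenient way to sanity-check the log values on small inputs.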
    

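    Classification:

    import numpy as np

    def classifyNB(vec2Classify, p0Vec, p1Vec, pClass1):
        vec2Classify = np.asarray(vec2Classify)
        # NOTE: log conditional probabilities are summed (weighted by word count)
        # and the log prior is added, instead of multiplying probabilities.
        p1 = np.sum(vec2Classify * p1Vec) + np.log(pClass1)        # NOTE
        p0 = np.sum(vec2Classify * p0Vec) + np.log(1.0 - pClass1)  # NOTE
        if p1 > p0:
            return 1
        else:
            return 0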
    
    def testingNB():  # end-to-end walkthrough
        # loadDataSet, createVocabList and bagOfWord2VecMN are helpers
        # defined elsewhere (not shown in this post).
        listOPosts, listClasses = loadDataSet()        # load the data
        myVocabList = createVocabList(listOPosts)      # build the vocabulary
        trainMat = []
        for postinDoc in listOPosts:
            trainMat.append(bagOfWord2VecMN(myVocabList, postinDoc))
        p0V, p1V, pAb = trainNB0(trainMat, listClasses)  # train
        # test
        testEntry = ['love', 'my', 'dalmation']
        thisDoc = bagOfWord2VecMN(myVocabList, testEntry)
        print(testEntry, 'classified as: ', classifyNB(thisDoc, p0V, p1V, pAb))
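
    testingNB relies on helpers for building the vocabulary and the bag-of-words vectors that are not shown in this post. The sketch below gives hypothetical stand-ins for those two steps, assuming each document is a list of tokens and that the vector counts word occurrences. The names `create_vocab_list` and `bag_of_words_to_vec` and the toy posts are made up for illustration and are not the book's code.

```python
def create_vocab_list(documents):
    """Collect every distinct token; sorted so the word order is reproducible."""
    vocab = set()
    for doc in documents:
        vocab.update(doc)
    return sorted(vocab)

def bag_of_words_to_vec(vocab_list, tokens):
    """Count how often each vocabulary word occurs in tokens."""
    index = {word: i for i, word in enumerate(vocab_list)}
    vec = [0] * len(vocab_list)
    for word in tokens:
        if word in index:  # words outside the vocabulary are ignored
            vec[index[word]] += 1
    return vec

# Toy posts (made up for this example).
toy_posts = [['my', 'dog', 'is', 'cute'], ['stupid', 'dog', 'stupid']]
vocab = create_vocab_list(toy_posts)
vec = bag_of_words_to_vec(vocab, ['stupid', 'dog', 'stupid', 'unknown'])
print(vocab)
print(vec)
```

    Because the vector holds counts rather than 0/1 flags, a repeated word contributes its log probability once per occurrence when the classifier multiplies the vector by the log-probability vector.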
    

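    Note: in the lines flagged by comments in the code above, the product of probabilities from the formula has been replaced by a sum of log probabilities. Because the logarithm is strictly increasing, this provably leaves the classification result unchanged, and in practice it avoids the underflow caused by multiplying many factors that are far smaller than 1.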
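
    The equivalence can be written out as follows. The symbols are assumptions introduced here: x_i is the count of word w_i in the document (an entry of vec2Classify), P(w_i | c) is the smoothed conditional probability whose logarithm trainNB0 returns, and P(c) is the class prior (pClass1 or 1 - pClass1).

```latex
\begin{aligned}
\hat{c} &= \arg\max_{c \in \{0,1\}} \; P(c) \prod_{i} P(w_i \mid c)^{x_i} \\
        &= \arg\max_{c \in \{0,1\}} \; \log\!\Big( P(c) \prod_{i} P(w_i \mid c)^{x_i} \Big) \\
        &= \arg\max_{c \in \{0,1\}} \; \Big[ \log P(c) + \sum_{i} x_i \log P(w_i \mid c) \Big]
\end{aligned}
```

    The second line holds because log is strictly increasing, so it preserves which class has the larger score. The last line is the quantity classifyNB computes for each class.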

  • Original post: https://www.cnblogs.com/eniac1946/p/7407205.html