  • Bayesian Classification (Part 3)

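    The previous two posts implemented the core idea of the algorithm. This post corrects a few small details. The classifier multiplies per-word probabilities together, so if any single probability is 0 the whole product becomes 0, which is clearly not the result we want. In addition, multiplying many very small numbers can underflow to zero.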
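Both problems are easy to reproduce. The following is a minimal sketch with made-up probabilities (the values are assumptions chosen only to trigger the effect, not numbers from the classifier):

```python
import math

# Hypothetical per-word probabilities, chosen only to illustrate the problem.
probs = [1e-5] * 100

# Multiplying many small probabilities underflows to exactly 0.0 in float64.
product = 1.0
for p in probs:
    product *= p

# Summing logarithms carries the same information in a representable range.
log_sum = sum(math.log(p) for p in probs)

print(product)   # 0.0
print(log_sum)   # about -1151.29

# A single zero factor wipes out the product no matter what the others say.
with_zero = 0.9 * 0.8 * 0.0
print(with_zero)  # 0.0
```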

    The fix only requires changing four lines of code:

        p0Num = ones(numWords)       # change to ones()
        p1Num = ones(numWords)       # change to ones()
        p0Denom = 2.0                # change to 2.0
        p1Denom = 2.0                # change to 2.0
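To see what the change buys, here is a small sketch on toy counts (the 4-word vocabulary and the two documents are hypothetical). It follows the post in starting the denominator at 2.0; textbook add-one smoothing for word counts would add the vocabulary size instead, so the smoothed values here are not guaranteed to sum to 1.

```python
import numpy as np

# Toy word-count vectors for one class (hypothetical data, 4-word vocabulary).
docs = np.array([[1, 0, 1, 0],
                 [1, 0, 0, 0]])

# Unsmoothed: words never seen in the class get probability 0.
raw_num = docs.sum(axis=0).astype(float)
raw_prob = raw_num / docs.sum()

# Smoothed as in the post: counts start at 1, the denominator at 2.0.
smooth_num = np.ones(4) + docs.sum(axis=0)
smooth_denom = 2.0 + docs.sum()
smooth_prob = smooth_num / smooth_denom

print(raw_prob)     # contains zeros for the unseen words
print(smooth_prob)  # [0.6 0.2 0.4 0.2], every entry strictly positive
```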


    Likewise, the probability vectors computed afterwards must be converted to logarithms:

     

        p1Vect = log(p1Num/p1Denom)         #change to log()
        p0Vect = log(p0Num/p0Denom)     #change to log()
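Taking logarithms does not change which class wins, because the logarithm is monotonic and turns the product into a sum. A minimal check, assuming made-up conditional probabilities and a made-up prior:

```python
import numpy as np

# Hypothetical smoothed conditional probabilities for a 3-word vocabulary.
p0 = np.array([0.5, 0.3, 0.2])
p1 = np.array([0.2, 0.2, 0.6])
prior1 = 0.5                   # assumed class prior P(c=1)
doc = np.array([1, 0, 1])      # words 0 and 2 are present

# Product form: P(c) times the product of P(w|c) over the words that occur.
prod0 = (1 - prior1) * np.prod(p0 ** doc)
prod1 = prior1 * np.prod(p1 ** doc)

# Log form: log P(c) plus the sum of log P(w|c) over the words that occur.
log0 = np.log(1 - prior1) + np.sum(doc * np.log(p0))
log1 = np.log(prior1) + np.sum(doc * np.log(p1))

# Both forms agree on the winning class.
print(prod1 > prod0, log1 > log0)   # True True
```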


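    There is also the bag-of-words model, which counts how many times each word occurs instead of only recording whether it is present: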

    def bagOFWords2VecMN(vocabList,inputSet):
        returnVec = [0]*len(vocabList)
        for word in inputSet:
            if word in vocabList:
                returnVec[vocabList.index(word)] +=1
        return returnVec
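The difference from the set-of-words model only shows up when a word repeats. A short side-by-side sketch (the helper names `set_of_words` and `bag_of_words`, the vocabulary and the document are illustrative, not from the post):

```python
def set_of_words(vocab, words):
    """Binary vector: 1 if the word occurs at all."""
    vec = [0] * len(vocab)
    for w in words:
        if w in vocab:
            vec[vocab.index(w)] = 1
    return vec

def bag_of_words(vocab, words):
    """Count vector: number of occurrences of each word."""
    vec = [0] * len(vocab)
    for w in words:
        if w in vocab:
            vec[vocab.index(w)] += 1
    return vec

vocab = ['dog', 'stupid', 'my']      # hypothetical vocabulary
doc = ['stupid', 'dog', 'stupid']    # 'stupid' appears twice

print(set_of_words(vocab, doc))   # [1, 1, 0]
print(bag_of_words(vocab, doc))   # [1, 2, 0]
```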



    That completes both the theory and the working code for naive Bayes. It turns out to be quite simple. My complete code is given below for convenience:

    from numpy import *
    
    def loadDataSet():
        postingList=[['my', 'dog', 'has', 'flea', 'problems', 'help', 'please'],
                     ['maybe', 'not', 'take', 'him', 'to', 'dog', 'park', 'stupid'],
                     ['my', 'dalmation', 'is', 'so', 'cute', 'I', 'love', 'him'],
                     ['stop', 'posting', 'stupid', 'worthless', 'garbage'],
                     ['mr', 'licks', 'ate', 'my', 'steak', 'how', 'to', 'stop', 'him'],
                     ['quit', 'buying', 'worthless', 'dog', 'food', 'stupid']]
        classVec = [0,1,0,1,0,1]    #1 is abusive, 0 not
        return postingList,classVec
                     
    def createVocabList(dataSet):
        vocabSet = set([])  #create empty set
        for document in dataSet:
            vocabSet = vocabSet | set(document) #union of the two sets
        return list(vocabSet)
    
    def setOfWords2Vec(vocabList, inputSet):
        returnVec = [0]*len(vocabList)
        for word in inputSet:
            if word in vocabList:
                returnVec[vocabList.index(word)] = 1
            else: 
                print ("the word: %s is not in my Vocabulary!" % word)
        return returnVec
        
    def trainNB0(trainMatrix,trainCategory):
        numTrainDocs = len(trainMatrix)
        numWords = len(trainMatrix[0])
        pAbusive = sum(trainCategory)/float(numTrainDocs)
        p0Num = ones(numWords)          # start counts at 1 so no probability is 0
        p1Num = ones(numWords)
        p0Denom = 2.0                   # denominators start at 2.0 to match
        p1Denom = 2.0
        for i in range(numTrainDocs):
            if trainCategory[i] == 1:
                p1Num += trainMatrix[i]
                p1Denom += sum(trainMatrix[i])
            else:
                p0Num += trainMatrix[i]
                p0Denom += sum(trainMatrix[i])
        p1Vect = log(p1Num/p1Denom)     # log probabilities avoid underflow
        p0Vect = log(p0Num/p0Denom)
        return p0Vect,p1Vect,pAbusive
    
    def classifyNB(vec2Classify, p0Vec, p1Vec, pClass1):
        p1 = sum(vec2Classify*p1Vec)+log(pClass1)
        p0 = sum(vec2Classify*p0Vec)+log(1.0-pClass1)
        if p1>p0:
            return 1
        else:
            return 0
    
    
    
    listOposts,listClasses = loadDataSet()
    
    
    myVocabList = createVocabList(listOposts)
    print(myVocabList)
    
    tmp= setOfWords2Vec(myVocabList, listOposts[0])
    print(tmp)
    
    trainMat =[]
    for postinDoc in listOposts:
        trainMat.append(setOfWords2Vec(myVocabList, postinDoc))
    
    
    
    p0V, p1V, pAb = trainNB0(trainMat, listClasses)
    
    print(p0V)
    print(p1V)
    print(pAb)
    
    testEntry = ['love', 'my','dalmation']
    
    thisDoc = setOfWords2Vec(myVocabList, testEntry)
    
    print(classifyNB(thisDoc, p0V, p1V, pAb))
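As a cross-check, here is a condensed, self-contained sketch of the same pipeline with smoothing and logarithms, using bag-of-words vectors. The function names (`train_nb`, `classify_nb`, `to_vec`) and the second test entry are my own additions for illustration; the training data is the toy set from `loadDataSet`.

```python
import numpy as np

posts = [['my', 'dog', 'has', 'flea', 'problems', 'help', 'please'],
         ['maybe', 'not', 'take', 'him', 'to', 'dog', 'park', 'stupid'],
         ['my', 'dalmation', 'is', 'so', 'cute', 'I', 'love', 'him'],
         ['stop', 'posting', 'stupid', 'worthless', 'garbage'],
         ['mr', 'licks', 'ate', 'my', 'steak', 'how', 'to', 'stop', 'him'],
         ['quit', 'buying', 'worthless', 'dog', 'food', 'stupid']]
labels = [0, 1, 0, 1, 0, 1]   # 1 is abusive, 0 not

def to_vec(vocab, words):
    """Bag-of-words vector: occurrence count of each vocabulary word."""
    return [words.count(w) for w in vocab]

def train_nb(train_matrix, labels):
    """Return log P(w|c=0), log P(w|c=1) and P(c=1), smoothed as in the post."""
    m = np.asarray(train_matrix, dtype=float)
    y = np.asarray(labels)
    p0_num = np.ones(m.shape[1]) + m[y == 0].sum(axis=0)
    p1_num = np.ones(m.shape[1]) + m[y == 1].sum(axis=0)
    p0_denom = 2.0 + m[y == 0].sum()
    p1_denom = 2.0 + m[y == 1].sum()
    return np.log(p0_num / p0_denom), np.log(p1_num / p1_denom), y.mean()

def classify_nb(vec, p0_log, p1_log, p_class1):
    """Pick the class with the larger log posterior (up to a shared constant)."""
    vec = np.asarray(vec)
    p1 = np.sum(vec * p1_log) + np.log(p_class1)
    p0 = np.sum(vec * p0_log) + np.log(1.0 - p_class1)
    return 1 if p1 > p0 else 0

vocab = sorted({w for post in posts for w in post})   # sorted for a stable order
train = [to_vec(vocab, post) for post in posts]
p0_log, p1_log, p_abusive = train_nb(train, labels)

result_a = classify_nb(to_vec(vocab, ['love', 'my', 'dalmation']),
                       p0_log, p1_log, p_abusive)
result_b = classify_nb(to_vec(vocab, ['stupid', 'garbage']),
                       p0_log, p1_log, p_abusive)
print(result_a)   # 0
print(result_b)   # 1
```

Slicing the matrix by label replaces the explicit loop in `trainNB0`; the arithmetic is the same.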
    
    
    




  • Original article: https://www.cnblogs.com/snake-hand/p/3174413.html