  • Python for Data Science

    Chapter 6 - Other Popular Machine Learning Methods

    Segment 5 - Naive Bayes Classifiers

    Naive Bayes Classifiers

    Naive Bayes is a machine learning method you can use to predict the likelihood that an event will occur given evidence that's present in your data.

    Conditional Probability

    \[ P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)} \]
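
The formula can be checked by counting. The sketch below uses a small made-up inbox (the `messages` list and the helper name `conditional_probability` are illustrative assumptions, not part of the original notes), where A is "the message contains the word free" and B is "the message is spam".

```python
from fractions import Fraction

# Hypothetical toy inbox: each entry is (contains_word_free, is_spam).
messages = [
    (True, True), (True, True), (True, False),
    (False, True), (False, False), (False, False),
    (False, False), (True, True),
]

def conditional_probability(pairs):
    """Return P(B|A) = P(A and B) / P(A); A is the first flag, B the second."""
    n = len(pairs)
    p_a = Fraction(sum(1 for a, _ in pairs if a), n)
    p_a_and_b = Fraction(sum(1 for a, b in pairs if a and b), n)
    return p_a_and_b / p_a

p_spam_given_free = conditional_probability(messages)
print(p_spam_given_free)  # 3/4: four messages contain "free", three of those are spam
```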

    Three Types of Naive Bayes Model

    • Multinomial
    • Bernoulli
    • Gaussian
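
As a rough guide (a common rule of thumb, not something stated in these notes), the three variants differ in the kind of feature they model: Multinomial for counts, Bernoulli for present/absent flags, Gaussian for continuous values. The sketch below fits each one on synthetic data generated for illustration only; the class means and sample sizes are arbitrary assumptions.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

rng = np.random.default_rng(0)
y = np.array([0] * 50 + [1] * 50)

# Synthetic per-class feature means: class 0 favors feature 0, class 1 favors feature 2.
means = np.where(y[:, None] == 1, [1.0, 1.0, 4.0], [4.0, 1.0, 1.0])

X_counts = rng.poisson(means)            # non-negative counts  -> MultinomialNB
X_binary = (X_counts > 2).astype(int)    # present/absent flags -> BernoulliNB
X_cont = rng.normal(loc=means, scale=1)  # continuous values    -> GaussianNB

# Accuracy on the training data itself, only to show that each model fits its data type.
train_accuracy = {
    "multinomial": MultinomialNB().fit(X_counts, y).score(X_counts, y),
    "bernoulli": BernoulliNB().fit(X_binary, y).score(X_binary, y),
    "gaussian": GaussianNB().fit(X_cont, y).score(X_cont, y),
}
print(train_accuracy)
```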

    Naive Bayes Use Cases

    • Spam Detection
    • Customer Classification
    • Credit Risk Prediction
    • Health Risk Prediction

    Naive Bayes Assumptions

    Predictors are independent of each other.
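
The independence assumption is what keeps the model simple. As a sketch in standard notation (the symbols $y$ for the class and $x_1, \dots, x_n$ for the predictors are introduced here for illustration), Bayes' theorem combined with independence of the predictors given the class yields:

\[ P(y \mid x_1, \dots, x_n) = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)} \propto P(y) \prod_{i=1}^{n} P(x_i \mid y) \]

The predicted class is the $y$ that maximizes the right-hand side. Each factor $P(x_i \mid y)$ is estimated from one predictor at a time, which is why correlated predictors can distort the result.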

    A priori assumption: the assumption that past conditions still hold true. When we make predictions from historical values, we will get incorrect results if present circumstances have changed.

    • All regression models rely on this a priori assumption as well.
    import numpy as np
    import pandas as pd
    import urllib
    import sklearn
    
    from sklearn.model_selection import train_test_split
    from sklearn import metrics
    from sklearn.metrics import accuracy_score
    
    from sklearn.naive_bayes import BernoulliNB
    from sklearn.naive_bayes import GaussianNB
    from sklearn.naive_bayes import MultinomialNB
    

    Naive Bayes

    Using Naive Bayes to predict spam

    url = "https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.data"
    
    import urllib.request
    
    raw_data = urllib.request.urlopen(url)
    dataset = np.loadtxt(raw_data, delimiter=',')
    print(dataset[0])
    
    [  0.      0.64    0.64    0.      0.32    0.      0.      0.      0.
       0.      0.      0.64    0.      0.      0.      0.32    0.      1.29
       1.93    0.      0.96    0.      0.      0.      0.      0.      0.
       0.      0.      0.      0.      0.      0.      0.      0.      0.
       0.      0.      0.      0.      0.      0.      0.      0.      0.
       0.      0.      0.      0.      0.      0.      0.778   0.      0.
       3.756  61.    278.      1.   ]
    
    X = dataset[:,0:48]
    
    y = dataset[:,-1]
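
The slice `0:48` keeps only the word-frequency columns. In the Spambase layout each row has 58 values: 48 word frequencies, 6 character frequencies, 3 capital-letter run-length statistics, and the spam label last. The sketch below illustrates that layout on a small stand-in array (the values and the names `char_freq` and `capital_runs` are placeholders, not part of the original code).

```python
import numpy as np

# Stand-in for the loaded spambase array: 5 rows, 58 columns (values are arbitrary).
dataset = np.arange(5 * 58, dtype=float).reshape(5, 58)
dataset[:, -1] = [1, 0, 1, 0, 0]  # last column plays the role of the spam label

word_freq = dataset[:, 0:48]      # columns 0-47: word frequencies (the X used above)
char_freq = dataset[:, 48:54]     # columns 48-53: character frequencies
capital_runs = dataset[:, 54:57]  # columns 54-56: capital-letter run lengths
y = dataset[:, -1]                # column 57: class label (1 = spam, 0 = not spam)

print(word_freq.shape, char_freq.shape, capital_runs.shape, y.shape)
```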
    
    X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=.2, random_state=17)
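
With `test_size=.2`, one fifth of the rows are held out for testing, and `random_state=17` makes the split repeatable. The sketch below reproduces the split sizes on placeholder arrays that match the shape of the Spambase features (4601 rows, 48 columns); the zero-filled `X` and alternating `y` are dummies chosen so the example runs without a download.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the same shape as the spambase features used above;
# the values themselves are dummies.
X = np.zeros((4601, 48))
y = np.arange(4601) % 2

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=17
)
print(X_train.shape, X_test.shape)

# Repeating the split with the same random_state selects the same rows.
_, _, _, y_test_again = train_test_split(X, y, test_size=0.2, random_state=17)
print(np.array_equal(y_test, y_test_again))
```

The test set holds 921 rows, so every accuracy reported in this walkthrough is a multiple of 1/921.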
    
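    # binarize is a numeric threshold; True is treated as 1.0 here,
    # so only feature values above 1.0 are mapped to 1
    BernNB = BernoulliNB(binarize=True)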
    BernNB.fit(X_train, y_train)
    print(BernNB)
    
    y_expect = y_test
    y_pred = BernNB.predict(X_test)
    
    print(accuracy_score(y_expect, y_pred))
    
    BernoulliNB(binarize=True)
    0.8577633007600435
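
`accuracy_score` is the fraction of test labels the model predicted correctly. A single number hides which kind of error was made, so a confusion matrix is a useful companion. The sketch below uses short hand-written label arrays (hypothetical values, not the Spambase results) to show both.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels for eight messages (1 = spam, 0 = not spam).
y_expect = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

accuracy = accuracy_score(y_expect, y_pred)
manual_accuracy = np.mean(y_expect == y_pred)  # same quantity computed by hand
print(accuracy, manual_accuracy)               # 6 of 8 labels match: 0.75

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_expect, y_pred)
print(cm)  # [[3 1]
           #  [1 3]]
```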
    
    MultiNB = MultinomialNB()
    MultiNB.fit(X_train, y_train)
    print(MultiNB)
    
    y_pred = MultiNB.predict(X_test)
    
    print(accuracy_score(y_expect, y_pred))
    
    MultinomialNB()
    0.8816503800217155
    
    GausNB = GaussianNB()
    GausNB.fit(X_train, y_train)
    print(GausNB)
    
    y_pred = GausNB.predict(X_test)
    
    print(accuracy_score(y_expect, y_pred))
    
    GaussianNB()
    0.8197611292073833
    
    BernNB = BernoulliNB(binarize=0.1)
    BernNB.fit(X_train, y_train)
    print(BernNB)
    
    y_expect = y_test
    y_pred = BernNB.predict(X_test)
    
    print(accuracy_score(y_expect, y_pred))
    
    BernoulliNB(binarize=0.1)
    0.9109663409337676
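
A likely explanation for the improvement is that the threshold decides which word frequencies count as "word present". The sketch below applies `sklearn.preprocessing.binarize` to a few values taken from the first row printed earlier; the choice of these six values is only for illustration.

```python
import numpy as np
from sklearn.preprocessing import binarize

# A few word-frequency values from the first spambase row shown earlier.
row = np.array([[0.0, 0.64, 0.32, 1.29, 1.93, 0.96]])

at_one = binarize(row, threshold=1.0)    # values strictly above 1.0 become 1
at_tenth = binarize(row, threshold=0.1)  # values strictly above 0.1 become 1

print(at_one)    # [[0. 0. 0. 1. 1. 0.]]
print(at_tenth)  # [[0. 1. 1. 1. 1. 1.]]
```

With a threshold of 1.0, words that do appear in the message but at a low frequency are treated as absent, which discards information the Bernoulli model could otherwise use.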
  • Original article: https://www.cnblogs.com/keepmoving1113/p/14349367.html