  • Python for Data Science

    Chapter 5 - Basic Math and Statistics

    Segment 5 - Starting with parametric methods in pandas and scipy

    import pandas as pd
    import numpy as np
    
    import matplotlib.pyplot as plt
    import seaborn as sb
    from pylab import rcParams
    
    import scipy
    from scipy.stats import pearsonr  # scipy.stats.stats is a deprecated private path
    
    %matplotlib inline
    rcParams['figure.figsize'] = 8,4
    sb.set_style('whitegrid')  # 'seaborn-whitegrid' was renamed in newer matplotlib releases
    

    The Pearson Correlation

    address = '~/Data/mtcars.csv'
    
    cars = pd.read_csv(address)
    cars.columns = ['car_names','mpg','cyl','disp','hp','drat','wt','qsec','vs','am','gear','carb']
    
    sb.pairplot(cars)
    
    <seaborn.axisgrid.PairGrid at 0x7ff9164e46d8>
    

    [Figure: pair plot of every column in cars]

    X = cars[['mpg','hp','qsec','wt']]
    sb.pairplot(X)
    
    <seaborn.axisgrid.PairGrid at 0x7ff91133c438>
    

    [Figure: pair plot of mpg, hp, qsec and wt]

    Using scipy to calculate the Pearson correlation coefficient

    mpg = cars['mpg']
    hp = cars['hp']
    qsec = cars['qsec']
    wt = cars['wt']
    
    pearsonr_coefficient, p_value = pearsonr(mpg, hp)
    print('Pearson Correlation Coefficient %0.3f'%(pearsonr_coefficient))
    
    Pearson Correlation Coefficient -0.776
    
    pearsonr_coefficient, p_value = pearsonr(mpg, qsec)
    print('Pearson Correlation Coefficient %0.3f'%(pearsonr_coefficient))
    
    Pearson Correlation Coefficient 0.419
    
    pearsonr_coefficient, p_value = pearsonr(mpg, wt)
    print('Pearson Correlation Coefficient %0.3f'%(pearsonr_coefficient))
    
    Pearson Correlation Coefficient -0.868
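
    The calls above unpack a p_value from pearsonr but never print it, and the coefficient itself arrives as a black box. The sketch below is one way to see both: it computes r directly from the definition (sum of products of deviations, divided by the square root of the product of the summed squared deviations) and compares it with scipy. The helper name pearson_by_hand and the arrays x_demo and y_demo are made up for illustration; the numbers are not taken from mtcars.csv.

```python
import numpy as np
from scipy.stats import pearsonr

def pearson_by_hand(x, y):
    """Pearson r from the definition (hypothetical helper, for illustration)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Illustrative values only, NOT taken from mtcars.csv
x_demo = [1.0, 2.0, 3.0, 4.0, 5.0]
y_demo = [9.8, 8.1, 6.2, 3.9, 2.1]

r_manual = pearson_by_hand(x_demo, y_demo)
r_scipy, p_value = pearsonr(x_demo, y_demo)

# Report the p-value alongside the coefficient instead of discarding it
print('Manual r %0.3f, scipy r %0.3f, p-value %0.4f' % (r_manual, r_scipy, p_value))
```

    With the real data, the same pattern would apply to mpg, hp, qsec and wt as loaded earlier.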
    

    Using pandas to calculate the Pearson correlation coefficient

    corr = X.corr()
    corr
    
               mpg        hp      qsec        wt
    mpg   1.000000 -0.776168  0.418684 -0.867659
    hp   -0.776168  1.000000 -0.708223  0.658748
    qsec  0.418684 -0.708223  1.000000 -0.174716
    wt   -0.867659  0.658748 -0.174716  1.000000
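
    DataFrame.corr() uses the Pearson method by default, and a single pair can be read off with Series.corr without building the whole matrix. The sketch below shows both, plus one possible way to rank the other columns by the strength of their correlation with a chosen column. The frame demo and its columns a, b and c are invented for illustration and do not come from mtcars.csv.

```python
import pandas as pd

# Illustrative values only, NOT taken from mtcars.csv
demo = pd.DataFrame({
    'a': [1.0, 2.0, 3.0, 4.0, 5.0],
    'b': [2.1, 3.9, 6.2, 8.1, 9.8],
    'c': [5.0, 3.0, 4.0, 1.0, 2.0],
})

# Full correlation matrix; method='pearson' is the default, spelled out here
corr_demo = demo.corr(method='pearson')

# Correlation for a single pair of columns
r_ab = demo['a'].corr(demo['b'])

# Correlations of the other columns with 'a', strongest first by magnitude,
# keeping the sign so the direction of each relationship stays visible
with_a = corr_demo['a'].drop('a')
ranked = with_a.reindex(with_a.abs().sort_values(ascending=False).index)

print(corr_demo)
print('r(a, b) = %0.3f' % r_ab)
print(ranked)
```

    Applied to the X frame above, the same ranking on the mpg column would order hp, qsec and wt by how strongly each tracks fuel economy.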

    Using Seaborn to visualize the Pearson correlation coefficient

    sb.heatmap(corr, xticklabels=corr.columns.values, yticklabels=corr.columns.values)
    
    <matplotlib.axes._subplots.AxesSubplot at 0x7ff90c978358>
    

    [Figure: heatmap of the correlation matrix for mpg, hp, qsec and wt]
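
    With default settings the heatmap's colour scale is fitted to the values present, so it is not obvious which cells are negative. A possible variation, sketched below, pins the scale to the full correlation range of -1 to 1, uses a diverging colour map, and writes each coefficient into its cell. The frame demo is invented for illustration (not mtcars.csv), and the choice of 'coolwarm' is an assumption, not something the original specifies.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb

# Illustrative values only, NOT taken from mtcars.csv
demo = pd.DataFrame({
    'a': [1.0, 2.0, 3.0, 4.0, 5.0],
    'b': [2.1, 3.9, 6.2, 8.1, 9.8],
    'c': [5.0, 3.0, 4.0, 1.0, 2.0],
})
corr_demo = demo.corr()

fig, ax = plt.subplots(figsize=(4, 3))
sb.heatmap(
    corr_demo,
    vmin=-1, vmax=1,    # fix the colour scale to the full correlation range
    cmap='coolwarm',    # diverging map: one hue for negative, another for positive
    annot=True,         # print each coefficient inside its cell
    fmt='.2f',
    ax=ax,
)
ax.set_title('Pearson correlation (illustrative data)')
```

    Passing a DataFrame lets seaborn take the tick labels from its index and columns, so xticklabels and yticklabels do not need to be set by hand here.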

  • Original article: https://www.cnblogs.com/keepmoving1113/p/14285185.html