  • Python for Data Science

    Chapter 5 - Basic Math and Statistics

    Segment 5 - Starting with parametric methods in pandas and scipy

    import pandas as pd
    import numpy as np
    
    import matplotlib.pyplot as plt
    import seaborn as sb
    from pylab import rcParams
    
    import scipy
    from scipy.stats import pearsonr
    
    %matplotlib inline
    rcParams['figure.figsize'] = 8,4
    # Note: matplotlib 3.6+ names this style 'seaborn-v0_8-whitegrid'
    plt.style.use('seaborn-whitegrid')
    

    The Pearson Correlation

    address = '~/Data/mtcars.csv'
    
    cars = pd.read_csv(address)
    cars.columns = ['car_names','mpg','cyl','disp','hp','drat','wt','qsec','vs','am','gear','carb']
    
    sb.pairplot(cars)
    
    <seaborn.axisgrid.PairGrid at 0x7ff9164e46d8>
    

    [Figure: pair plot of all columns in cars]

    X = cars[['mpg','hp','qsec','wt']]
    sb.pairplot(X)
    
    <seaborn.axisgrid.PairGrid at 0x7ff91133c438>
    

    [Figure: pair plot of mpg, hp, qsec, and wt]

    Using scipy to calculate the Pearson correlation coefficient

    mpg = cars['mpg']
    hp = cars['hp']
    qsec = cars['qsec']
    wt = cars['wt']
    
    pearsonr_coefficient, p_value = pearsonr(mpg, hp)
    print('PearsonR Correlation Coefficient %0.3f'%(pearsonr_coefficient))
    
    PearsonR Correlation Coefficient -0.776
    
    pearsonr_coefficient, p_value = pearsonr(mpg, qsec)
    print('PearsonR Correlation Coefficient %0.3f'%(pearsonr_coefficient))
    
    PearsonR Correlation Coefficient 0.419
    
    pearsonr_coefficient, p_value = pearsonr(mpg, wt)
    print('PearsonR Correlation Coefficient %0.3f'%(pearsonr_coefficient))
    
    PearsonR Correlation Coefficient -0.868
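
To make clear what the coefficient measures, here is a minimal sketch that computes it straight from the definition (the sum of products of deviations divided by the square root of the product of the summed squared deviations). The helper name `pearson_r` and the sample values are assumptions made for illustration; they are not part of the original notebook and are not taken from mtcars.csv.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Hypothetical helper for illustration; it assumes neither input is
    constant (a constant input would make the denominator zero).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Deviations of each observation from its mean
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    # Numerator: sum of products of paired deviations
    cov = sum(a * b for a, b in zip(dx, dy))
    # Denominator: square root of the product of summed squared deviations
    ss_x = sum(a * a for a in dx)
    ss_y = sum(b * b for b in dy)
    return cov / math.sqrt(ss_x * ss_y)

# Made-up values for illustration only (not taken from mtcars.csv).
weight = [1.0, 2.0, 3.0, 4.0, 5.0]
mileage = [10.0, 8.0, 6.0, 4.0, 2.0]  # falls linearly as weight rises
print('PearsonR Correlation Coefficient %0.3f' % pearson_r(weight, mileage))
# prints: PearsonR Correlation Coefficient -1.000
```

A perfectly linear decreasing relationship gives exactly -1, so a value such as -0.868 for mpg and wt indicates a strong, but not perfect, negative linear relationship.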
    

    Using pandas to calculate the Pearson correlation coefficient

    corr = X.corr()
    corr
    
               mpg        hp      qsec        wt
    mpg   1.000000 -0.776168  0.418684 -0.867659
    hp   -0.776168  1.000000 -0.708223  0.658748
    qsec  0.418684 -0.708223  1.000000 -0.174716
    wt   -0.867659  0.658748 -0.174716  1.000000
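
Two properties help when reading a correlation matrix like the one above: the diagonal is always 1 (each variable correlates perfectly with itself) and the matrix is symmetric (the correlation of a with b equals that of b with a). The sketch below checks both with NumPy's `np.corrcoef`, which also computes Pearson coefficients. The array `data` holds made-up numbers for illustration only and is not taken from mtcars.csv.

```python
import numpy as np

# Made-up data for illustration only: 6 observations (rows) of 3 variables (columns).
data = np.array([
    [1.0, 2.0, 9.0],
    [2.0, 4.0, 7.0],
    [3.0, 5.0, 8.0],
    [4.0, 4.0, 3.0],
    [5.0, 7.0, 2.0],
    [6.0, 9.0, 1.0],
])

# rowvar=False tells NumPy that each column (not each row) is a variable.
corr_matrix = np.corrcoef(data, rowvar=False)
print(np.round(corr_matrix, 3))

# Diagonal entries are 1: every variable is perfectly correlated with itself.
print(np.allclose(np.diag(corr_matrix), 1.0))
# The matrix equals its transpose: corr(a, b) == corr(b, a).
print(np.allclose(corr_matrix, corr_matrix.T))
```

Because of the symmetry, only the entries above (or below) the diagonal carry distinct information.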

    Using Seaborn to visualize the Pearson correlation coefficient

    sb.heatmap(corr, xticklabels=corr.columns.values, yticklabels=corr.columns.values)
    
    <matplotlib.axes._subplots.AxesSubplot at 0x7ff90c978358>
    

    [Figure: heatmap of the correlation matrix]

  • Original article: https://www.cnblogs.com/keepmoving1113/p/14285185.html