  • Python for Data Science

    Chapter 4 - Practical Data Visualization

    Segment 6 - Creating statistical data graphics

    Statistical Plots Allow Viewers To:

    • Identify outliers
    • Visualize distributions
    • Deduce variable types
    • Discover relationships and correlations between variables in a dataset

    Histograms

    A histogram shows a variable's distribution as a set of adjacent rectangles on a chart. Each rectangle spans a numerical range of values (a bin), and its height represents the count of observations that fall within that range.
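
    To make the binning idea concrete, here is a minimal sketch that counts values per bin with NumPy and prints a text histogram. The sample values and the choice of 5 bins are assumptions made for illustration; they are not taken from the mtcars data used later.

```python
import numpy as np

# Made-up sample values (illustrative only, not read from mtcars.csv)
values = np.array([10.4, 13.3, 15.2, 15.8, 17.3, 18.7, 19.2,
                   21.0, 21.4, 22.8, 24.4, 30.4, 33.9])

# Split the data range into 5 equal-width bins and count the values in each
counts, edges = np.histogram(values, bins=5)

# One row per bin: the bin's range, then one '#' per observation in it
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f}: {'#' * int(count)}")
```

    The counts are what a plotted histogram draws as rectangle heights; changing `bins` changes how coarse or fine the picture of the distribution is.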

    Scatterplots

    Scatterplots are useful when you want to explore interrelations or dependencies between two different variables. These data graphics are ideal for visually spotting outliers and trends in data.
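
    A trend that is visible in a scatterplot can also be summarized numerically. The sketch below computes a Pearson correlation coefficient with NumPy; the paired values are made up for illustration, and Pearson's r is only one possible measure of dependence.

```python
import numpy as np

# Made-up paired measurements (illustrative only): as x rises, y tends to fall
x = np.array([50, 65, 90, 110, 150, 180, 245], dtype=float)
y = np.array([33.0, 30.5, 26.0, 21.5, 19.0, 16.5, 13.0])

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal entry is r.
# r near -1 means a falling linear trend, near +1 a rising one, near 0 none.
r = np.corrcoef(x, y)[0, 1]
print(f"r = {r:.2f}")
```

    A strongly negative r here matches what the eye would see as points sloping downward from left to right.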

    Boxplots

    Boxplots are useful for seeing a variable's spread, and for detecting outliers.
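
    The quantities a boxplot draws can be sketched directly. The example below computes the quartiles, the interquartile range (IQR), and the outlier cutoffs, assuming the common 1.5 × IQR convention (which is also matplotlib's default whisker setting). The sample values are made up for illustration.

```python
import numpy as np

# Made-up sample with one unusually large value (illustrative only)
values = np.array([14.0, 15.5, 16.0, 17.5, 18.0, 19.0, 20.5, 21.0, 22.5, 40.0])

# The box spans Q1 to Q3, with a line at the median
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Assumed convention: points more than 1.5 * IQR beyond the box are outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print("outliers:", outliers)
```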

    import numpy as np
    import pandas as pd
    from pandas import Series, DataFrame
    
    from pandas.plotting import scatter_matrix
    
    import matplotlib.pyplot as plt
    from matplotlib import rcParams  # import from matplotlib directly; pylab is discouraged
    
    %matplotlib inline
    rcParams['figure.figsize'] = 5, 4
    
    import seaborn as sb
    sb.set_style('whitegrid')
    

    Eyeballing dataset distributions with histograms

    address = '~/Data/mtcars.csv'
    
    cars = pd.read_csv(address)
    
    cars.columns = ['car_names','mpg','cyl','disp', 'hp', 'drat', 'wt', 'qsec', 'vs', 'am', 'gear', 'carb']
    cars.index = cars.car_names
    mpg = cars['mpg']
    mpg.plot(kind='hist')
    
    
    <matplotlib.axes._subplots.AxesSubplot at 0x7f637c0199b0>
    

    [Figure: histogram of mpg, drawn with pandas]

    plt.hist(mpg)
    plt.show()
    
    

    [Figure: histogram of mpg, drawn with matplotlib]

    # distplot is deprecated in seaborn; histplot with a KDE overlay draws a comparable figure
    sb.histplot(mpg, kde=True, stat='density')
    
    <matplotlib.axes._subplots.AxesSubplot at 0x7f6379883160>
    

    [Figure: histogram of mpg with a density curve, drawn with seaborn]

    Seeing scatterplots in action

    cars.plot(kind='scatter', x='hp', y='mpg', c='darkgray', s=150)
    
    <matplotlib.axes._subplots.AxesSubplot at 0x7f637771e240>
    

    [Figure: scatterplot of mpg against hp]

    sb.regplot(x='hp', y='mpg', data=cars, scatter=True)
    
    <matplotlib.axes._subplots.AxesSubplot at 0x7f6377688470>
    

    [Figure: scatterplot of mpg against hp with a fitted regression line]
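
    By default, `sb.regplot` overlays a linear regression fit on the scatterplot. As a rough sketch of what that line represents, the example below fits a straight line by least squares with `np.polyfit`. The `hp` and `mpg` arrays are made-up values for illustration, not the columns of mtcars.csv, and this is not a reproduction of seaborn's internals.

```python
import numpy as np

# Made-up (hp, mpg)-style pairs, illustrative only and not read from mtcars.csv
hp = np.array([60, 90, 110, 150, 180, 220, 260], dtype=float)
mpg = np.array([32.0, 27.0, 24.5, 20.0, 17.5, 15.0, 13.5])

# Degree-1 least-squares fit: mpg is approximated by slope * hp + intercept
slope, intercept = np.polyfit(hp, mpg, deg=1)
predicted = slope * hp + intercept

# Residuals are the vertical distances between each point and the fitted line
residuals = mpg - predicted
print(f"slope={slope:.3f}, intercept={intercept:.1f}")
```

    A negative slope corresponds to the downward-sloping line in the plot: higher horsepower goes with lower fuel economy in this made-up sample.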

    Generating a scatter plot matrix

    sb.pairplot(cars)
    
    <seaborn.axisgrid.PairGrid at 0x7f6373f31c88>
    

    [Figure: scatter plot matrix of all variables in cars]

    cars_subset = cars[['mpg','disp','hp','wt']]
    sb.pairplot(cars_subset)
    plt.show()
    

    [Figure: scatter plot matrix of mpg, disp, hp, and wt]

    Building boxplots

    cars.boxplot(column='mpg', by='am')
    cars.boxplot(column='wt', by='am')
    
    <matplotlib.axes._subplots.AxesSubplot at 0x7f636875dc88>
    

    [Figure: boxplots of mpg grouped by am]

    [Figure: boxplots of wt grouped by am]

    sb.boxplot(x='am', y='mpg', data=cars, palette='hls')
    
    <matplotlib.axes._subplots.AxesSubplot at 0x7f636836ae48>
    

    [Figure: seaborn boxplots of mpg grouped by am]

  • Original article: https://www.cnblogs.com/keepmoving1113/p/14248546.html