  • Multiple Linear Regression in Python

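    Purpose of the analysis

    Analyze the relationship between the concentrations of the main air pollutants and the Air Quality Index (AQI).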
    

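    Data

    The dataset contains air pollutant concentrations scraped from the Tianqihoubao (天气后报) website. It covers Beijing from October 28, 2013 to January 31, 2016, and also includes the air quality level, the AQI, and each day's AQI ranking.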
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt 
    %matplotlib inline
    import statsmodels.api as sm

    Linear regression

    1. Data preprocessing

    data = pd.read_csv("beijing.csv",index_col = 0)
    data.head()
       AQI指数  当天AQI排名  PM25  PM10  So2  No2    Co  O3
    1    306       106   255   277   30  105  2.60  15
    2     62        22    39    62   10   46  0.91  27
    3     99        61    71   101   11   72  1.18  14
    4    176        98   135   162   10   96  1.62   2
    5    231       102   181   202   14  100  1.89   0
    X = data.iloc[:,2:8]
    X = sm.add_constant(X)
    y = data.iloc[:,0]
    print(X.head())
       const  PM25  PM10  So2  No2    Co  O3
    1    1.0   255   277   30  105  2.60  15
    2    1.0    39    62   10   46  0.91  27
    3    1.0    71   101   11   72  1.18  14
    4    1.0   135   162   10   96  1.62   2
    5    1.0   181   202   14  100  1.89   0

    2. Building the model

    model1 = sm.OLS(y,X)  # build the model
    result = model1.fit() # fit the model
    print(result.summary())
                                OLS Regression Results                            
    ==============================================================================
    Dep. Variable:                  AQI指数   R-squared:                       0.963
    Model:                            OLS   Adj. R-squared:                  0.963
    Method:                 Least Squares   F-statistic:                     3549.
    Date:                Thu, 02 Apr 2020   Prob (F-statistic):               0.00
    Time:                        20:43:20   Log-Likelihood:                -3378.3
    No. Observations:                 822   AIC:                             6771.
    Df Residuals:                     815   BIC:                             6804.
    Df Model:                           6                                         
    Covariance Type:            nonrobust                                         
    ==============================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
    ------------------------------------------------------------------------------
    const         26.4656      2.099     12.610      0.000      22.346      30.585
    PM25           0.9506      0.019     50.834      0.000       0.914       0.987
    PM10           0.2412      0.015     15.691      0.000       0.211       0.271
    So2           -0.0212      0.038     -0.555      0.579      -0.096       0.054
    No2           -0.2624      0.047     -5.601      0.000      -0.354      -0.170
    Co            -1.5038      1.109     -1.356      0.175      -3.680       0.672
    O3             0.0468      0.018      2.621      0.009       0.012       0.082
    ==============================================================================
    Omnibus:                      351.197   Durbin-Watson:                   1.782
    Prob(Omnibus):                  0.000   Jarque-Bera (JB):             5876.885
    Skew:                           1.489   Prob(JB):                         0.00
    Kurtosis:                      15.756   Cond. No.                         733.
    ==============================================================================
    
    Warnings:
    [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
    result.f_pvalue # p-value of the F-test for overall significance of the regression
    0.0
    result.params # regression coefficients
    const    26.465624
    PM25      0.950583
    PM10      0.241180
    So2      -0.021246
    No2      -0.262374
    Co       -1.503839
    O3        0.046783
    dtype: float64
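To make the coefficients concrete, the sketch below applies them by hand to the first row of `X` shown earlier. The numbers are copied from the printed `result.params` and `X.head()` output; the variable names `params`, `x_row`, `predicted_aqi`, and `residual` are illustrative.

```python
import numpy as np

# Coefficients copied from result.params above (const first).
params = np.array([26.465624, 0.950583, 0.241180, -0.021246,
                   -0.262374, -1.503839, 0.046783])

# First row of X above: const, PM25, PM10, So2, No2, Co, O3.
x_row = np.array([1.0, 255, 277, 30, 105, 2.60, 15])

# A fitted value is the dot product of the coefficients and the row.
predicted_aqi = float(params @ x_row)
residual = 306 - predicted_aqi  # the observed AQI for that row is 306

print(round(predicted_aqi, 2), round(residual, 2))
```

For this row the fitted value comes out at roughly 304.3 against an observed AQI of 306. On the fitted model itself, `result.predict(X)` should give the same kind of value for every row.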

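    Improving the model

    The p-values for So2 and Co (0.579 and 0.175) are greater than 0.05, so these two variables are dropped and the model is refit.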

    data = pd.read_csv("beijing.csv",index_col = 0)
    data.head()
       AQI指数  当天AQI排名  PM25  PM10  So2  No2    Co  O3
    1    306       106   255   277   30  105  2.60  15
    2     62        22    39    62   10   46  0.91  27
    3     99        61    71   101   11   72  1.18  14
    4    176        98   135   162   10   96  1.62   2
    5    231       102   181   202   14  100  1.89   0
    X = data.iloc[:,[2,3,5,7]]
    X = sm.add_constant(X)
    y = data.iloc[:,0]
    print(X.head())
       const  PM25  PM10  No2  O3
    1    1.0   255   277  105  15
    2    1.0    39    62   46  27
    3    1.0    71   101   72  14
    4    1.0   135   162   96   2
    5    1.0   181   202  100   0
    
    model2 = sm.OLS(y,X)  # build the model
    result = model2.fit() # fit the model
    print(result.summary())
                                OLS Regression Results                            
    ==============================================================================
    Dep. Variable:                  AQI指数   R-squared:                       0.963
    Model:                            OLS   Adj. R-squared:                  0.963
    Method:                 Least Squares   F-statistic:                     5318.
    Date:                Thu, 02 Apr 2020   Prob (F-statistic):               0.00
    Time:                        21:35:18   Log-Likelihood:                -3379.7
    No. Observations:                 822   AIC:                             6769.
    Df Residuals:                     817   BIC:                             6793.
    Df Model:                           4                                         
    Covariance Type:            nonrobust                                         
    ==============================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
    ------------------------------------------------------------------------------
    const         25.9959      2.064     12.598      0.000      21.945      30.046
    PM25           0.9378      0.016     58.347      0.000       0.906       0.969
    PM10           0.2417      0.015     15.864      0.000       0.212       0.272
    No2           -0.2891      0.044     -6.613      0.000      -0.375      -0.203
    O3             0.0560      0.017      3.297      0.001       0.023       0.089
    ==============================================================================
    Omnibus:                      337.402   Durbin-Watson:                   1.783
    Prob(Omnibus):                  0.000   Jarque-Bera (JB):             5783.530
    Skew:                           1.401   Prob(JB):                         0.00
    Kurtosis:                      15.689   Cond. No.                         711.
    ==============================================================================
    
    Warnings:
    [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
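One way to check whether dropping So2 and Co cost anything is to compare the two summaries directly. The sketch below uses only the log-likelihoods printed above (rounded to one decimal, so the result is approximate) to reproduce the reported AIC values and to run a likelihood-ratio test. That test is an asymptotic check swapped in here for illustration, not something the original post performs, and all variable names are illustrative.

```python
import math

# Values copied from the two summaries above.
llf_full, k_full = -3378.3, 7        # model1: const + 6 predictors
llf_reduced, k_reduced = -3379.7, 5  # model2: const + 4 predictors

# AIC = 2k - 2*logL; should match the reported 6771 and 6769 after rounding.
aic_full = 2 * k_full - 2 * llf_full
aic_reduced = 2 * k_reduced - 2 * llf_reduced

# Likelihood-ratio statistic for dropping So2 and Co (2 restrictions).
lr_stat = 2 * (llf_full - llf_reduced)
# For a chi-square distribution with 2 degrees of freedom, P(X > x) = exp(-x/2).
p_value = math.exp(-lr_stat / 2)

print(round(aic_full, 1), round(aic_reduced, 1), round(lr_stat, 1), round(p_value, 3))
```

The statistic is about 2.8 with a p-value near 0.25, so the reduced model does not fit significantly worse. That is consistent with its slightly lower AIC and BIC and the unchanged adjusted R-squared of 0.963.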
  • Original article: https://www.cnblogs.com/jiaxinwei/p/12623207.html