zoukankan      html  css  js  c++  java
  • 时间序列

    参考博客python实现时间序列分析-ning-ML

    时间序列的测试

    import numpy as np
    import matplotlib.pyplot as plt
    
    output = np.array([
        97, 130, 156.5, 135.2, 137.7,
        180.5, 205.2, 190, 188.6, 196.7,
        180.3, 210.8, 196, 223, 238.2,
        263.5, 292.6, 317, 335.4, 327,
        321.9, 353.5, 397.8, 436.8, 465.7,
        476.7, 462.6, 460.8, 501.8, 501.5, 
        489.5, 542.3, 512.2, 559.8, 542.0, 
        567.0
    ], dtype=float)
    year = np.arange(1964, 2000)
    

    平稳性检验

    时序图检验

    fig, ax = plt.subplots()
    ax.plot(year, output, "+-")
    plt.show()
    

    在这里插入图片描述

    该序列具有明显的趋势性,所以不是通常的平稳序列

    自相关图检验

    def acf(data, lag=None):
        n = len(data)
        if lag is None:
            lag = int(n / 2)
        cov = []
        sample_mean = np.mean(data)
        var = np.sum((data - sample_mean) ** 2) / (n-1)
        cov.append(var)
        for i in range(1, lag):
            x = data[:n-i] - sample_mean
            y = data[i:] - sample_mean
            cov.append(np.mean(x * y))
        cov = np.array(cov, dtype=float)
        acf = cov / var
        return acf, np.arange(len(acf))
    
    acf_inf, lags = acf(output, lag=23)
    print(acf_inf, lags)
    fig, ax = plt.subplots()
    ax.scatter(lags, acf_inf)
    
    [ 1.          0.91392186  0.86822948  0.81369058  0.76064724  0.68729172
      0.63440934  0.57485247  0.49429248  0.41601171  0.32584615  0.20512486
      0.08432045 -0.0501945  -0.17245647 -0.28041079 -0.37630504 -0.4783755
     -0.59591061 -0.71243415 -0.83519788 -0.97330822 -1.08197993] [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22]
    
    
    
    
    
    <matplotlib.collections.PathCollection at 0x274e3b127b8>
    

    在这里插入图片描述

    比较奇怪的是,和书上的怎么不一样,而且acf绝对值不应该小于1?哪里算错了?我知道了,原来算法都是用:

    [hat{ ho}_k approx frac{sum_{t=1}^{n-k}(x_t - ar{x})(x_{t+k}-ar{x})}{sum_{t=1}^n (x_t - ar{x})^2} ]

    算的,而不是:

    [hat{ ho}_k = frac{hat{gamma}(k)}{hat{gamma}(0)} ]

    def acf(data, lag=None):
        n = len(data)
        if lag is None:
            lag = int(n / 2)
        cov = []
        sample_mean = np.mean(data)
        var = np.sum((data - sample_mean) ** 2)
        cov.append(1)
        for i in range(1, lag):
            x = data[:n-i] - sample_mean
            y = data[i:] - sample_mean
            cov.append(np.sum(x * y) / var)
        cov = np.array(cov, dtype=float)
        acf = cov
        return acf, np.arange(len(acf))
    
    acf_inf, lags = acf(output, lag=23)
    print(acf_inf, lags)
    fig, ax = plt.subplots()
    ax.scatter(lags, acf_inf)
    
    [ 1.          0.91392186  0.84342292  0.76719398  0.69544891  0.6087441
      0.54377943  0.47630634  0.39543399  0.32092332  0.24205714  0.14651776
      0.05781973 -0.03298495 -0.10840121 -0.16824648 -0.21503145 -0.25968956
     -0.30646831 -0.34603944 -0.38180474 -0.41713209 -0.43279197] [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22]
    
    
    
    
    
    <matplotlib.collections.PathCollection at 0x274e44b0710>
    

    在这里插入图片描述

    结论就是,自相关图显示出明显的三角对称性, 这时具有单调趋势的非平稳序列的一种典型的自相关图形式.

    随机性检验

    跳过了

    ARMA模型

    AR模型

    AR模型的自相关系数有俩个显著的性质: 1.拖尾性;2.指数衰减

    滞后(k)阶的自相关系数的通解为:

    [ ho_k = sum_{i=1}^p c_i lambda_i^k ]

    其中(|lambda_i| < 1)为差分方程的特征根, (c_1, ldots, c_p)为常数,且不全为0

    通过这个通解形式,容易推出( ho_k)始终有非零取值,不会在(k)大于某个常数之后就恒等于零,这个性质就是拖尾性.

    而以指数衰减的性质就是利用自相关图判断平稳序列时所说的"短期相关"性质.

    AR(p)模型的偏自相关系数具有(p)阶截尾性,利用线性方程组的理论可以证明.事实上,这也是一种确定阶数的方法.另外偏自相关系数可以通过求解Yule-Walker方程获得:

    [left ( egin{array}{cccc} 1 & ho_1 & cdots & ho_{k-1} \ ho_1 & 1 & cdots & ho_{k-2} \ vdots & vdots & & vdots \ ho_{k-1} & ho_{k-2} & cdots & 1 end{array} ight ) left ( egin{array}{c} phi_{k1} \ phi_{k2} \ vdots \ phi_{kk} end{array} ight )= left ( egin{array}{c} ho_1 \ ho_2 \ vdots \ ho_k end{array} ight ) ]

    def pcf(data, lag=None):
        """计算偏自相关系数"""
        n = len(data)
        if lag == None:
            lag = int(n / 2)
        acf_inf, lags = acf(data, lag)
        M = np.zeros((lag, lag), dtype=float)
        for i in range(lag):
            for j in range(i, lag):
                index = np.abs(i - j)
                M[i, j] = acf_inf[index]
                M[j, i] = acf_inf[index]
        return np.linalg.inv(M) @ acf_inf, lags
    pcf_inf, lags = pcf(output, lag=30)
    print(pcf_inf)
    fig, ax = plt.subplots()
    ax.scatter(lags, pcf_inf)
    
    [ 1.00000000e+00  5.32907052e-15 -1.77635684e-15  8.88178420e-16
      0.00000000e+00 -1.33226763e-15 -8.88178420e-16 -1.33226763e-15
      4.44089210e-15 -2.22044605e-15 -1.33226763e-15  2.22044605e-15
     -5.55111512e-16  7.77156117e-16  2.22044605e-16  7.77156117e-16
      8.88178420e-16 -2.22044605e-16 -1.77635684e-15  1.77635684e-15
      0.00000000e+00  2.22044605e-15 -4.44089210e-16  4.44089210e-16
     -8.88178420e-16 -1.33226763e-15  8.88178420e-16  2.66453526e-15
     -8.88178420e-16 -2.22044605e-16]
    
    
    
    
    
    <matplotlib.collections.PathCollection at 0x274e4518be0>
    

    在这里插入图片描述

    是不是又哪里搞错了,和库里的又不一样了.

    MA模型

    MA(q)模型自相关系数(q)阶截尾,即(q)阶以后自相关系数为0

    MA(q)模型偏自相关系数拖尾

    ARMA模型

    ARMA(p, q)模型自相关系数不截尾,而且偏自相关系数也不截尾

    利用已有的库进行时间序列分析

    import pandas as pd
    from random import randrange
    from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
    from statsmodels.tsa.arima_model import ARIMA
    from statsmodels.api import tsa
    import warnings
    warnings.filterwarnings("ignore")
    
    data = pd.DataFrame(output, columns=["output"], 
                       index = pd.Index(tsa.datetools.dates_from_range("1964", "1999")))
    
    data
    
    output
    1964-12-31 97.0
    1965-12-31 130.0
    1966-12-31 156.5
    1967-12-31 135.2
    1968-12-31 137.7
    1969-12-31 180.5
    1970-12-31 205.2
    1971-12-31 190.0
    1972-12-31 188.6
    1973-12-31 196.7
    1974-12-31 180.3
    1975-12-31 210.8
    1976-12-31 196.0
    1977-12-31 223.0
    1978-12-31 238.2
    1979-12-31 263.5
    1980-12-31 292.6
    1981-12-31 317.0
    1982-12-31 335.4
    1983-12-31 327.0
    1984-12-31 321.9
    1985-12-31 353.5
    1986-12-31 397.8
    1987-12-31 436.8
    1988-12-31 465.7
    1989-12-31 476.7
    1990-12-31 462.6
    1991-12-31 460.8
    1992-12-31 501.8
    1993-12-31 501.5
    1994-12-31 489.5
    1995-12-31 542.3
    1996-12-31 512.2
    1997-12-31 559.8
    1998-12-31 542.0
    1999-12-31 567.0
    data.plot(); #时序图
    plot_acf(data).show(); #自相关图
    plot_pacf(data).show(); #偏自相关图
    

    在这里插入图片描述

    在这里插入图片描述
    在这里插入图片描述

    差分运算

    data_diff = data.diff(1).dropna()
    plot_acf(data_diff).show()
    plot_pacf(data_diff).show()
    

    在这里插入图片描述
    在这里插入图片描述

    就用个ARMA(1, 1, 4)吧

    result = ARIMA(data, order=(0, 1, 4)).fit(disp=False)
    fitvalues = result.fittedvalues
    fig, ax = plt.subplots()
    ax.plot(data_diff, label="known")
    ax.plot(fitvalues, label="fitted")
    plt.title('ARIMA RSS: %.4f' % sum(result.fittedvalues - data_diff['output']) ** 2)
    ax.legend()
    plt.show()
    

    在这里插入图片描述

    利用summary查看

    result.summary()
    
    ARIMA Model Results
    Dep. Variable: D.output No. Observations: 35
    Model: ARIMA(0, 1, 4) Log Likelihood -156.722
    Method: css-mle S.D. of innovations 20.534
    Date: Thu, 13 Jun 2019 AIC 325.444
    Time: 18:06:52 BIC 334.776
    Sample: 12-31-1965 HQIC 328.666
    - 12-31-1999
    coef std err z P>|z| [0.025 0.975]
    const 13.9682 0.726 19.227 0.000 12.544 15.392
    ma.L1.D.output -0.3682 0.200 -1.840 0.076 -0.761 0.024
    ma.L2.D.output -0.1066 0.182 -0.585 0.563 -0.463 0.250
    ma.L3.D.output -0.3034 0.196 -1.545 0.133 -0.688 0.081
    ma.L4.D.output -0.2218 0.176 -1.262 0.217 -0.566 0.123
    Roots
    Real Imaginary Modulus Frequency
    MA.1 1.0000 -0.0000j 1.0000 -0.0000
    MA.2 -0.1585 -1.4742j 1.4827 -0.2670
    MA.3 -0.1585 +1.4742j 1.4827 0.2670
    MA.4 -2.0510 -0.0000j 2.0510 -0.5000

    其中的ma.L1.D.output 表示模型的MA部分的第一个参数,因为我们的AR部分为0,如果存在的话也有ar.L1.D.output的

    (P > |z|)表示t检验,这里好像检验没通过,我也不知道咋怎.

    LB统计量检验

    resid = result.resid
    r, q, p = tsa.acf(resid.values.squeeze(), qstat=True)
    print(len(r), len(q), len(p))
    test_data = np.c_[range(1, len(r)), r[1:], q, p]
    table = pd.DataFrame(test_data, columns=['lag', 'AC', 'Q', 'Prob(>Q)'])
    print(table.set_index('lag'))
    
    35 34 34
                AC          Q  Prob(>Q)
    lag                                
    1.0  -0.006969   0.001850  0.965696
    2.0  -0.004150   0.002525  0.998738
    3.0   0.062144   0.158812  0.983947
    4.0   0.143394   1.017765  0.907090
    5.0   0.030833   1.058801  0.957685
    6.0  -0.012886   1.066216  0.982977
    7.0   0.082033   1.377449  0.986252
    8.0  -0.040114   1.454627  0.993436
    9.0  -0.058027   1.622335  0.996133
    10.0 -0.053791   1.772216  0.997807
    11.0 -0.124073   2.602859  0.995003
    12.0 -0.086263   3.021837  0.995388
    13.0 -0.119229   3.858617  0.992604
    14.0 -0.190790   6.103343  0.963819
    15.0 -0.116808   6.986800  0.958015
    16.0 -0.015661   7.003517  0.973193
    17.0  0.023369   7.042806  0.982988
    18.0 -0.073408   7.453300  0.985714
    19.0 -0.081616   7.992434  0.986747
    20.0  0.089290   8.680747  0.986319
    21.0 -0.209498  12.740500  0.917441
    22.0  0.180079  15.970870  0.817329
    23.0  0.096448  16.974725  0.810490
    24.0 -0.039265  17.156229  0.841923
    25.0 -0.083454  18.058138  0.839913
    26.0  0.093348  19.311970  0.822992
    27.0  0.067374  20.046753  0.828788
    28.0 -0.078219  21.178618  0.817823
    29.0  0.006681  21.188253  0.852199
    30.0  0.013390  21.234691  0.880406
    31.0  0.025666  21.447964  0.899588
    32.0 -0.009549  21.487322  0.920437
    33.0 -0.019575  21.735429  0.933336
    34.0  0.009294  21.847286  0.946828
    

    (Prob > (Q))大于0.05,所以模型是显著的

    模型预测

    pred = result.predict('1999', '2010', typ='levels')
    print(pred)
    fig, ax = plt.subplots()
    ax.plot(data, label="known")
    ax.plot(pred, label="pred")
    ax.legend()
    plt.show()
    
    1999-12-31    561.711101
    2000-12-31    581.618193
    2001-12-31    597.624198
    2002-12-31    615.014458
    2003-12-31    627.809643
    2004-12-31    641.777833
    2005-12-31    655.746023
    2006-12-31    669.714214
    2007-12-31    683.682404
    2008-12-31    697.650595
    2009-12-31    711.618785
    2010-12-31    725.586976
    Freq: A-DEC, dtype: float64
    

    在这里插入图片描述

  • 相关阅读:
    Flask安装教程
    Django进阶(2)
    反编译源码
    Jmeter之Bean shell使用(一)
    接口测试基础(fiddler、postman的使用、python实现测试接口程序)
    Fiddler显示服务器IP的方法
    web测试与app测试的区别
    互联网产品接入支付功能如何测试?
    web测试中的测试点和测试方法总结
    使用JMeter建立接口测试
  • 原文地址:https://www.cnblogs.com/MTandHJ/p/11018437.html
Copyright © 2011-2022 走看看