zoukankan      html  css  js  c++  java
  • 时间序列

    参考博客python实现时间序列分析-ning-ML

    时间序列的测试

    import numpy as np
    import matplotlib.pyplot as plt
    
    output = np.array([
        97, 130, 156.5, 135.2, 137.7,
        180.5, 205.2, 190, 188.6, 196.7,
        180.3, 210.8, 196, 223, 238.2,
        263.5, 292.6, 317, 335.4, 327,
        321.9, 353.5, 397.8, 436.8, 465.7,
        476.7, 462.6, 460.8, 501.8, 501.5, 
        489.5, 542.3, 512.2, 559.8, 542.0, 
        567.0
    ], dtype=float)
    year = np.arange(1964, 2000)
    

    平稳性检验

    时序图检验

    fig, ax = plt.subplots()
    ax.plot(year, output, "+-")
    plt.show()
    

    在这里插入图片描述

    该序列具有明显的趋势性,所以不是通常的平稳序列

    自相关图检验

    def acf(data, lag=None):
        n = len(data)
        if lag is None:
            lag = int(n / 2)
        cov = []
        sample_mean = np.mean(data)
        var = np.sum((data - sample_mean) ** 2) / (n-1)
        cov.append(var)
        for i in range(1, lag):
            x = data[:n-i] - sample_mean
            y = data[i:] - sample_mean
            cov.append(np.mean(x * y))
        cov = np.array(cov, dtype=float)
        acf = cov / var
        return acf, np.arange(len(acf))
    
    acf_inf, lags = acf(output, lag=23)
    print(acf_inf, lags)
    fig, ax = plt.subplots()
    ax.scatter(lags, acf_inf)
    
    [ 1.          0.91392186  0.86822948  0.81369058  0.76064724  0.68729172
      0.63440934  0.57485247  0.49429248  0.41601171  0.32584615  0.20512486
      0.08432045 -0.0501945  -0.17245647 -0.28041079 -0.37630504 -0.4783755
     -0.59591061 -0.71243415 -0.83519788 -0.97330822 -1.08197993] [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22]
    
    
    
    
    
    <matplotlib.collections.PathCollection at 0x274e3b127b8>
    

    在这里插入图片描述

    比较奇怪的是,和书上的怎么不一样,而且acf绝对值不应该小于1?哪里算错了?我知道了,原来算法都是用:

    [hat{ ho}_k approx frac{sum_{t=1}^{n-k}(x_t - ar{x})(x_{t+k}-ar{x})}{sum_{t=1}^n (x_t - ar{x})^2} ]

    算的,而不是:

    [hat{ ho}_k = frac{hat{gamma}(k)}{hat{gamma}(0)} ]

    def acf(data, lag=None):
        n = len(data)
        if lag is None:
            lag = int(n / 2)
        cov = []
        sample_mean = np.mean(data)
        var = np.sum((data - sample_mean) ** 2)
        cov.append(1)
        for i in range(1, lag):
            x = data[:n-i] - sample_mean
            y = data[i:] - sample_mean
            cov.append(np.sum(x * y) / var)
        cov = np.array(cov, dtype=float)
        acf = cov
        return acf, np.arange(len(acf))
    
    acf_inf, lags = acf(output, lag=23)
    print(acf_inf, lags)
    fig, ax = plt.subplots()
    ax.scatter(lags, acf_inf)
    
    [ 1.          0.91392186  0.84342292  0.76719398  0.69544891  0.6087441
      0.54377943  0.47630634  0.39543399  0.32092332  0.24205714  0.14651776
      0.05781973 -0.03298495 -0.10840121 -0.16824648 -0.21503145 -0.25968956
     -0.30646831 -0.34603944 -0.38180474 -0.41713209 -0.43279197] [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22]
    
    
    
    
    
    <matplotlib.collections.PathCollection at 0x274e44b0710>
    

    在这里插入图片描述

    结论就是,自相关图显示出明显的三角对称性, 这时具有单调趋势的非平稳序列的一种典型的自相关图形式.

    随机性检验

    跳过了

    ARMA模型

    AR模型

    AR模型的自相关系数有俩个显著的性质: 1.拖尾性;2.指数衰减

    滞后(k)阶的自相关系数的通解为:

    [ ho_k = sum_{i=1}^p c_i lambda_i^k ]

    其中(|lambda_i| < 1)为差分方程的特征根, (c_1, ldots, c_p)为常数,且不全为0

    通过这个通解形式,容易推出( ho_k)始终有非零取值,不会在(k)大于某个常数之后就恒等于零,这个性质就是拖尾性.

    而以指数衰减的性质就是利用自相关图判断平稳序列时所说的"短期相关"性质.

    AR(p)模型的偏自相关系数具有(p)阶截尾性,利用线性方程组的理论可以证明.事实上,这也是一种确定阶数的方法.另外偏自相关系数可以通过求解Yule-Walker方程获得:

    [left ( egin{array}{cccc} 1 & ho_1 & cdots & ho_{k-1} \ ho_1 & 1 & cdots & ho_{k-2} \ vdots & vdots & & vdots \ ho_{k-1} & ho_{k-2} & cdots & 1 end{array} ight ) left ( egin{array}{c} phi_{k1} \ phi_{k2} \ vdots \ phi_{kk} end{array} ight )= left ( egin{array}{c} ho_1 \ ho_2 \ vdots \ ho_k end{array} ight ) ]

    def pcf(data, lag=None):
        """计算偏自相关系数"""
        n = len(data)
        if lag == None:
            lag = int(n / 2)
        acf_inf, lags = acf(data, lag)
        M = np.zeros((lag, lag), dtype=float)
        for i in range(lag):
            for j in range(i, lag):
                index = np.abs(i - j)
                M[i, j] = acf_inf[index]
                M[j, i] = acf_inf[index]
        return np.linalg.inv(M) @ acf_inf, lags
    pcf_inf, lags = pcf(output, lag=30)
    print(pcf_inf)
    fig, ax = plt.subplots()
    ax.scatter(lags, pcf_inf)
    
    [ 1.00000000e+00  5.32907052e-15 -1.77635684e-15  8.88178420e-16
      0.00000000e+00 -1.33226763e-15 -8.88178420e-16 -1.33226763e-15
      4.44089210e-15 -2.22044605e-15 -1.33226763e-15  2.22044605e-15
     -5.55111512e-16  7.77156117e-16  2.22044605e-16  7.77156117e-16
      8.88178420e-16 -2.22044605e-16 -1.77635684e-15  1.77635684e-15
      0.00000000e+00  2.22044605e-15 -4.44089210e-16  4.44089210e-16
     -8.88178420e-16 -1.33226763e-15  8.88178420e-16  2.66453526e-15
     -8.88178420e-16 -2.22044605e-16]
    
    
    
    
    
    <matplotlib.collections.PathCollection at 0x274e4518be0>
    

    在这里插入图片描述

    是不是又哪里搞错了,和库里的又不一样了.

    MA模型

    MA(q)模型自相关系数(q)阶截尾,即(q)阶以后自相关系数为0

    MA(q)模型偏自相关系数拖尾

    ARMA模型

    ARMA(p, q)模型自相关系数不截尾,而且偏自相关系数也不截尾

    利用已有的库进行时间序列分析

    import pandas as pd
    from random import randrange
    from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
    from statsmodels.tsa.arima_model import ARIMA
    from statsmodels.api import tsa
    import warnings
    warnings.filterwarnings("ignore")
    
    data = pd.DataFrame(output, columns=["output"], 
                       index = pd.Index(tsa.datetools.dates_from_range("1964", "1999")))
    
    data
    
    output
    1964-12-31 97.0
    1965-12-31 130.0
    1966-12-31 156.5
    1967-12-31 135.2
    1968-12-31 137.7
    1969-12-31 180.5
    1970-12-31 205.2
    1971-12-31 190.0
    1972-12-31 188.6
    1973-12-31 196.7
    1974-12-31 180.3
    1975-12-31 210.8
    1976-12-31 196.0
    1977-12-31 223.0
    1978-12-31 238.2
    1979-12-31 263.5
    1980-12-31 292.6
    1981-12-31 317.0
    1982-12-31 335.4
    1983-12-31 327.0
    1984-12-31 321.9
    1985-12-31 353.5
    1986-12-31 397.8
    1987-12-31 436.8
    1988-12-31 465.7
    1989-12-31 476.7
    1990-12-31 462.6
    1991-12-31 460.8
    1992-12-31 501.8
    1993-12-31 501.5
    1994-12-31 489.5
    1995-12-31 542.3
    1996-12-31 512.2
    1997-12-31 559.8
    1998-12-31 542.0
    1999-12-31 567.0
    data.plot(); #时序图
    plot_acf(data).show(); #自相关图
    plot_pacf(data).show(); #偏自相关图
    

    在这里插入图片描述

    在这里插入图片描述
    在这里插入图片描述

    差分运算

    data_diff = data.diff(1).dropna()
    plot_acf(data_diff).show()
    plot_pacf(data_diff).show()
    

    在这里插入图片描述
    在这里插入图片描述

    就用个ARMA(1, 1, 4)吧

    result = ARIMA(data, order=(0, 1, 4)).fit(disp=False)
    fitvalues = result.fittedvalues
    fig, ax = plt.subplots()
    ax.plot(data_diff, label="known")
    ax.plot(fitvalues, label="fitted")
    plt.title('ARIMA RSS: %.4f' % sum(result.fittedvalues - data_diff['output']) ** 2)
    ax.legend()
    plt.show()
    

    在这里插入图片描述

    利用summary查看

    result.summary()
    
    ARIMA Model Results
    Dep. Variable: D.output No. Observations: 35
    Model: ARIMA(0, 1, 4) Log Likelihood -156.722
    Method: css-mle S.D. of innovations 20.534
    Date: Thu, 13 Jun 2019 AIC 325.444
    Time: 18:06:52 BIC 334.776
    Sample: 12-31-1965 HQIC 328.666
    - 12-31-1999
    coef std err z P>|z| [0.025 0.975]
    const 13.9682 0.726 19.227 0.000 12.544 15.392
    ma.L1.D.output -0.3682 0.200 -1.840 0.076 -0.761 0.024
    ma.L2.D.output -0.1066 0.182 -0.585 0.563 -0.463 0.250
    ma.L3.D.output -0.3034 0.196 -1.545 0.133 -0.688 0.081
    ma.L4.D.output -0.2218 0.176 -1.262 0.217 -0.566 0.123
    Roots
    Real Imaginary Modulus Frequency
    MA.1 1.0000 -0.0000j 1.0000 -0.0000
    MA.2 -0.1585 -1.4742j 1.4827 -0.2670
    MA.3 -0.1585 +1.4742j 1.4827 0.2670
    MA.4 -2.0510 -0.0000j 2.0510 -0.5000

    其中的ma.L1.D.output 表示模型的MA部分的第一个参数,因为我们的AR部分为0,如果存在的话也有ar.L1.D.output的

    (P > |z|)表示t检验,这里好像检验没通过,我也不知道咋怎.

    LB统计量检验

    resid = result.resid
    r, q, p = tsa.acf(resid.values.squeeze(), qstat=True)
    print(len(r), len(q), len(p))
    test_data = np.c_[range(1, len(r)), r[1:], q, p]
    table = pd.DataFrame(test_data, columns=['lag', 'AC', 'Q', 'Prob(>Q)'])
    print(table.set_index('lag'))
    
    35 34 34
                AC          Q  Prob(>Q)
    lag                                
    1.0  -0.006969   0.001850  0.965696
    2.0  -0.004150   0.002525  0.998738
    3.0   0.062144   0.158812  0.983947
    4.0   0.143394   1.017765  0.907090
    5.0   0.030833   1.058801  0.957685
    6.0  -0.012886   1.066216  0.982977
    7.0   0.082033   1.377449  0.986252
    8.0  -0.040114   1.454627  0.993436
    9.0  -0.058027   1.622335  0.996133
    10.0 -0.053791   1.772216  0.997807
    11.0 -0.124073   2.602859  0.995003
    12.0 -0.086263   3.021837  0.995388
    13.0 -0.119229   3.858617  0.992604
    14.0 -0.190790   6.103343  0.963819
    15.0 -0.116808   6.986800  0.958015
    16.0 -0.015661   7.003517  0.973193
    17.0  0.023369   7.042806  0.982988
    18.0 -0.073408   7.453300  0.985714
    19.0 -0.081616   7.992434  0.986747
    20.0  0.089290   8.680747  0.986319
    21.0 -0.209498  12.740500  0.917441
    22.0  0.180079  15.970870  0.817329
    23.0  0.096448  16.974725  0.810490
    24.0 -0.039265  17.156229  0.841923
    25.0 -0.083454  18.058138  0.839913
    26.0  0.093348  19.311970  0.822992
    27.0  0.067374  20.046753  0.828788
    28.0 -0.078219  21.178618  0.817823
    29.0  0.006681  21.188253  0.852199
    30.0  0.013390  21.234691  0.880406
    31.0  0.025666  21.447964  0.899588
    32.0 -0.009549  21.487322  0.920437
    33.0 -0.019575  21.735429  0.933336
    34.0  0.009294  21.847286  0.946828
    

    (Prob > (Q))大于0.05,所以模型是显著的

    模型预测

    pred = result.predict('1999', '2010', typ='levels')
    print(pred)
    fig, ax = plt.subplots()
    ax.plot(data, label="known")
    ax.plot(pred, label="pred")
    ax.legend()
    plt.show()
    
    1999-12-31    561.711101
    2000-12-31    581.618193
    2001-12-31    597.624198
    2002-12-31    615.014458
    2003-12-31    627.809643
    2004-12-31    641.777833
    2005-12-31    655.746023
    2006-12-31    669.714214
    2007-12-31    683.682404
    2008-12-31    697.650595
    2009-12-31    711.618785
    2010-12-31    725.586976
    Freq: A-DEC, dtype: float64
    

    在这里插入图片描述

  • 相关阅读:
    Discuz X 2.5 点点(伪静态)
    jq 、xml 省市级联动
    php memcache 初级使用(2)
    关于windows虚拟内存管理的页目录自映射
    SharePoint 2010 网络上的开发经验和资源
    SharePoint 2010 Reporting Services 报表服务器正在内置 NT AUTHORITY\SYSTEM 账户下运行 解决方法
    SharePoint 2010 Reporting Services 报表服务器无法解密用于访问报表服务器数据库中的敏感数据或加密数据的对称密钥 解决方法
    Active Directory Rights Management Services (AD RMS)无法检索证书层次结构。 解决方法
    SharePoint 2010 Reporting Services 报表服务器实例没有正确配置 解决方法
    SharePoint 2010 页面引用 Reporting Services 展现 List 报表
  • 原文地址:https://www.cnblogs.com/MTandHJ/p/11018437.html
Copyright © 2011-2022 走看看