  • Time Series Study Notes 3

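    4. Time zone handling

    Time zones are awkward to work with, so the usual approach is to do everything in UTC.
    UTC (Coordinated Universal Time) is the successor to Greenwich Mean Time and is now the international standard.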

    In [1]: import pytz

    In [2]: import numpy as np

    In [3]: import pandas as pd; from pandas import Series
    
    In [4]: pytz.common_timezones[-5:]
    Out[4]: ['US/Eastern', 'US/Hawaii', 'US/Mountain', 'US/Pacific', 'UTC']
    
    In [5]: tz = pytz.timezone('Asia/Shanghai')
    
    In [6]: tz
    Out[6]: <DstTzInfo 'Asia/Shanghai' LMT+8:06:00 STD>
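
    The "work in UTC" advice can be sketched with the standard library alone. This is only an illustration, not part of the original notes: the name CST and the fixed UTC+8 offset are assumptions standing in for 'Asia/Shanghai', and opened_utc is a made-up sample instant.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+8 offset stands in for 'Asia/Shanghai'
# (reasonable here because China does not currently observe DST).
CST = timezone(timedelta(hours=8), "CST")

# Store and compare instants in UTC ...
opened_utc = datetime(2017, 2, 19, 1, 30, tzinfo=timezone.utc)

# ... and convert to a local zone only when displaying them.
opened_local = opened_utc.astimezone(CST)

print(opened_utc.isoformat())      # 2017-02-19T01:30:00+00:00
print(opened_local.isoformat())    # 2017-02-19T09:30:00+08:00
print(opened_utc == opened_local)  # True: aware datetimes compare by instant
```

    Both objects describe the same moment; only the wall-clock rendering differs.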
    

    4.1 Localization and conversion

    By default, pandas time series are time-zone naive.

    In [11]: rng = pd.date_range('2/19/2017 9:30', periods=4, freq='D')
    
    In [12]: ts = Series(np.random.randn(4),index=rng)
    
    In [13]: ts.index.tz  # returns None, so nothing is displayed
    
    In [14]: ts
    Out[14]:
    2017-02-19 09:30:00    0.530722
    2017-02-20 09:30:00    1.459262
    2017-02-21 09:30:00   -0.038216
    2017-02-22 09:30:00   -0.671159
    Freq: D, dtype: float64
    
    
    
    # A time zone can also be set at creation time with the tz= argument
    In [15]: pd.date_range('2/19/2017 9:30', periods=4, freq='D', tz='UTC')
    Out[15]:
    DatetimeIndex(['2017-02-19 09:30:00+00:00', '2017-02-20 09:30:00+00:00',
                   '2017-02-21 09:30:00+00:00', '2017-02-22 09:30:00+00:00'],
                  dtype='datetime64[ns, UTC]', freq='D')
    
    # Use tz_localize to go from naive to time-zone-aware
    In [16]: tz_utc = ts.tz_localize('UTC')
    
    In [17]: tz_utc
    Out[17]:
    2017-02-19 09:30:00+00:00    0.530722
    2017-02-20 09:30:00+00:00    1.459262
    2017-02-21 09:30:00+00:00   -0.038216
    2017-02-22 09:30:00+00:00   -0.671159
    Freq: D, dtype: float64
    
    In [18]: tz_utc.index.tz
    Out[18]: <UTC>
    
    # Use tz_convert to convert to another time zone
    In [20]: tz_utc.tz_convert('Asia/Shanghai')
    Out[20]:
    2017-02-19 17:30:00+08:00    0.530722
    2017-02-20 17:30:00+08:00    1.459262
    2017-02-21 17:30:00+08:00   -0.038216
    2017-02-22 17:30:00+08:00   -0.671159
    Freq: D, dtype: float64
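
    The difference between the two methods can be checked directly. The following is a minimal sketch with made-up integer data: tz_localize attaches a zone and keeps the wall-clock time, while tz_convert keeps the instant and changes the wall-clock time.

```python
import pandas as pd

idx = pd.date_range("2017-02-19 09:30", periods=4, freq="D")
ts = pd.Series(range(4), index=idx)

# tz_localize: interpret the naive wall-clock times as UTC.
utc = ts.tz_localize("UTC")

# tz_convert: same instants, shown on a Shanghai wall clock.
shanghai = utc.tz_convert("Asia/Shanghai")
same_instants = bool((utc.index == shanghai.index).all())
wall_hours = list(shanghai.index.hour)

# Localizing the naive series straight to Shanghai keeps 09:30 on the wall
# clock, which is a different instant from 09:30 UTC.
local = ts.tz_localize("Asia/Shanghai")
gap = utc.index[0] - local.index[0]

# Calling tz_localize on an already aware series is an error.
try:
    utc.tz_localize("Asia/Shanghai")
    relocalize_error = None
except TypeError as exc:
    relocalize_error = type(exc).__name__

print(same_instants, wall_hours, gap, relocalize_error)
```

    In short: use tz_localize once to declare which zone naive data was recorded in, and tz_convert afterwards for every change of zone.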
    
    
    
    

    4.2 Timestamp objects

    # Create a Timestamp object
    In [25]: stamp = pd.Timestamp('2017-2-19 12:10')
    
    # naive to utc
    In [26]: stamp_utc = stamp.tz_localize('UTC')
    
    # Convert to another time zone
    In [29]: stamp_cn = stamp_utc.tz_convert('Asia/Shanghai')
    
    
    
    # value holds the number of nanoseconds since the Unix epoch (1970-01-01 UTC)
    In [30]: stamp_utc.value
    Out[30]: 1487506200000000000
    
    In [31]: stamp_cn.value
    Out[31]: 1487506200000000000
    
    In [32]: stamp.value  # all three are identical (the naive stamp was localized as UTC)
    Out[32]: 1487506200000000000
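
    The three values match only because the naive stamp was localized as UTC. As a contrast, here is a small sketch where the same naive stamp is localized directly to Asia/Shanghai; the variable names stamp_local_cn and shift_hours are introduced here for illustration.

```python
import pandas as pd

stamp = pd.Timestamp("2017-02-19 12:10")
stamp_utc = stamp.tz_localize("UTC")
stamp_cn = stamp_utc.tz_convert("Asia/Shanghai")

# Converting never changes the underlying instant.
same_value = stamp.value == stamp_utc.value == stamp_cn.value

# Localizing the naive stamp to Shanghai instead means "12:10 Shanghai time",
# which is 04:10 UTC, i.e. 8 hours earlier than stamp_utc.
stamp_local_cn = stamp.tz_localize("Asia/Shanghai")
ns_per_hour = 3_600_000_000_000
shift_hours = (stamp_utc.value - stamp_local_cn.value) // ns_per_hour

print(same_value, shift_hours)  # True 8
```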
    
    
    
    

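    4.3 Operations between different time zones

    Arithmetic between series in different time zones always produces a result in UTC, because the timestamps are stored internally in UTC.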

    In [33]: ts
    Out[33]:
    2017-02-19 09:30:00    0.530722
    2017-02-20 09:30:00    1.459262
    2017-02-21 09:30:00   -0.038216
    2017-02-22 09:30:00   -0.671159
    Freq: D, dtype: float64
    
    In [34]: ts.index
    Out[34]:
    DatetimeIndex(['2017-02-19 09:30:00', '2017-02-20 09:30:00',
                   '2017-02-21 09:30:00', '2017-02-22 09:30:00'],
                  dtype='datetime64[ns]', freq='D')
    
    In [35]: ts1 = ts[:2].tz_localize('Europe/London')  
    
    In [36]: ts2 = ts1.tz_convert('Europe/Moscow')
    
    In [37]: result = ts1 + ts2  # ts1 and ts2 are in different time zones
    
    In [38]: result.index  # the result index has been converted to UTC
    Out[38]: DatetimeIndex(['2017-02-19 09:30:00+00:00', '2017-02-20 09:30:00+00:00'], dtype='datetime64[ns, UTC]', freq='D')
    
    In [39]: result
    Out[39]:
    2017-02-19 09:30:00+00:00    1.061445
    2017-02-20 09:30:00+00:00    2.918524
    Freq: D, dtype: float64
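
    The same behaviour can be reproduced with fixed values, so the result is easy to verify by hand. This is a sketch with made-up data (1.0 and 2.0), not the random series used above.

```python
import pandas as pd

idx = pd.date_range("2017-02-19 09:30", periods=2, freq="D")
ts = pd.Series([1.0, 2.0], index=idx)

ts1 = ts.tz_localize("Europe/London")
ts2 = ts1.tz_convert("Europe/Moscow")   # same instants, different wall clock

# The two indexes are aligned on their instants, so every row finds its
# partner and the sum has no missing values.
result = ts1 + ts2

print(result.index.tz)   # UTC
print(result.tolist())   # [2.0, 4.0]
```

    Because alignment happens on instants rather than on wall-clock labels, each value is added to itself even though the two series display different times.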
    
    

    5. Periods and period arithmetic

    A Period represents a span of time, such as a number of days or months.

    In [4]: p = pd.Period(2017)
    
    In [5]: p
    Out[5]: Period('2017', 'A-DEC')
    
    In [6]: p + 1
    Out[6]: Period('2018', 'A-DEC')
    
    In [7]: pd.Period(2018) - p
    Out[7]: 1
    
    In [8]: rng = pd.period_range('1/1/2001','6/30/2001', freq='M')
    
    In [9]: rng
    Out[9]: PeriodIndex(['2001-01', '2001-02', '2001-03', '2001-04', '2001-05', '2001-06'], dtype='int64', freq='M')
    
    In [10]: Series(np.random.randn(6), index=rng)
    Out[10]:
    2001-01    1.146489
    2001-02    2.112800
    2001-03    0.292746
    2001-04   -0.841383
    2001-05   -0.845565
    2001-06    1.207504
    Freq: M, dtype: float64
    
    
    # From a list of strings
    In [11]: values = ['2001Q3','2002Q2','2003Q1']
    
    In [13]: index = pd.PeriodIndex(values, freq='Q-DEC')  # quarters of a year that ends in December
    
    In [14]: index
    Out[14]: PeriodIndex(['2001Q3', '2002Q2', '2003Q1'], dtype='int64', freq='Q-DEC')
    
    In [26]: index.asfreq('Q-JUN')  # re-express as quarters of a fiscal year that ends in June
    Out[26]: PeriodIndex(['2002Q1', '2002Q4', '2003Q3'], dtype='int64', freq='Q-JUN')
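
    To see why '2001Q3' under Q-DEC becomes '2002Q1' under Q-JUN, it helps to look at the calendar span a period covers. A minimal sketch using one of the values above:

```python
import pandas as pd

p = pd.Period("2001Q3", freq="Q-DEC")

# A Period is a span: it has a first and a last instant.
print(p.start_time)        # 2001-07-01 00:00:00
print(p.end_time.date())   # 2001-09-30

# Adding an integer shifts by whole periods of the same frequency.
print(p + 1)               # 2001Q4

# With a fiscal year ending in June, fiscal 2002 runs from July 2001 to
# June 2002, so July-September 2001 is its first quarter.
fiscal = p.asfreq("Q-JUN")
print(fiscal)              # 2002Q1
```

    The label changes, but the span of calendar time it describes does not.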
    

    5.1 Period frequency conversion

    In [15]: p
    Out[15]: Period('2017', 'A-DEC')  # an annual period; the year ends on December 31
    
    In [16]: p.asfreq('M', how='start')  # first month of the year
    Out[16]: Period('2017-01', 'M')
    
    In [17]: p.asfreq('M', how='end')
    Out[17]: Period('2017-12', 'M')
    
    In [18]: p = pd.Period('2017', freq='A-JUN')  # year 2017, with the fiscal year ending at the end of June
    
    In [19]: p.asfreq('M',how='end')
    Out[19]: Period('2017-06', 'M')
    
    In [20]: rng = pd.period_range('2006', '2009', freq='A-DEC')  # one period per year, 2006 through 2009
    
    In [21]: ts = Series(np.random.randn(len(rng)), index=rng)
    
    In [22]: ts
    Out[22]:
    2006   -0.627032
    2007   -1.409714
    2008    0.072737
    2009    1.240899
    Freq: A-DEC, dtype: float64
    
    In [23]: ts.asfreq('M', how='start')  # convert to monthly, keeping the first month of each year
    Out[23]:
    2006-01   -0.627032
    2007-01   -1.409714
    2008-01    0.072737
    2009-01    1.240899
    Freq: M, dtype: float64
    
    In [24]: ts.asfreq('B', how='end')  # convert to business days, keeping the last business day of each year
    Out[24]:
    2006-12-29   -0.627032
    2007-12-31   -1.409714
    2008-12-31    0.072737
    2009-12-31    1.240899
    Freq: B, dtype: float64
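
    The how argument only matters when going from a lower to a higher frequency. In the other direction the result is simply the larger period that contains the input. A small sketch using quarterly and monthly periods (chosen here instead of the annual ones above; the variable names are illustrative):

```python
import pandas as pd

q = pd.Period("2017Q2", freq="Q-DEC")     # April through June 2017

# Low -> high frequency: `how` picks which sub-period to keep.
first_month = q.asfreq("M", how="start")  # 2017-04
last_month = q.asfreq("M", how="end")     # 2017-06
last_day = q.asfreq("D", how="end")       # 2017-06-30

# High -> low frequency: the containing period is returned.
m = pd.Period("2017-08", freq="M")
calendar_quarter = m.asfreq("Q-DEC")      # 2017Q3
fiscal_quarter = m.asfreq("Q-JUN")        # 2018Q1 (fiscal 2018 starts in July 2017)

print(first_month, last_month, last_day, calendar_quarter, fiscal_quarter)
```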
    

    5.2 Quarterly period frequencies

    In [28]: rng = pd.period_range('2011Q3','2012Q4',freq='Q-JAN')
    
    In [29]: rs = Series(np.arange(len(rng)), index=rng)
    
    In [30]: new_rng = (rng.asfreq('B','e') - 1).asfreq('T','s') + 16*60  # 4 PM on the second-to-last business day of each quarter
    
    In [35]: rs.index = new_rng.to_timestamp()
    
    In [36]: rs
    Out[36]:
    2010-10-28 16:00:00    0
    2011-01-28 16:00:00    1
    2011-04-28 16:00:00    2
    2011-07-28 16:00:00    3
    2011-10-28 16:00:00    4
    2012-01-30 16:00:00    5
    dtype: int64
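
    The one-line expression for new_rng packs several steps together: take the last business day of each quarter, step back one business day, then move to 16:00. The sketch below reaches the same timestamps by a different route, using Timestamp and BDay offset arithmetic instead of business-day and minute period frequencies. It is an alternative illustration, not the method used above, and the names stamps and index are introduced here.

```python
import pandas as pd
from pandas.tseries.offsets import BDay

rng = pd.period_range("2011Q3", "2012Q4", freq="Q-JAN")

stamps = []
for p in rng:
    quarter_end = p.end_time.normalize()      # midnight on the last calendar day
    last_bday = BDay().rollback(quarter_end)  # last business day of the quarter
    # One business day earlier, at 16:00.
    stamps.append(last_bday - BDay(1) + pd.Timedelta(hours=16))

index = pd.DatetimeIndex(stamps)
rs = pd.Series(range(len(rng)), index=index)
print(rs)
```

    With Q-JAN the fiscal year ends in January, so '2011Q3' covers August through October 2010, which is why the first timestamp falls in October 2010.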
    
    

    5.3 Converting between timestamps and periods

    In [38]: rng = pd.date_range('1/1/2001', periods=3, freq='M')
    
    In [40]: ts = Series(np.random.randn(3), index=rng)
    
    In [41]: pts = ts.to_period()  # convert the timestamps to periods
    
    In [42]: ts
    Out[42]:
    2001-01-31    0.619856
    2001-02-28   -2.117066
    2001-03-31    1.152329
    Freq: M, dtype: float64
    
    In [43]: pts
    Out[43]:
    2001-01    0.619856
    2001-02   -2.117066
    2001-03    1.152329
    Freq: M, dtype: float64
    
    
    In [45]: pts.to_timestamp(how='end')  # convert back to timestamps
    Out[45]:
    2001-01-31    0.619856
    2001-02-28   -2.117066
    2001-03-31    1.152329
    Freq: M, dtype: float64
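
    Converting to periods discards the position inside the period, so several timestamps can map to the same period. A minimal sketch with made-up daily data that straddles a month boundary:

```python
import pandas as pd

idx = pd.date_range("2001-01-29", periods=6, freq="D")  # Jan 29 .. Feb 3
ts = pd.Series(range(6), index=idx)

# Each day falls into its month, so the period index contains duplicates.
pts = ts.to_period("M")
labels = [str(p) for p in pts.index]

# Duplicate periods are convenient for grouping.
monthly_total = pts.groupby(level=0).sum()

# Going back to timestamps cannot recover the original days.
starts = pts.to_timestamp(how="start")

print(labels)
print(monthly_total.tolist())   # [3, 12]
print(starts.index[0].date())   # 2001-01-01
```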
    

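    5.4 Creating a PeriodIndex from arrays

    In [47]: q = Series(list(range(1, 5)) * 7)  # quarters 1-4, repeated 7 times (list() is needed on Python 3)
    
    In [48]: y = Series(np.arange(1988,2016))  # the years 1988 through 2015
    
    In [49]: index = pd.PeriodIndex(year=y,quarter=q, freq='Q-DEC')  # build the index from the year and quarter arrays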
    
    In [50]: data = Series(np.random.randn(28), index=index)
    
    In [51]: data
    Out[51]:
    1988Q1   -0.127187
    1989Q2   -1.757196
    1990Q3    0.826757
    ...
    2013Q2    0.540955
    2014Q3    0.531101
    2015Q4    0.751739
    Freq: Q-DEC, dtype: float64
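
    Recent pandas releases deprecate passing year= and quarter= to the PeriodIndex constructor. One version-independent alternative, sketched here as an assumption rather than as the approach of these notes, is to build the quarter labels as strings first; the names years, quarters and labels are illustrative.

```python
import numpy as np
import pandas as pd

years = np.arange(1988, 2016)       # 28 years
quarters = list(range(1, 5)) * 7    # 1, 2, 3, 4, 1, 2, ...

# Build labels such as '1988Q1' and let PeriodIndex parse them.
labels = [f"{y}Q{q}" for y, q in zip(years, quarters)]
index = pd.PeriodIndex(labels, freq="Q-DEC")

data = pd.Series(np.arange(28), index=index)
print(index[0], index[-1], len(index))   # 1988Q1 2015Q4 28
```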
    

    To be continued...

  • Original article: https://www.cnblogs.com/felo/p/6421795.html