  • Splitting a pandas DataFrame into blocks of consecutive dates

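    When a time series has gaps, rows with consecutive timestamps need to be grouped into blocks. One approach based on a pandas DataFrame is shown below.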

    >>> df
      DateAnalyzed       Val
    1   2018-03-18  0.470253
    2   2018-03-19  0.470253
    3   2018-03-20  0.470253
    4   2017-01-20  0.485949  # < watch out for this
    5   2018-09-25  0.467729
    6   2018-09-26  0.467729
    7   2018-09-27  0.467729
    
    >>> df.dtypes
    DateAnalyzed    datetime64[ns]
    Val                    float64
    dtype: object
    
    
    
    >>> dt = df['DateAnalyzed']
    >>> day = pd.Timedelta('1d')
    >>> in_block = ((dt - dt.shift(-1)).abs() == day) | (dt.diff() == day)
    >>> in_block
    1     True
    2     True
    3     True
    4    False
    5     True
    6     True
    7     True
    Name: DateAnalyzed, dtype: bool
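
    The mask above combines two checks: whether the next row is one day away, and whether the previous row is exactly one day earlier. As a rough sketch, the script below rebuilds the sample frame (the constructor call is an assumption that mirrors the printed data) and names the two halves separately; `next_is_adjacent` and `prev_is_adjacent` are illustrative names, not from the original.

```python
import pandas as pd

# Sample frame rebuilt from the printed output (assumed construction).
df = pd.DataFrame(
    {
        "DateAnalyzed": pd.to_datetime(
            [
                "2018-03-18", "2018-03-19", "2018-03-20",
                "2017-01-20",  # isolated row, no neighbor one day away
                "2018-09-25", "2018-09-26", "2018-09-27",
            ]
        ),
        "Val": [0.470253, 0.470253, 0.470253, 0.485949,
                0.467729, 0.467729, 0.467729],
    },
    index=range(1, 8),
)

dt = df["DateAnalyzed"]
day = pd.Timedelta("1d")

# True when the following row is exactly one day away (last row compares to NaT -> False).
next_is_adjacent = (dt - dt.shift(-1)).abs() == day
# True when the preceding row is exactly one day earlier (first row compares to NaT -> False).
prev_is_adjacent = dt.diff() == day

# A row belongs to some block if it touches a neighbor on either side.
in_block = next_is_adjacent | prev_is_adjacent
print(in_block)
```

    A row is kept if either check passes, so the first and last rows of each run survive even though one of their comparisons is against NaT.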

    >>> filt = df.loc[in_block]
    >>> breaks = filt['DateAnalyzed'].diff() != day
    >>> groups = breaks.cumsum()
    >>> groups
    1    1
    2    1
    3    1
    5    2
    6    2
    7    2
    Name: DateAnalyzed, dtype: int64

    >>> for _, frame in filt.groupby(groups):
    ...     print(frame, end='\n\n')
    ...
      DateAnalyzed       Val
    1   2018-03-18  0.470253
    2   2018-03-19  0.470253
    3   2018-03-20  0.470253

      DateAnalyzed       Val
    5   2018-09-25  0.467729
    6   2018-09-26  0.467729
    7   2018-09-27  0.467729
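
    The steps can also be wrapped in a small reusable function. The sketch below is a variant, not the exact method above: it labels every row with a run id first and then drops runs of length one, which yields the same two blocks on the sample data. `split_consecutive` and its parameters are hypothetical names, and the frame construction is assumed from the printed sample.

```python
import pandas as pd


def split_consecutive(df, col, step=pd.Timedelta("1d"), drop_isolated=True):
    """Split df into sub-frames where `col` advances by exactly `step` per row.

    Hypothetical helper, not part of pandas. Row order is taken as given;
    sort beforehand if the data is not already in the intended order.
    """
    # A new run starts wherever the gap to the previous row is not `step`.
    # The first row is compared against NaT, so it always starts a run.
    starts_run = df[col].diff() != step
    run_id = starts_run.cumsum()
    blocks = [frame for _, frame in df.groupby(run_id)]
    if drop_isolated:
        # Discard rows that have no consecutive neighbor at all.
        blocks = [block for block in blocks if len(block) > 1]
    return blocks


# Sample frame rebuilt from the printed output (assumed construction).
df = pd.DataFrame(
    {
        "DateAnalyzed": pd.to_datetime(
            [
                "2018-03-18", "2018-03-19", "2018-03-20",
                "2017-01-20",
                "2018-09-25", "2018-09-26", "2018-09-27",
            ]
        ),
        "Val": [0.470253, 0.470253, 0.470253, 0.485949,
                0.467729, 0.467729, 0.467729],
    },
    index=range(1, 8),
)

blocks = split_consecutive(df, "DateAnalyzed")
for block in blocks:
    print(block, end="\n\n")
```

    Passing `drop_isolated=False` keeps the isolated 2017-01-20 row as a block of its own, which may be preferable when no rows should be lost.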

      

  • Original article: https://www.cnblogs.com/zcsh/p/14790823.html