zoukankan      html  css  js  c++  java
  • pandas对时间列分组求diff遇到的问题

    例子:

    df = pd.DataFrame()
    df['A'] = [1, 1, 2]
    df['B'] = [datetime.date(2018, 1, 2), datetime.date(2018, 1, 3), datetime.date(2018, 1, 3)]
    df['C'] = df.groupby('A').B.diff()
    df['C'] = df.C.dt.days
    

     

    报错:

    Traceback (most recent call last):
      File "D:python_virtualenvcommonlibsite-packagespandas-0.20.3-py3.6-win-amd64.eggpandascoreseries.py", line 2820, in _make_dt_accessor
        return maybe_to_datetimelike(self)
      File "D:python_virtualenvcommonlibsite-packagespandas-0.20.3-py3.6-win-amd64.eggpandascoreindexesaccessors.py", line 84, in maybe_to_datetimelike
        "datetimelike index".format(type(data)))
    TypeError: cannot convert an object of type <class 'pandas.core.series.Series'> to a datetimelike index
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "D:/学习/pandas_test/pandas_learn_20190102.py", line 49, in <module>
        test2()
      File "D:/学习/pandas_test/pandas_learn_20190102.py", line 32, in test2
        df['C'] = df.C.dt.days
      File "D:python_virtualenvcommonlibsite-packagespandas-0.20.3-py3.6-win-amd64.eggpandascoregeneric.py", line 3077, in __getattr__
        return object.__getattribute__(self, name)
      File "D:python_virtualenvcommonlibsite-packagespandas-0.20.3-py3.6-win-amd64.eggpandascorease.py", line 243, in __get__
        return self.construct_accessor(instance)
      File "D:python_virtualenvcommonlibsite-packagespandas-0.20.3-py3.6-win-amd64.eggpandascoreseries.py", line 2822, in _make_dt_accessor
        raise AttributeError("Can only use .dt accessor with datetimelike "
    AttributeError: Can only use .dt accessor with datetimelike values

    原因:
    分组求diff后的结果是:

    A B C
    0 1 2018-01-02 NaT
    1 1 2018-01-03 1 days 00:00:00
    2 2 2018-01-03 NaN

    类型是:

    A int64
    B object
    C object
    dtype: object

    预想的类型是:

    A int64
    B object
    C timedelta64[ns]
    dtype: object

    解决:
    原本尝试使用astype强制将object列,转成timedelta列

    df['C'] = df.C.astype(pd.Timedelta)

    这句代码不会报错,但是C列的类型不会改变,没有作用。

    最后有两种处理方式:
    提前定义B列为时间列:

    df = pd.DataFrame()
    df['A'] = [1, 1, 2]
    df['B'] = [datetime.date(2018, 1, 2), datetime.date(2018, 1, 3), datetime.date(2018, 1, 3)]
    df.B = pd.to_datetime(df.B)
    df['C'] = df.groupby('A').B.diff()
    df['C'] = df.C.dt.days

    增加类型转换:

    df = pd.DataFrame()
    df['A'] = [1, 1, 2]
    df['B'] = [datetime.date(2018, 1, 2), datetime.date(2018, 1, 3), datetime.date(2018, 1, 3)]
    df['C'] = df.groupby('A').B.diff()
    df['C'] = pd.to_timedelta(df.C, unit='d').dt.days
  • 相关阅读:
    《Linux shell编程中 diff与vimdif的使用》RHEL6
    《mysql数据库备份小脚本》
    《linux下sudo服务的使用》RHEL6
    《通过脚本查看哪些ip被占用》shell笔记
    linux系统环境变量.bash_profile/bashrc文件
    清空系统日志shell scripts——自学笔记
    《linux源代码包的编译安装》RHEL6
    《linux 网卡别名的添加和绑定》RHEL6
    《iptables详解 》RHEL6
    转:swagger 入门
  • 原文地址:https://www.cnblogs.com/hufulinblog/p/10207678.html
Copyright © 2011-2022 走看看