  • A First Look at pandas (5)

    1. How do you split one attribute (column) into several columns, i.e. pivot the table so that each class becomes a vector whose correlation with the others can be computed?

    Example:

         class_    timestamp    count
    0    1    2019-01-20 13:23:00    1
    1    2    2019-01-20 13:24:00    2
    2    3    2019-01-20 13:25:00    2
    3    4    2019-01-20 13:26:00    1
    4    10    2019-01-20 13:27:00    2

    becomes:

    class_                 1    2    3    4   10
    timestamp
    2019-01-20 13:23:00  1.0  NaN  NaN  NaN  NaN
    2019-01-20 13:24:00  NaN  2.0  NaN  NaN  NaN
    2019-01-20 13:25:00  NaN  NaN  2.0  NaN  NaN
    2019-01-20 13:26:00  NaN  NaN  NaN  1.0  NaN
    2019-01-20 13:27:00  NaN  NaN  NaN  NaN  2.0

    Solution:

    import pandas as pd
    from pandas import Timestamp

    # Long-format sample data: one row per (class_, timestamp) observation.
    info = {'class_': {0: 1, 1: 2, 2: 3, 3: 4, 4: 10},
     'timestamp': {0: Timestamp('2019-01-20 13:23:00'),
      1: Timestamp('2019-01-20 13:24:00'),
      2: Timestamp('2019-01-20 13:25:00'),
      3: Timestamp('2019-01-20 13:26:00'),
      4: Timestamp('2019-01-20 13:27:00')},
     'count': {0: 1, 1: 2, 2: 2, 3: 1, 4: 2}}
    df = pd.DataFrame(info)
    # Each distinct class_ value becomes a column; cells with no observation are NaN.
    # Append .fillna(0) to replace the missing cells with 0.
    df.pivot(index='timestamp', columns="class_", values="count")
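The question also mentions computing correlations between the resulting per-class vectors. A minimal sketch of that step follows, assuming hypothetical sample data (not from the original post) in which two classes are observed at the same timestamps, and assuming that filling missing cells with 0 is acceptable for the analysis:

```python
import pandas as pd

# Hypothetical data: classes 1 and 2 are both observed at three timestamps.
df = pd.DataFrame({
    "class_": [1, 2, 1, 2, 1, 2],
    "timestamp": pd.to_datetime([
        "2019-01-20 13:23:00", "2019-01-20 13:23:00",
        "2019-01-20 13:24:00", "2019-01-20 13:24:00",
        "2019-01-20 13:25:00", "2019-01-20 13:25:00",
    ]),
    "count": [1, 2, 2, 4, 3, 6],
})

# One column per class, one row per timestamp; missing cells become 0.
wide = df.pivot(index="timestamp", columns="class_", values="count").fillna(0)

# Pairwise Pearson correlation between the class columns.
corr = wide.corr()
print(corr)
```

Note that `pivot` raises an error when the same (timestamp, class_) pair occurs more than once; in that case `pivot_table` with an aggregation function is the usual alternative.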

    2. How do you merge the multiple columns of one attribute (each row holds a value in only one of them) back into a single column?

    Example:

    class_                 1    2    3    4   10
    timestamp
    2019-01-20 13:23:00  1.0  NaN  NaN  NaN  NaN
    2019-01-20 13:24:00  NaN  2.0  NaN  NaN  NaN
    2019-01-20 13:25:00  NaN  NaN  2.0  NaN  NaN
    2019-01-20 13:26:00  NaN  NaN  NaN  1.0  NaN
    2019-01-20 13:27:00  NaN  NaN  NaN  NaN  2.0

    becomes:

         class_    timestamp    count
    0    1    2019-01-20 13:23:00    1
    1    2    2019-01-20 13:24:00    2
    2    3    2019-01-20 13:25:00    2
    3    4    2019-01-20 13:26:00    1
    4    10    2019-01-20 13:27:00    2

    Solution:

    import pandas as pd
    from pandas import Timestamp

    info = {'class_': {0: 1, 1: 2, 2: 3, 3: 4, 4: 10},
     'timestamp': {0: Timestamp('2019-01-20 13:23:00'),
      1: Timestamp('2019-01-20 13:24:00'),
      2: Timestamp('2019-01-20 13:25:00'),
      3: Timestamp('2019-01-20 13:26:00'),
      4: Timestamp('2019-01-20 13:27:00')},
     'count': {0: 1, 1: 2, 2: 2, 3: 1, 4: 2}}
    df = pd.DataFrame(info)
    # Build the wide (pivoted) table that serves as the input here.
    df1 = df.pivot(index='timestamp', columns="class_", values="count")
    # stack() moves the class_ columns into the index; dropna() discards the empty
    # cells (older pandas versions already drop them inside stack()).
    df1 = df1.stack().dropna().reset_index()
    df1.columns = ["timestamp", "class_", "count"]
    # Restore the original column order and integer counts.
    df1 = df1[["class_", "timestamp", "count"]].astype({"count": int})
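The same wide-to-long conversion can also be written with `melt`, which some readers may find more explicit than `stack`. This is only a sketch of an alternative, not the original author's method; the small `wide` table and the name `long_df` are made up for illustration:

```python
import pandas as pd

nan = float("nan")

# Hypothetical wide table: one column per class, one row per timestamp.
wide = pd.DataFrame(
    {1: [1.0, nan, nan], 2: [nan, 2.0, nan], 3: [nan, nan, 2.0]},
    index=pd.to_datetime([
        "2019-01-20 13:23:00",
        "2019-01-20 13:24:00",
        "2019-01-20 13:25:00",
    ]),
)
wide.index.name = "timestamp"
wide.columns.name = "class_"

long_df = (
    wide.reset_index()                       # turn the timestamp index into a column
        .melt(id_vars="timestamp",           # keep timestamp as the identifier
              var_name="class_",             # former column labels go here
              value_name="count")            # former cell values go here
        .dropna(subset=["count"])            # discard cells that had no observation
        .sort_values("timestamp")
        .reset_index(drop=True)
)
long_df["count"] = long_df["count"].astype(int)
print(long_df)
```

Here `melt` names the output columns directly, so no separate column-renaming step is needed.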
  • Original article: https://www.cnblogs.com/spaceapp/p/11674966.html