  • Python pandas: common DataFrame operations

    1. Conditional query:

    # Rows where (a == 1 and b == "x") or c / d < 3
    result = df.query('(a == 1 and b == "x") or c / d < 3')
    print(result)
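    A minimal sketch for checking the query, assuming hypothetical columns a, b, c and d with made-up values. Inside the query string you can write and/or. The equivalent boolean-mask form needs & and | instead, with parentheses around each comparison.

```python
import pandas as pd

# Hypothetical sample data using the column names from the query above
df = pd.DataFrame({
    "a": [1, 1, 2, 3],
    "b": ["x", "y", "x", "z"],
    "c": [10, 2, 9, 1],
    "d": [2, 1, 3, 1],
})

# query() evaluates the string expression; and/or are allowed inside it
by_query = df.query('(a == 1 and b == "x") or c / d < 3')

# Same filter as a boolean mask: & and | replace and/or, each comparison parenthesized
mask = ((df["a"] == 1) & (df["b"] == "x")) | (df["c"] / df["d"] < 3)
by_mask = df[mask]

print(by_query)
```

    Row 2 is excluded because c / d is exactly 3, which fails the strict < 3 test.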

    2. Iteration

    a) Iterate by index label

    for idx in df.index:
      dd = df.loc[idx]  # row with label idx, as a Series
      print(dd)
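    The same label-ordered traversal can also be written with DataFrame.iterrows(), which yields (label, row) pairs. A sketch assuming a hypothetical two-row frame. Each row comes back as a Series, so values from columns of different types may be upcast to a common dtype.

```python
import pandas as pd

# Hypothetical two-row frame with non-default index labels
df = pd.DataFrame(
    {"code": ["HK.02018", "HK.00151"], "last_price": [53.70, 6.17]},
    index=[42, 15],
)

# Label-based loop, as in the text
codes_by_label = [df.loc[idx, "code"] for idx in df.index]

# iterrows() yields (label, row-as-Series) pairs in index order
codes_by_iterrows = []
for idx, row in df.iterrows():
    codes_by_iterrows.append(row["code"])
    print(idx, row["code"], row["last_price"])
```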

    b) Iterate by row position

    for i in range(len(df)):
      dd = df.iloc[i]  # i-th row by position, regardless of its label
      print(dd)
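    DataFrame.itertuples() offers another way to walk the rows in positional order. It yields namedtuples, and the row label is kept in the Index field. A sketch assuming a hypothetical three-row frame whose column names are valid Python identifiers, which is what attribute access on the namedtuple requires.

```python
import pandas as pd

df = pd.DataFrame(
    {"code": ["HK.02018", "HK.00151", "HK.00101"], "last_price": [53.70, 6.17, 18.22]},
    index=[42, 15, 14],
)

# Positional loop, as in the text: iloc[i] is the i-th row whatever its label
codes_by_position = [df.iloc[i]["code"] for i in range(len(df))]

# itertuples() yields namedtuples; the Index field holds the row label
codes_by_tuples = []
for row in df.itertuples():
    codes_by_tuples.append(row.code)
    print(row.Index, row.code, row.last_price)
```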

    3. Compute the mean of a column

    # Mean of the "volume" column
    result = df["volume"].mean()
    print(result)
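    By default Series.mean() skips missing values. A sketch with a hypothetical volume column containing one NaN shows the difference skipna makes, and shows that DataFrame.mean() returns one mean per column.

```python
import numpy as np
import pandas as pd

# Hypothetical volume column with one missing value
df = pd.DataFrame({"volume": [100.0, 200.0, np.nan, 300.0]})

mean_skip = df["volume"].mean()                 # NaN is skipped: (100 + 200 + 300) / 3
mean_strict = df["volume"].mean(skipna=False)   # any NaN makes the result NaN
col_means = df.mean(numeric_only=True)          # one mean per numeric column, as a Series

print(mean_skip, mean_strict)
```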

    4. Sort by a specified column

    result_df = df.sort_values(by="sales", ascending=False)
    print(result_df)

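    Note: this sort is not in place. sort_values returns a new, sorted DataFrame and leaves df unchanged. Pass inplace=True to sort df itself.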
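    A sketch with hypothetical sales data. It confirms that the plain call leaves df in its original order and that inplace=True reorders df and returns None. It also sorts by several columns, each with its own direction.

```python
import pandas as pd

# Hypothetical sales data
df = pd.DataFrame({"region": ["N", "S", "N", "N"], "sales": [5, 9, 7, 9]})

result_df = df.sort_values(by="sales", ascending=False)
unchanged_order = df.index.tolist()   # df itself has not been reordered

# Several keys: sales descending, then region ascending to break ties
multi = df.sort_values(by=["sales", "region"], ascending=[False, True])

# In-place variant: reorders df and returns None
ret = df.sort_values(by="sales", ascending=False, inplace=True)
```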

    5. Select specific rows/columns

    Given the following data:

            code          update_time  last_price  open_price     ...      option_gamma  option_vega  option_theta  option_rho
    42  HK.02018  2019-04-26 16:08:05       53.70       52.70     ...               NaN          NaN           NaN         NaN
    15  HK.00151  2019-04-26 16:08:33        6.17        6.21     ...               NaN          NaN           NaN         NaN
    14  HK.00101  2019-04-26 16:08:05       18.22       18.26     ...               NaN          NaN           NaN         NaN
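    To try the selections that follow, you can rebuild the visible part of this printout. This sketch keeps only the four leading columns. The columns hidden behind "..." are not reproduced, and update_time stays a plain string instead of a parsed timestamp. Because the index labels (42, 15, 14) differ from the positions (0, 1, 2), loc and iloc give different results.

```python
import pandas as pd

# Only the four leading columns from the printout; the columns behind "..." are omitted
df = pd.DataFrame(
    {
        "code": ["HK.02018", "HK.00151", "HK.00101"],
        "update_time": ["2019-04-26 16:08:05", "2019-04-26 16:08:33", "2019-04-26 16:08:05"],
        "last_price": [53.70, 6.17, 18.22],
        "open_price": [52.70, 6.21, 18.26],
    },
    index=[42, 15, 14],  # labels differ from positions, so loc and iloc diverge
)
print(df)
```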

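    a) Select by index label (loc)

    Select the row labelled 42, with all columns:

    result = df.loc[[42], :]  # a list of labels returns a DataFrame; df.loc[42, :] would return a Series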
    print(result)

    result:

            code          update_time  last_price  open_price     ...      option_gamma  option_vega  option_theta  option_rho
    42  HK.02018  2019-04-26 16:08:05       53.70       52.70     ...               NaN          NaN           NaN         NaN
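    With loc, the shape of the result depends on what you pass. A scalar label gives a Series, a one-element list gives a one-row DataFrame, and a row label plus a column label gives a single value. A sketch on a hypothetical three-row frame:

```python
import pandas as pd

df = pd.DataFrame(
    {"code": ["HK.02018", "HK.00151", "HK.00101"], "last_price": [53.70, 6.17, 18.22]},
    index=[42, 15, 14],
)

row_series = df.loc[42, :]    # scalar label: a Series indexed by column names
row_frame = df.loc[[42], :]   # list with one label: a one-row DataFrame
cell = df.loc[42, "code"]     # row label plus column label: a single value

print(type(row_series).__name__, type(row_frame).__name__, cell)
```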

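    Select the rows labelled 15 and 42, keeping only the code and update_time columns:

    result = df.loc[[15, 42], ["code", "update_time"]]  # loc takes column labels, not positions
    print(result)

    result:

            code          update_time
    15  HK.00151  2019-04-26 16:08:33
    42  HK.02018  2019-04-26 16:08:05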
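    loc is label-based on both axes, so integer column positions such as [0, 2] are treated as column labels and raise KeyError when no such labels exist. A sketch on a hypothetical frame. It shows the label form, shows how to turn positions into labels through df.columns, and shows that rows come back in the order requested.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "code": ["HK.02018", "HK.00151", "HK.00101"],
        "update_time": ["2019-04-26 16:08:05", "2019-04-26 16:08:33", "2019-04-26 16:08:05"],
        "last_price": [53.70, 6.17, 18.22],
    },
    index=[42, 15, 14],
)

# loc takes labels on both axes; rows come back in the requested order
by_labels = df.loc[[15, 42], ["code", "update_time"]]

# Column positions must be translated to labels first when using loc
by_positions = df.loc[[15, 42], df.columns[[0, 1]]]

# Integer column "labels" that do not exist raise KeyError
try:
    df.loc[[15, 42], [0, 2]]
    raised = False
except KeyError:
    raised = True
```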

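    b) Select by row position (iloc)

    Select the 2nd row (position 1), with all columns:

    result = df.iloc[[1], :]  # a list of positions returns a DataFrame; df.iloc[1, :] would return a Series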
    print(result)

    result:

            code          update_time  last_price  open_price     ...      option_gamma  option_vega  option_theta  option_rho
    15  HK.00151  2019-04-26 16:08:33        6.17        6.21     ...               NaN          NaN           NaN         NaN
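    iloc counts positions and ignores labels. Position 1 is the row labelled 15, and negative positions count from the end. The same integer passed to loc is looked up as a label. A sketch on a hypothetical frame with labels 42, 15 and 14:

```python
import pandas as pd

df = pd.DataFrame(
    {"code": ["HK.02018", "HK.00151", "HK.00101"], "last_price": [53.70, 6.17, 18.22]},
    index=[42, 15, 14],
)

second = df.iloc[[1], :]   # position 1: the row labelled 15
last = df.iloc[-1]         # negative positions count from the end

# loc looks for a row whose label is 1, and there is none here
try:
    df.loc[1]
    found = True
except KeyError:
    found = False
```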

    Select the first 2 rows (positions 0 and 1), with all columns:

    result = df.iloc[0:2, :]
    print(result)

    result:

            code          update_time  last_price  open_price     ...      option_gamma  option_vega  option_theta  option_rho
    42  HK.02018  2019-04-26 16:08:05       53.70       52.70     ...               NaN          NaN           NaN         NaN
    15  HK.00151  2019-04-26 16:08:33        6.17        6.21     ...               NaN          NaN           NaN         NaN
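    Slice ends behave differently in the two indexers. An iloc slice excludes its end, like a Python list slice. A loc slice includes both end labels and follows the index order. A sketch on a hypothetical frame, where both end labels exist in the unique index, so the label slice also works on this unsorted index. head(n) is a shorthand for the first n rows.

```python
import pandas as pd

df = pd.DataFrame(
    {"code": ["HK.02018", "HK.00151", "HK.00101"], "last_price": [53.70, 6.17, 18.22]},
    index=[42, 15, 14],
)

first_two = df.iloc[0:2, :]      # end-exclusive, like Python slicing: positions 0 and 1
label_slice = df.loc[42:15, :]   # end-inclusive, from label 42 through label 15 in index order
head_two = df.head(2)            # shorthand for the first n rows
```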

    Select the 1st and 3rd rows (positions 0 and 2), keeping only the code and update_time columns (positions 0 and 1):

    result = df.iloc[[0,2], 0:2]
    print(result)

    result:

            code          update_time 
    42  HK.02018  2019-04-26 16:08:05
    14  HK.00101  2019-04-26 16:08:05
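    When rows are known by position but columns are known by name, one axis has to be converted. Index.get_indexer maps column names to positions for iloc, and df.index[...] maps positions to labels for loc. A sketch on a hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "code": ["HK.02018", "HK.00151", "HK.00101"],
        "update_time": ["2019-04-26 16:08:05", "2019-04-26 16:08:33", "2019-04-26 16:08:05"],
        "last_price": [53.70, 6.17, 18.22],
    },
    index=[42, 15, 14],
)

by_iloc = df.iloc[[0, 2], 0:2]

# Translate column names to positions for iloc
col_pos = df.columns.get_indexer(["code", "update_time"])
by_names = df.iloc[[0, 2], col_pos]

# Or translate row positions to labels for loc
by_loc = df.loc[df.index[[0, 2]], ["code", "update_time"]]
```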

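    6. Create a new column from existing ones

    Add col1 and col2 element-wise and store the sum in a new column col:

    df['col'] = df['col1'] + df['col2']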

    Divide col1 by col2, add 1, and store the result in a new column newcol:

    df['newcol'] = df['col1'] / df['col2'] + 1
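    A sketch with hypothetical float columns col1 and col2. It shows a plain column copy and the two derived columns from the text. It also uses DataFrame.assign, which returns a new frame instead of modifying df. The last col2 value is deliberately 0.0: float division by zero gives inf rather than raising an error.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric columns; the last col2 value is 0.0 on purpose
df = pd.DataFrame({"col1": [2.0, 6.0, 1.0], "col2": [1.0, 3.0, 0.0]})

df["col1_copy"] = df["col1"]                 # plain copy of an existing column
df["col"] = df["col1"] + df["col2"]          # element-wise sum, aligned on the index
df["newcol"] = df["col1"] / df["col2"] + 1   # float division by 0.0 yields inf, not an error

# assign() returns a new DataFrame and leaves df untouched
df2 = df.assign(ratio=lambda d: d["col1"] / d["col2"])
```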

    7. Rename columns

    new_df = df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'})
    print(new_df)
    # In-place mode: modifies df directly (the call itself returns None)
    df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'}, inplace=True)
    print(df)
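    A sketch with a hypothetical three-column frame. It shows that the copying form leaves df alone and that inplace=True returns None. By default, mapping keys that match no column are silently ignored. Passing errors="raise" turns them into a KeyError.

```python
import pandas as pd

df = pd.DataFrame({"oldName1": [1], "oldName2": [2], "other": [3]})

new_df = df.rename(columns={"oldName1": "newName1", "oldName2": "newName2"})
before = list(df.columns)   # df is still unchanged here

ret = df.rename(columns={"oldName1": "newName1"}, inplace=True)   # modifies df, returns None

# Keys that match no column are ignored unless errors="raise"
try:
    df.rename(columns={"missing": "x"}, errors="raise")
    raised = False
except KeyError:
    raised = True
```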
  • Original article: https://www.cnblogs.com/moodlxs/p/10777248.html