
Common operations on a pandas DataFrame in Python

1. Conditional queries:

# Single quotes outside so the string literal "x" can sit inside the expression
result = df.query('(a == 1 and b == "x") or c / d < 3')
print(result)
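To make the query above concrete, here is a minimal runnable sketch. The sample values are invented for illustration; only the column names a, b, c and d come from the expression above. The boolean-mask version is shown as an equivalent alternative.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the query above.
df = pd.DataFrame({
    "a": [1, 1, 2, 3],
    "b": ["x", "y", "x", "z"],
    "c": [10, 10, 2, 9],
    "d": [2, 2, 1, 3],
})

# Keep rows where (a == 1 and b == "x"), or where c / d is below 3.
result = df.query('(a == 1 and b == "x") or c / d < 3')

# The same filter written as an explicit boolean mask.
mask = ((df["a"] == 1) & (df["b"] == "x")) | (df["c"] / df["d"] < 3)
same = df[mask]

print(result)
```

Both forms select the same rows; query tends to read better for long conditions, while the mask form avoids string parsing.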

2. Iteration

a) Iterating by index label

for idx in df.index:
    dd = df.loc[idx]  # row selected by its index label
    print(dd)

b) Iterating by row position

for i in range(len(df)):
    dd = df.iloc[i]  # row selected by its integer position
    print(dd)
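As an alternative to the two loops above, pandas also provides iterrows() and itertuples(). The sketch below uses a small made-up DataFrame with the same index labels as the sample data later in this post.

```python
import pandas as pd

# Hypothetical two-row frame; the index labels are not positions.
df = pd.DataFrame(
    {"code": ["HK.02018", "HK.00151"], "last_price": [53.70, 6.17]},
    index=[42, 15],
)

# iterrows() yields (index label, row as a Series) pairs.
labels = []
for idx, row in df.iterrows():
    labels.append(idx)
    print(idx, row["code"], row["last_price"])

# itertuples() yields one namedtuple per row; columns become attributes.
codes = [t.code for t in df.itertuples()]
print(codes)
```

itertuples() is usually the faster of the two, because it does not build a Series for every row.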

3. Mean of a column

# Mean of the "volume" column
result = df["volume"].mean()
print(result)
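One detail worth knowing about mean(): missing values are skipped by default. A small sketch with invented numbers:

```python
import pandas as pd

# Hypothetical volume data containing one missing value.
df = pd.DataFrame({"volume": [100.0, 200.0, None, 300.0]})

# By default NaN is ignored: (100 + 200 + 300) / 3
mean_skip = df["volume"].mean()

# With skipna=False a single NaN makes the whole result NaN.
mean_strict = df["volume"].mean(skipna=False)

print(mean_skip, mean_strict)
```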

4. Sorting by a given column

result_df = df.sort_values(by="sales", ascending=False)
print(result_df)

Note that this sort is not in place: sort_values returns a new DataFrame and leaves df unchanged.
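The difference between the returned copy and an in-place sort can be checked with a short sketch (the sales figures are made up):

```python
import pandas as pd

df = pd.DataFrame({"sales": [3, 1, 2]})

# Returns a new, sorted DataFrame.
result_df = df.sort_values(by="sales", ascending=False)
sorted_values = result_df["sales"].tolist()

# The original frame still has its original row order.
original_values = df["sales"].tolist()

# inplace=True sorts df itself and returns None.
df.sort_values(by="sales", ascending=False, inplace=True)
inplace_values = df["sales"].tolist()

print(sorted_values, original_values, inplace_values)
```

After sorting, the index labels travel with their rows; call reset_index(drop=True) if a fresh 0..n-1 index is wanted.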

5. Selecting specific rows and columns

Given the following data:

        code          update_time  last_price  open_price     ...      option_gamma  option_vega  option_theta  option_rho
42  HK.02018  2019-04-26 16:08:05       53.70       52.70     ...               NaN          NaN           NaN         NaN
15  HK.00151  2019-04-26 16:08:33        6.17        6.21     ...               NaN          NaN           NaN         NaN
14  HK.00101  2019-04-26 16:08:05       18.22       18.26     ...               NaN          NaN           NaN         NaN

a) Selecting by index label

Select the row with index label 42, all columns:

result = df.loc[[42], :]  # a list of labels gives a one-row DataFrame; df.loc[42, :] would return a Series
print(result)

result:

        code          update_time  last_price  open_price     ...      option_gamma  option_vega  option_theta  option_rho
42  HK.02018  2019-04-26 16:08:05       53.70       52.70     ...               NaN          NaN           NaN         NaN

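Select the rows with index labels 15 and 42, keeping only the code and update_time columns (loc takes column labels, not positions):

result = df.loc[[15, 42], ["code", "update_time"]]
print(result)

result:

        code          update_time
15  HK.00151  2019-04-26 16:08:33
42  HK.02018  2019-04-26 16:08:05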

b) Selecting by row position

Select the 2nd row, all columns:

result = df.iloc[[1], :]  # a list of positions gives a one-row DataFrame; df.iloc[1, :] would return a Series
print(result)

result:

        code          update_time  last_price  open_price     ...      option_gamma  option_vega  option_theta  option_rho
15  HK.00151  2019-04-26 16:08:33        6.17        6.21     ...               NaN          NaN           NaN         NaN

Select the first 2 rows, all columns:

result = df.iloc[0:2, :]
print(result)

result:

        code          update_time  last_price  open_price     ...      option_gamma  option_vega  option_theta  option_rho
42  HK.02018  2019-04-26 16:08:05       53.70       52.70     ...               NaN          NaN           NaN         NaN
15  HK.00151  2019-04-26 16:08:33        6.17        6.21     ...               NaN          NaN           NaN         NaN

Select the 1st and 3rd rows, keeping only the code and update_time columns:

result = df.iloc[[0,2], 0:2]
print(result)

result:

        code          update_time 
42  HK.02018  2019-04-26 16:08:05
14  HK.00101  2019-04-26 16:08:05
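The selections in this section can be reproduced on a cut-down version of the sample data. The frame below keeps only three of the columns shown above, so it is a sketch of the technique rather than the full dataset.

```python
import pandas as pd

# Reduced copy of the sample data: three columns, same index labels.
df = pd.DataFrame(
    {
        "code": ["HK.02018", "HK.00151", "HK.00101"],
        "update_time": [
            "2019-04-26 16:08:05",
            "2019-04-26 16:08:33",
            "2019-04-26 16:08:05",
        ],
        "last_price": [53.70, 6.17, 18.22],
    },
    index=[42, 15, 14],
)

# loc works with labels: a scalar label returns a Series,
# a list of labels returns a DataFrame.
row_as_series = df.loc[42, :]
row_as_frame = df.loc[[42], :]
by_label = df.loc[[15, 42], ["code", "update_time"]]

# iloc works with integer positions; the slice 0:2 excludes position 2.
by_position = df.iloc[[0, 2], 0:2]

print(by_label)
print(by_position)
```

Rows come back in the order requested, so by_label lists index 15 before 42 even though 42 comes first in df.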

6. Creating columns from existing columns

Store the sum of col1 and col2 in a new column col:

df['col'] = df['col1'] + df['col2']

Divide col1 by col2, add 1, and store the result in a new column newcol:

df['newcol'] = df['col1'] / df['col2'] + 1
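A runnable sketch of the two assignments, with invented values. The assign() call and the ratio column are additions for illustration: assign() returns a new DataFrame instead of modifying df.

```python
import pandas as pd

# Hypothetical values for col1 and col2.
df = pd.DataFrame({"col1": [6.0, 9.0], "col2": [2.0, 3.0]})

# Column arithmetic is element-wise; assignment adds the column to df itself.
df["col"] = df["col1"] + df["col2"]
df["newcol"] = df["col1"] / df["col2"] + 1

# assign() leaves df untouched and returns a copy with the extra column.
df2 = df.assign(ratio=df["col1"] / df["col2"])

print(df2)
```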

7. Renaming columns

new_df = df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'})
print(new_df)
# In-place mode: modifies df itself and returns None
df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'}, inplace=True)
print(df)
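The behaviour of rename can be checked with a small sketch. The frame and the extra column named other are made up; the column names oldName1 and oldName2 are the ones used above.

```python
import pandas as pd

df = pd.DataFrame({"oldName1": [1], "oldName2": [2], "other": [3]})

# Without inplace=True, rename returns a new DataFrame.
new_df = df.rename(columns={"oldName1": "newName1", "oldName2": "newName2"})
new_columns = list(new_df.columns)

# df keeps its original column names.
old_columns = list(df.columns)

# Keys that match no column are silently ignored by default.
same_df = df.rename(columns={"missing": "x"})

print(new_columns, old_columns)
```

Columns not mentioned in the mapping are left as they are.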


Original article: https://www.cnblogs.com/moodlxs/p/10777248.html