Using apply, applymap, join, merge, and concat in pandas

1. apply, applymap, map

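When processing data, explicit Python loops often make code run much slower. The functions that pandas already provides can do the same work far more efficiently.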
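As a rough illustration of that point, the sketch below computes the same column two ways: with a row-by-row loop and with a single column-level operation. The frame and its column names (`price`, `qty`) are made up for this example and are not from the original post.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

# Row-by-row Python loop: correct, but slow on large frames.
totals_loop = []
for _, row in df.iterrows():
    totals_loop.append(row["price"] * row["qty"])

# Column arithmetic: pandas handles the whole column in one call.
totals_vec = df["price"] * df["qty"]

print(totals_loop)
print(totals_vec.tolist())
```

Both approaches give the same numbers; the difference only becomes noticeable in run time as the number of rows grows.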

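DataFrame.apply(self, func, axis=0, broadcast=None, raw=False, reduce=None, result_type=None, args=(), **kwds) # applies a function along an axis (default axis=0, i.e. once per column); broadcast and reduce were removed in pandas 1.0

DataFrame.applymap(self, func) # applies a function to every element; renamed to DataFrame.map in pandas 2.1, where applymap is deprecated

Series.map(arg) # applies a function to every element of one Series (column), often used with a lambda; the example below calls this pandas method, not Python's built-in map()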
    import pandas as pd
    import numpy as np

    # Random data, so the values differ from run to run
    # (the outputs below do not all come from the same run).
    df = pd.DataFrame(np.random.randn(5, 3), columns=['col1', 'col2', 'col3'])

    # Element-wise: multiply every value by 100
    print(df.applymap(lambda x: x * 100))
             col1        col2        col3
    0  -46.794441  -39.065235  -43.029788
    1 -106.007018   27.150164   39.014999
    2  -82.441000 -183.847367   28.454017
    3   31.048271 -154.712457 -121.747535
    4   49.811200 -100.607442 -137.614507

    # axis=0: mean of each column
    print(df.apply(np.mean, axis=0))
    col1    0.645906
    col2   -0.499199
    col3   -0.021930
    dtype: float64

    # axis=1: mean of each row
    print(df.apply(np.mean, axis=1))
    0   -0.143159
    1    0.772798
    2   -0.030872
    3   -0.076344
    4   -0.314461
    dtype: float64

    # Series.map: element-wise on a single column
    print(df['col1'].map(lambda x: x * 100))
    0     27.613782
    1     52.613560
    2    139.238366
    3     56.926986
    4     46.560222
    Name: col1, dtype: float64
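Because the example above uses random numbers, its output is hard to check by hand. Here is a small deterministic sketch of the same calls on fixed data (the values are my own, chosen so the results are easy to verify). It also picks `DataFrame.map` when it exists and falls back to `applymap` otherwise, on the assumption that you may be running either an older or a newer pandas.

```python
import pandas as pd

# Fixed illustrative data so the results can be checked by hand.
df = pd.DataFrame({"col1": [1, 2, 3], "col2": [10, 20, 30]})

# axis=0: the function receives each column, giving one value per column.
col_means = df.apply(lambda s: s.mean(), axis=0)

# axis=1: the function receives each row, giving one value per row.
row_means = df.apply(lambda s: s.mean(), axis=1)

# Element-wise over the whole frame: DataFrame.map on pandas >= 2.1,
# DataFrame.applymap on older versions.
elementwise = df.map if hasattr(df, "map") else df.applymap
scaled = elementwise(lambda x: x * 100)

# Element-wise over a single column.
scaled_col1 = df["col1"].map(lambda x: x * 100)

print(col_means)
print(row_means)
print(scaled)
print(scaled_col1)
```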
2. append, join, merge, concat

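To simplify storage and save space, data today is mostly kept in relational form. A retailer's database, for example, is split into tables for customer attributes, purchase behavior, product attributes, inventory, and so on. For analysis, these tables usually have to be joined on specified keys to produce the single wide table we ultimately want.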

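The commonly used methods are listed below. All of them are already provided by pandas, so you only need to set the parameters and call them. In my experience, merge() is the more convenient choice for joining tables, because its many optional parameters can cover complex real-world requirements. append is mainly for adding rows; it does not join tables.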

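pandas.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=None, copy=True) # concatenates pandas objects along one axis, with optional set logic on the other axes (join='outer' for the union of labels, join='inner' for the intersection); join_axes was removed in pandas 1.0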
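A minimal sketch of how `axis` and `join` interact in `pandas.concat`. The two small frames are invented for illustration.

```python
import pandas as pd

# Two illustrative frames that share only column "B".
a = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
b = pd.DataFrame({"B": [5, 6], "C": [7, 8]})

# axis=0 stacks rows; join="outer" keeps the union of columns
# and fills the gaps with NaN.
rows_outer = pd.concat([a, b], axis=0, join="outer", ignore_index=True)

# join="inner" keeps only the columns present in every input.
rows_inner = pd.concat([a, b], axis=0, join="inner", ignore_index=True)

# axis=1 places the frames side by side, aligned on the row index.
side_by_side = pd.concat([a, b], axis=1)

print(rows_outer)
print(rows_inner)
print(side_by_side)
```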

DataFrame.join(self, other, on=None, how='left', lsuffix='', rsuffix='', sort=False)
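As a sketch of what `join` does: it matches the caller's index, or the column named in `on`, against the index of the other frame. The frames and suffix strings below are made up for this example.

```python
import pandas as pd

# Illustrative data: "right" is indexed by the key values.
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"y": [10, 20]}, index=["a", "b"])

# The "key" column of left is matched against the index of right.
# how="left" keeps every row of left; unmatched rows get NaN.
joined = left.join(right, on="key", how="left")

# When both frames have a column with the same name, lsuffix/rsuffix
# are needed to tell the two apart.
l2 = pd.DataFrame({"v": [1, 2]}, index=["a", "b"])
r2 = pd.DataFrame({"v": [3, 4]}, index=["a", "b"])
both = l2.join(r2, lsuffix="_left", rsuffix="_right")

print(joined)
print(both)
```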

DataFrame.merge(self, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)

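on: the join key(s) | left_on: the join key(s) in the left table | right_on: the join key(s) in the right table | left_index=True uses the left table's index as the join key | right_index=True does the same for the right table

how: the join type ('left', 'right', 'outer', or 'inner')

suffixes: the suffixes appended to column names that appear in both tables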
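The parameters above can be sketched with a customer table and an order table, in the spirit of the retail example earlier. All table contents and column names here are hypothetical.

```python
import pandas as pd

# Hypothetical customer and order tables.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Ann", "Bob", "Cy"],
    "region": ["N", "S", "E"],
})
orders = pd.DataFrame({
    "customer_id": [1, 1, 3, 4],
    "amount": [50, 20, 70, 10],
    "region": ["N", "N", "E", "W"],
})

# on + how="inner": keep only customers that have orders.
# suffixes disambiguates the overlapping "region" column.
inner = customers.merge(orders, on="customer_id", how="inner",
                        suffixes=("_cust", "_order"))

# how="left": keep every customer, with NaN where there is no order.
left_join = customers.merge(orders, on="customer_id", how="left")

# left_on / right_on: the key has a different name in each table.
orders_renamed = orders.rename(columns={"customer_id": "cust"})
outer = customers.merge(orders_renamed, left_on="customer_id",
                        right_on="cust", how="outer")

# right_index=True: match a left column against the right table's index.
by_id = customers.set_index("customer_id")
via_index = orders.merge(by_id, left_on="customer_id", right_index=True)

print(inner)
print(left_join)
print(outer)
print(via_index)
```

With these inputs, the inner join drops Bob (no orders) and the order from customer 4 (no customer record), while the outer join keeps both.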

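DataFrame.append(self, other, ignore_index=False, verify_integrity=False, sort=None) # appends the rows of other to the end of the caller and returns a new object; deprecated in pandas 1.4 and removed in 2.0 (use pandas.concat instead)

    import pandas as pd

    df = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
    df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))

    # Each frame keeps its original row labels
    print(df.append(df2))
       A  B
    0  1  2
    1  3  4
    0  5  6
    1  7  8

    # ignore_index=True renumbers the rows from 0
    print(df.append(df2, ignore_index=True))
       A  B
    0  1  2
    1  3  4
    2  5  6
    3  7  8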
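On pandas 2.0 and later, where `DataFrame.append` no longer exists, the same two results can be obtained with `pandas.concat`. This is a sketch of the equivalent calls using the same two frames, not code from the original post.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=list("AB"))
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list("AB"))

# Equivalent of df.append(df2): row labels are kept as they were.
kept_index = pd.concat([df, df2])

# Equivalent of df.append(df2, ignore_index=True): rows renumbered from 0.
new_index = pd.concat([df, df2], ignore_index=True)

print(kept_index)
print(new_index)
```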

Original article: https://www.cnblogs.com/beyondChan/p/11374572.html