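Today I'm writing up detailed notes on using DataFrame, so that I can look them up later.
Basic characteristics of a DataFrame:
1. It is a tabular data structure.
2. It contains an ordered set of columns.
3. It can roughly be viewed as a collection of Series that share the same index.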
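To make the third point concrete, here is a minimal sketch that builds a DataFrame from two Series and checks that a column pulled back out is a Series carrying the frame's index. The variable names `names`, `pays` and `name_column` are only illustrative.

```python
import pandas as pd

# Two Series that share the same default index labels (0, 1, 2).
names = pd.Series(['Wangdachui', 'Linling', 'Niuyun'])
pays = pd.Series([4000, 5000, 6000])

# Building a DataFrame from a dict of Series aligns them on the shared index.
frame = pd.DataFrame({'name': names, 'pay': pays})

# A column taken back out is a Series with the same index as the frame.
name_column = frame['name']
print(type(name_column).__name__)
print(name_column.index.equals(frame.index))
```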
>>> import pandas as pd
>>> data={'name':['Wangdachui','Linling','Niuyun'],'pay':[4000,5000,6000]}
>>> frame=pd.DataFrame(data)
>>> frame
         name   pay
0  Wangdachui  4000
1     Linling  5000
2      Niuyun  6000

>>> import pandas as pd
>>> import numpy as np
>>> data=np.array([('Wangdachui',4000),('Linling',5000),('Niuyun',6000)])
>>> frame=pd.DataFrame(data,index=range(1,4),columns=['name','pay'])
>>> frame
         name   pay
1  Wangdachui  4000
2     Linling  5000
3      Niuyun  6000
>>> frame.index
RangeIndex(start=1, stop=4, step=1)
>>> frame.columns
Index(['name', 'pay'], dtype='object')
>>> frame.values
array([['Wangdachui', '4000'],
       ['Linling', '5000'],
       ['Niuyun', '6000']], dtype=object)

>>> frame.index=[2,4,6]
>>> frame
         name   pay
2  Wangdachui  4000
4     Linling  5000
6      Niuyun  6000
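One detail worth noting about assigning to `frame.index`: it relabels the rows in place, and the new labels must match the number of rows. The sketch below assumes a frame built from a dict with integer pay; `mismatch_error` is just an illustrative name for the caught exception.

```python
import pandas as pd

frame = pd.DataFrame({'name': ['Wangdachui', 'Linling', 'Niuyun'],
                      'pay': [4000, 5000, 6000]})

# Assigning to .index relabels the rows in place; the data is untouched.
frame.index = [2, 4, 6]
print(list(frame.index))

# The new index needs exactly one label per row, otherwise pandas refuses.
mismatch_error = None
try:
    frame.index = [2, 4]
except ValueError as err:
    mismatch_error = err
    print('rejected:', err)
```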
Basic DataFrame operations
· Selecting a row or a column of a DataFrame object returns a Series:
>>> frame['name']
2    Wangdachui
4       Linling
6        Niuyun
Name: name, dtype: object
>>> frame.pay
2    4000
4    5000
6    6000
Name: pay, dtype: object
>>> frame.iloc[:2,1]
2    4000
4    5000
Name: pay, dtype: object
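The session above only pulls out columns, so here is a small sketch of the row case as well. It assumes a frame with the same index labels (2, 4, 6) but with integer pay, and uses `.loc` for label-based and `.iloc` for position-based selection.

```python
import pandas as pd

frame = pd.DataFrame({'name': ['Wangdachui', 'Linling', 'Niuyun'],
                      'pay': [4000, 5000, 6000]},
                     index=[2, 4, 6])

# .loc selects by index label: the row labelled 4.
row_by_label = frame.loc[4]

# .iloc selects by integer position: the second row (position 1).
row_by_position = frame.iloc[1]

# Both are Series whose index is the frame's column names.
print(type(row_by_label).__name__)
print(list(row_by_label.index))
print(row_by_label['name'])
```

With this index, label 4 and position 1 happen to point at the same row, which is why the two results match.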
· Modifying and deleting parts of a DataFrame object:
>>> frame['name']='admin'
>>> frame
    name   pay
2  admin  4000
4  admin  5000
6  admin  6000
>>> del frame['pay']
>>> frame
    name
2  admin
4  admin
6  admin
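Assigning a scalar, as above, overwrites the whole column with one value. A hedged sketch of a few related variations follows: assigning a list to set each row separately, adding a column, and removing one with `drop()`, which returns a new frame instead of changing the original the way `del` does. The `bonus` column is hypothetical and exists only for this illustration.

```python
import pandas as pd

frame = pd.DataFrame({'name': ['Wangdachui', 'Linling', 'Niuyun'],
                      'pay': [4000, 5000, 6000]},
                     index=[2, 4, 6])

# Assigning a list sets each row's value individually (lengths must match).
frame['pay'] = [4500, 5500, 6500]

# Assigning to a name that does not exist yet adds a new column.
frame['bonus'] = 500  # hypothetical column, for illustration only

# drop() returns a new frame and leaves the original intact, unlike del.
without_bonus = frame.drop(columns=['bonus'])
print(list(frame.columns))
print(list(without_bonus.columns))
```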
DataFrame statistics
>>> import pandas as pd
>>> import numpy as np
>>> data=np.array([('Wangdachui',4000),('Linling',5000),('Niuyun',6000)])
>>> frame=pd.DataFrame(data,index=range(1,4),columns=['name','pay'])
>>> frame
         name   pay
1  Wangdachui  4000
2     Linling  5000
3      Niuyun  6000
>>> frame.pay.min()
'4000'
>>> frame[frame.pay>='5000']
      name   pay
2  Linling  5000
3   Niuyun  6000
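Note that `min()` returned the string '4000' and the filter compares against '5000': a NumPy array holds a single type, so mixing names and numbers turned the pay values into text. String comparison goes character by character (for example '10000' sorts below '5000'), so it only works here because every value has four digits. A minimal sketch of one way around this, assuming every pay value is a clean integer string, is to convert the column with `astype(int)` first; `high_pay` is just an illustrative name.

```python
import numpy as np
import pandas as pd

data = np.array([('Wangdachui', 4000), ('Linling', 5000), ('Niuyun', 6000)])
frame = pd.DataFrame(data, index=range(1, 4), columns=['name', 'pay'])

# NumPy stored every cell as text, so pay currently holds strings.
print(type(frame.pay.iloc[0]).__name__)

# Convert the column to integers before doing statistics or comparisons.
frame['pay'] = frame['pay'].astype(int)

print(frame.pay.min())
print(frame.pay.mean())

# Numeric comparison now filters by value rather than by character order.
high_pay = frame[frame.pay >= 5000]
print(high_pay)
```

Building the frame from a dict, as in the very first example, avoids the problem altogether because each column keeps its own type.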