Basic operations
Python: hierarchical indexing
>>> import numpy as np
>>> import pandas as pd
>>> data = pd.Series(np.random.randn(10),
...                  index=[['a','a','a','b','b','c','c','d','d','d'],
...                         [1,2,3,1,2,1,2,3,1,2]])
>>> data
a  1   -0.168871
   2    0.828841
   3    0.786215
b  1    0.506081
   2   -2.304898
c  1    0.864875
   2    0.183091
d  3   -0.678791
   1   -1.241735
   2    0.778855
dtype: float64
Converting between a hierarchically indexed Series and a DataFrame
>>> data.unstack()
          1         2         3
a -0.168871  0.828841  0.786215
b  0.506081 -2.304898       NaN
c  0.864875  0.183091       NaN
d -1.241735  0.778855 -0.678791
>>> type(data.unstack())
<class 'pandas.core.frame.DataFrame'>
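The conversion also works in the other direction. The sketch below is a minimal illustration, assuming fixed values in place of the random ones above so that the result is reproducible: `unstack()` moves the inner index level into the columns, and `stack()` moves the columns back into the index. The trailing `dropna()` is there because some pandas versions keep the missing cell as a NaN row when stacking.

```python
import pandas as pd

# Fixed values (an assumption for reproducibility; the text uses random data).
data = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0],
    index=[['a', 'a', 'a', 'b', 'b'], [1, 2, 3, 1, 2]],
)

# Inner index level (1, 2, 3) becomes the columns; ('b', 3) has no value,
# so that cell is NaN.
wide = data.unstack()

# Columns go back into the inner index level. dropna() removes the NaN
# entry for ('b', 3) in case stack() kept it.
back = wide.stack().dropna()

print(wide)
print(back)
```

Here `back` holds the same five values under the same two-level index as `data`.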
Merge, join, concatenate
>>> df2 = pd.DataFrame({'apts':[55000,60000],'cars':[15000,12000]},index=['hangzhou','najing'])
>>> df1 = pd.DataFrame({'apts':[55000,60000],'cars':[20000,30000]},index=['shanghai','beijing'])
>>> df3 = pd.DataFrame({'apts':[55000,60000],'cars':[15000,12000]},index=['guangzhou','chongqing'])
>>> [df1,df2,df3]
[           apts   cars
shanghai  55000  20000
beijing   60000  30000,            apts   cars
hangzhou  55000  15000
najing    60000  12000,             apts   cars
guangzhou  55000  15000
chongqing  60000  12000]
>>> pd.concat([df1,df2,df3])
            apts   cars
shanghai   55000  20000
beijing    60000  30000
hangzhou   55000  15000
najing     60000  12000
guangzhou  55000  15000
chongqing  60000  12000
>>> frames = [df1,df2,df3]
>>> result2 = pd.concat(frames,keys=['x','y','z'])
>>> result2
              apts   cars
x shanghai   55000  20000
  beijing    60000  30000
y hangzhou   55000  15000
  najing     60000  12000
z guangzhou  55000  15000
  chongqing  60000  12000
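Passing `keys` gives the result a two-level index, so each original frame can be pulled back out by its key. The following is a small sketch using two of the frames above; the variable names `block_y` and `one_cell` are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'apts': [55000, 60000], 'cars': [20000, 30000]},
                   index=['shanghai', 'beijing'])
df2 = pd.DataFrame({'apts': [55000, 60000], 'cars': [15000, 12000]},
                   index=['hangzhou', 'najing'])

# The keys become the outer level of the row index.
result2 = pd.concat([df1, df2], keys=['x', 'y'])

# Select every row that came from the frame labelled 'y'.
block_y = result2.loc['y']

# Select a single value by (outer key, inner label) and column name.
one_cell = result2.loc[('x', 'beijing'), 'cars']

print(block_y)
print(one_cell)
```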
Concatenating along columns with concat
>>> df4 = pd.DataFrame({"salaries":[10000,30000,30000,20000,15000]},index=['suzhou','beijing','shanghai','guanghzou','tianjin'])
>>> result = pd.concat([df1,df2,df3])
>>> result3 = pd.concat([result,df4],axis=1)
>>> result3
              apts     cars  salaries
beijing    60000.0  30000.0   30000.0
chongqing  60000.0  12000.0       NaN
guanghzou      NaN      NaN   20000.0
guangzhou  55000.0  15000.0       NaN
hangzhou   55000.0  15000.0       NaN
najing     60000.0  12000.0       NaN
shanghai   55000.0  20000.0   30000.0
suzhou         NaN      NaN   10000.0
tianjin        NaN      NaN   15000.0
Combining two DataFrames, keeping only the intersection of their indexes
>>> result3 = pd.concat([result,df4],axis=1,join='inner')
>>> result3
           apts   cars  salaries
shanghai  55000  20000     30000
beijing   60000  30000     30000
Concatenating a Series with a DataFrame
>>> s1 = pd.Series([60,50],index=['shanghai','beijing'],name='meal')
>>> s1
shanghai    60
beijing     50
Name: meal, dtype: int64
>>> type(s1)
<class 'pandas.core.series.Series'>
>>> df1
           apts   cars
shanghai  55000  20000
beijing   60000  30000
>>> type(df1)
<class 'pandas.core.frame.DataFrame'>
>>> pd.concat([df1,s1],axis=1)
           apts   cars  meal
shanghai  55000  20000    60
beijing   60000  30000    50
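append adds a Series to a DataFrame as a new row, matching the Series index against the column names. concat does not do this with a bare Series: with axis=0 the values land in a new column named 0, and with axis=1 the Series becomes a new column aligned on the row index. (DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0.)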
>>> s2 = pd.Series([18000,12000],index=['apts','cars'],name='xiamen')
>>> s2
apts    18000
cars    12000
Name: xiamen, dtype: int64
>>> df1.append(s2)
           apts   cars
shanghai  55000  20000
beijing   60000  30000
xiamen    18000  12000
>>> pd.concat([df1,s2],axis=0)
                0     apts     cars
shanghai      NaN  55000.0  20000.0
beijing       NaN  60000.0  30000.0
apts      18000.0      NaN      NaN
cars      12000.0      NaN      NaN
>>> pd.concat([df1,s2],axis=1)
             apts     cars   xiamen
apts          NaN      NaN  18000.0
beijing   60000.0  30000.0      NaN
cars          NaN      NaN  12000.0
shanghai  55000.0  20000.0      NaN
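On pandas versions where `DataFrame.append` is no longer available, one way to get the same row-wise result with `concat` is to turn the Series into a one-row DataFrame first. This is a sketch of that approach, not the only option; the names `row` and `with_row` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'apts': [55000, 60000], 'cars': [20000, 30000]},
                   index=['shanghai', 'beijing'])
s2 = pd.Series([18000, 12000], index=['apts', 'cars'], name='xiamen')

# to_frame() gives a one-column frame named 'xiamen'; .T transposes it
# into a one-row frame whose columns are 'apts' and 'cars'.
row = s2.to_frame().T

# Now both inputs share the same columns, so concat stacks them as rows.
with_row = pd.concat([df1, row])

print(with_row)
```

Because the Series index now lines up with the column names, no NaN values appear in the result.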
Combining with merge
>>> df1 = pd.DataFrame({"salaries":[10000,30000,30000,20000,15000],'cities':['suzhou','beijing','shanghai','guanghzou','tianjin']})
>>> df4 = pd.DataFrame({'apts':[55000,60000],'cars':[15000,12000],'cities':['shanghai','beijing']})
>>> result = pd.merge(df1,df4,on='cities') # on names the column to merge on
>>> result
     cities  salaries   apts   cars
0   beijing     30000  60000  12000
1  shanghai     30000  55000  15000
>>> result = pd.merge(df1,df4,on='cities',how='right')
>>> result
     cities  salaries   apts   cars
0   beijing     30000  60000  12000
1  shanghai     30000  55000  15000
>>> result = pd.merge(df1,df4,on='cities',how='left')
>>> result
      cities  salaries     apts     cars
0     suzhou     10000      NaN      NaN
1    beijing     30000  60000.0  12000.0
2   shanghai     30000  55000.0  15000.0
3  guanghzou     20000      NaN      NaN
4    tianjin     15000      NaN      NaN
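The remaining `how` option, `'outer'`, keeps the keys from both sides. The sketch below uses a cut-down version of the frames above plus one hypothetical extra row (`'xiamen'`, which is not in the original data) so that each side has a city the other lacks. `indicator=True` adds a `_merge` column recording where each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({'salaries': [10000, 30000, 30000],
                    'cities': ['suzhou', 'beijing', 'shanghai']})
# 'xiamen' is a made-up row, added only so the right side has an
# unmatched key.
df4 = pd.DataFrame({'apts': [55000, 60000, 58000],
                    'cities': ['shanghai', 'beijing', 'xiamen']})

# Default how='inner': only cities present in both frames.
inner = pd.merge(df1, df4, on='cities')

# how='outer': every city from either frame; indicator=True adds a
# '_merge' column with 'left_only', 'right_only', or 'both'.
outer = pd.merge(df1, df4, on='cities', how='outer', indicator=True)

print(inner)
print(outer)
```

Filtering on `_merge` is a convenient way to see which keys failed to match, for example a misspelled city name.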