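pandas has two main data structures: Series and DataFrame.
A Series consists of an index, the data, and a data type (dtype).
The index is backed by one ndarray (series.index.values),
the data by another ndarray (series.values).
A Series can be sliced, but series[xx] with a plain integer is not a reliable way to fetch the element at position xx (with an integer index the integer is treated as a label, not a position).
To get the index label at a given position: series.index.values[xx]
To get the data value at a given position: series.values[xx]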
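The access patterns above can be sketched as follows. The sample labels and values are made up for illustration, and .iloc / .loc are used here as the explicit positional and label-based accessors alongside the .values approach from the notes.

```python
import pandas as pd

# Hypothetical sample data, not taken from the notes.
s = pd.Series([10.5, 20.5, 30.5], index=["a", "b", "c"])

print(s.dtype)          # dtype of the data
print(type(s.index))    # the index is a pandas Index object
print(type(s.values))   # the data is exposed as a numpy.ndarray

sliced = s.iloc[1:3]             # positional slice, keeps the labels "b" and "c"
label_at_1 = s.index.values[1]   # index label at position 1
value_at_1 = s.values[1]         # data value at position 1
same_value = s.iloc[1]           # same element via the positional accessor
value_for_b = s.loc["b"]         # lookup by label instead of position
```

Using .iloc and .loc makes it explicit whether an integer is meant as a position or as a label, which avoids the ambiguity of series[xx].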
DataFrame operations (add, delete, update, query):
Reordering a DataFrame: df.reindex() can reorder rows and can also reorder columns.
DataFrame.reindex(labels=None, index=None, columns=None, axis=None, method=None, copy=True, level=None, fill_value=nan, limit=None, tolerance=None)
Both the index and columns parameters accept a list of labels.
The index parameter also accepts a Series.
>>> date_index = pd.date_range('1/1/2010', periods=6, freq='D')
>>> df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]},
...                    index=date_index)
>>> df2
            prices
2010-01-01     100
2010-01-02     101
2010-01-03     NaN
2010-01-04     100
2010-01-05      89
2010-01-06      88
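Continuing with the same df2, a minimal sketch of what reindex does with the index, columns, and fill_value parameters. The names date_index2, wanted, and the "volume" column are assumptions introduced only for this illustration.

```python
import numpy as np
import pandas as pd

date_index = pd.date_range("1/1/2010", periods=6, freq="D")
df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]}, index=date_index)

# Reorder rows: pass the existing labels in the order you want.
reversed_rows = df2.reindex(index=date_index[::-1])

# Reindex to a wider date range: labels missing from df2 become NaN rows.
date_index2 = pd.date_range("12/29/2009", periods=10, freq="D")
extended = df2.reindex(date_index2)

# fill_value only fills the newly created rows; the NaN already in df2 stays.
filled = df2.reindex(date_index2, fill_value=0)

# The index argument can also be a Series of labels.
wanted = pd.Series(pd.to_datetime(["2010-01-05", "2010-01-01"]))
subset = df2.reindex(index=wanted)

# Reorder or add columns: the unknown "volume" column is created with NaN.
with_cols = df2.reindex(columns=["volume", "prices"])
```

reindex returns a new DataFrame each time, so df2 itself is unchanged.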
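Adding to a DataFrame: a column can be added with df['name'] = [list] or df['name'] = Series.
A row can be added with df = df.append(DataFrame). Note that DataFrame.append was removed in pandas 2.0; on newer versions use df = pd.concat([df, other_df]) instead.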
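A small sketch of adding columns and rows. The column names and values are made up, and pd.concat is used for the row case as a stand-in for append so that the example also runs on pandas 2.x.

```python
import pandas as pd

# Hypothetical sample frame.
df = pd.DataFrame({"close": [10.0, 11.0, 12.0]})

# Column from a list: the length must match the number of rows.
df["volume"] = [100, 200, 300]

# Column from a Series: values are aligned on the index,
# so row 0 (no matching label) becomes NaN.
df["ma"] = pd.Series([10.5, 11.5], index=[1, 2])

# Rows from another DataFrame; ignore_index renumbers the result 0..n-1.
new_rows = pd.DataFrame({"close": [13.0], "volume": [400], "ma": [12.0]})
df = pd.concat([df, new_rows], ignore_index=True)
```

The difference between the two column forms matters in practice: a list is assigned by position, while a Series is matched by label.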
Filtering a DataFrame by condition:
    import tushare as ts
    df=ts.get_k_data('hs300',start='2018-05-1',end='2018-05-30')
    print(df)
    print(df[(df.close>3700)&(df.volume>80000000)])

Output:

              date     open    close     high      low      volume   code
    83  2018-05-10  3882.84  3893.06  3894.49  3867.31  70667809.0  hs300
    84  2018-05-11  3902.15  3872.84  3903.34  3871.95  73356604.0  hs300
    85  2018-05-14  3886.87  3909.29  3919.30  3886.87  81975999.0  hs300
    86  2018-05-15  3920.14  3924.10  3924.34  3892.92  76049555.0  hs300
    87  2018-05-16  3909.82  3892.84  3923.34  3889.19  71913646.0  hs300
    88  2018-05-17  3895.49  3864.05  3899.51  3858.85  59030123.0  hs300
    89  2018-05-18  3860.16  3903.06  3903.06  3841.90  72613956.0  hs300
    90  2018-05-21  3925.54  3921.24  3937.46  3913.60  92641737.0  hs300
    91  2018-05-22  3918.82  3906.21  3918.82  3881.17  75368778.0  hs300
    92  2018-05-23  3898.27  3854.58  3898.27  3854.58  83722319.0  hs300
    93  2018-05-24  3853.29  3827.22  3859.10  3823.92  68345022.0  hs300
    94  2018-05-25  3823.74  3816.50  3841.12  3804.14  70836386.0  hs300
    95  2018-05-28  3816.26  3833.26  3846.55  3799.32  71521132.0  hs300
    96  2018-05-29  3824.19  3804.01  3841.78  3800.67  81552662.0  hs300
    97  2018-05-30  3755.18  3723.37  3767.89  3722.07  87913285.0  hs300

              date     open    close     high      low      volume   code
    85  2018-05-14  3886.87  3909.29  3919.30  3886.87  81975999.0  hs300
    90  2018-05-21  3925.54  3921.24  3937.46  3913.60  92641737.0  hs300
    92  2018-05-23  3898.27  3854.58  3898.27  3854.58  83722319.0  hs300
    96  2018-05-29  3824.19  3804.01  3841.78  3800.67  81552662.0  hs300
    97  2018-05-30  3755.18  3723.37  3767.89  3722.07  87913285.0  hs300
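The filter above works by building a boolean mask and indexing the DataFrame with it. Below is a sketch that runs without tushare or network access: it uses four rows copied from the output above (date, close, and volume only), and the extra thresholds 3900 and 90000000 are arbitrary values chosen for illustration.

```python
import pandas as pd

# Four rows copied from the printed hs300 output (subset of columns).
df = pd.DataFrame({
    "date": ["2018-05-10", "2018-05-14", "2018-05-21", "2018-05-30"],
    "close": [3893.06, 3909.29, 3921.24, 3723.37],
    "volume": [70667809.0, 81975999.0, 92641737.0, 87913285.0],
})

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are required because & binds tighter than > in Python.
mask = (df["close"] > 3700) & (df["volume"] > 80000000)
both = df[mask]

# | is element-wise "or", ~ is element-wise "not".
either = df[(df["close"] > 3900) | (df["volume"] > 90000000)]
not_high = df[~(df["close"] > 3900)]

# DataFrame.query expresses the same AND condition as a string.
same = df.query("close > 3700 and volume > 80000000")
```

Python's plain `and` / `or` do not work on Series here; the element-wise operators are needed.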
Plotting a DataFrame:
pandas.DataFrame.plot
DataFrame.plot(x=None, y=None, kind='line', ax=None, subplots=False, sharex=None, sharey=False, layout=None, figsize=None, use_index=True, title=None, grid=None, legend=True, style=None, logx=False, logy=False, loglog=False, xticks=None, yticks=None, xlim=None, ylim=None, rot=None, fontsize=None, colormap=None, table=False, yerr=None, xerr=None, secondary_y=False, sort_columns=False, **kwds)