  • Notes on pandas

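    Importing and exporting data

    Importing and exporting DataFrame data involves a variety of formats. The export methods include to_csv, to_excel, to_hdf, to_sql, to_json, to_msgpack (removed in pandas 1.0), to_html, to_gbq, to_stata, to_clipboard, and to_pickle.

    See the IO Tools section of the pandas documentation for how these are categorized.

    To write only specific columns, use the columns argument, for example: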

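    to_csv(filename, columns=["col1", "col2"], ......)
    # Single and double quotes are equivalent in Python, so either form works for the column names.
    # Setting index=False and header=False omits the row labels (index) and the column header.
    # The list can also be built from a tuple with list(); note that list('col1', 'col2') raises a TypeError.
    to_csv(filename, columns=list(("col1", "col2")), ......)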

    To save the data as ASCII text, use to_csv; its parameters control, among other things, whether the index (row labels) is written.
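
    As a minimal sketch of the columns, index, and header options described above, the snippet below writes to an in-memory buffer instead of a file. The sample data and the names col1, col2, and col3 are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4], "col3": [5, 6]})

# Write only two of the three columns and omit the index (row labels).
buffer = io.StringIO()  # in-memory stand-in for a file name
df.to_csv(buffer, columns=["col1", "col2"], index=False)
csv_lines = buffer.getvalue().splitlines()
print(csv_lines)

# Also omit the header row, leaving only the data rows.
buffer_no_header = io.StringIO()
df.to_csv(buffer_no_header, columns=['col1', 'col2'], index=False, header=False)
data_lines = buffer_no_header.getvalue().splitlines()
print(data_lines)
```

    Passing a real file name in place of the buffer writes the same text to disk.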

    Reordering columns

    Suppose the original data looks like this:

    In [6]: df
    Out[6]:
              0         1         2         3         4      mean
    0  0.445598  0.173835  0.343415  0.682252  0.582616  0.445543
    1  0.881592  0.696942  0.702232  0.696724  0.373551  0.670208
    2  0.662527  0.955193  0.131016  0.609548  0.804694  0.632596
    3  0.260919  0.783467  0.593433  0.033426  0.512019  0.436653
    4  0.131842  0.799367  0.182828  0.683330  0.019485  0.363371
    5  0.498784  0.873495  0.383811  0.699289  0.480447  0.587165
    6  0.388771  0.395757  0.745237  0.628406  0.784473  0.588529
    7  0.147986  0.459451  0.310961  0.706435  0.100914  0.345149
    8  0.394947  0.863494  0.585030  0.565944  0.356561  0.553195
    9  0.689260  0.865243  0.136481  0.386582  0.730399  0.561593
    
    In [7]: cols = df.columns.tolist()
    
    In [8]: cols
    Out[8]: [0, 1, 2, 3, 4, 'mean']

    Change the order by rearranging the column list:

    In [12]: cols = cols[-1:] + cols[:-1] 
    In [13]: cols
    Out[13]: ['mean', 0, 1, 2, 3, 4]

    Indexing with the new list then gives the following result:

    In [16]: df = df[cols]  #    OR    df = df.loc[:, cols]  (df.ix was removed in pandas 1.0)
    
    In [17]: df
    Out[17]:
           mean         0         1         2         3         4
    0  0.445543  0.445598  0.173835  0.343415  0.682252  0.582616
    1  0.670208  0.881592  0.696942  0.702232  0.696724  0.373551
    2  0.632596  0.662527  0.955193  0.131016  0.609548  0.804694
    3  0.436653  0.260919  0.783467  0.593433  0.033426  0.512019
    4  0.363371  0.131842  0.799367  0.182828  0.683330  0.019485
    5  0.587165  0.498784  0.873495  0.383811  0.699289  0.480447
    6  0.588529  0.388771  0.395757  0.745237  0.628406  0.784473
    7  0.345149  0.147986  0.459451  0.310961  0.706435  0.100914
    8  0.553195  0.394947  0.863494  0.585030  0.565944  0.356561
    9  0.561593  0.689260  0.865243  0.136481  0.386582  0.730399

     (Reference source)
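
    The column reordering above can be sketched end to end as follows. The small DataFrame here is made-up sample data, not the one shown in the session.

```python
import pandas as pd

# Hypothetical sample data with integer column labels, plus a row-wise mean.
df = pd.DataFrame({0: [0.1, 0.2], 1: [0.3, 0.4]})
df["mean"] = df.mean(axis=1)

# Move the last column label to the front of the list.
cols = df.columns.tolist()
cols = cols[-1:] + cols[:-1]

# Indexing with the reordered list returns the columns in the new order.
reordered = df[cols]
print(reordered.columns.tolist())
```

    The same slicing idea works for any rotation; only the list of labels changes, and the data in each column is untouched.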

    Modifying the value at a specific position in a pandas DataFrame:

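    df['one']['second'] = value
    # Chained indexing may operate on a temporary copy, so the original df may be left unchanged, and pandas raises a SettingWithCopyWarning.
    
    df.loc['second','one'] = value
    # This modifies the original df in a single step. Note that .loc takes the row label first, then the column label.
    # Or, for a DataFrame with MultiIndex columns:
    dfmi.loc[:,('one','second')] = value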

    See the SettingWithCopy section of the pandas documentation for details.
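
    A minimal sketch of assignment through .loc, which sets the value on the original object in one indexing step. The frames, labels, and values below are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical frame: column labels "one"/"two", row labels "first"/"second".
df = pd.DataFrame({"one": [1, 2], "two": [3, 4]}, index=["first", "second"])

# .loc[row_label, column_label] assigns directly into df.
df.loc["second", "one"] = 99
print(df)

# With MultiIndex columns, a tuple selects one column across all rows.
columns = pd.MultiIndex.from_tuples([("one", "first"), ("one", "second")])
dfmi = pd.DataFrame([[1, 2], [3, 4]], columns=columns)
dfmi.loc[:, ("one", "second")] = 0
print(dfmi)
```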

    Iterating over pandas DataFrame & Series data (loop iterate on data)

    DataFrame

    import numpy as np
    import pandas as pd

    # Three consecutive dates as the index, four columns of random values.
    dates = pd.date_range("20150101", periods=3)
    df = pd.DataFrame(np.random.randn(3, 4), index=dates, columns=['A', 'B', 'C', 'D'])
    df
    Out[36]:
                       A         B         C         D
    2015-01-01 -0.888495 -0.983042  0.162524 -0.768370
    2015-01-02  0.954982  0.777860 -0.635805 -0.271617
    2015-01-03  1.778827  1.052819  0.090116 -1.822029
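    1. DataFrame.iteritems(): iterator over (column name, Series) pairs. iteritems() was removed in pandas 2.0; items() is the equivalent and is used here.
      # Each iteration yields a column label and that column as a Series.
      for colName, colSeries in df.items():
          print(colName)
          print(colSeries)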
      A
      2015-01-01   -0.888495
      2015-01-02    0.954982
      2015-01-03    1.778827
      Freq: D, Name: A, dtype: float64
      B
      2015-01-01   -0.983042
      2015-01-02    0.777860
      2015-01-03    1.052819
      Freq: D, Name: B, dtype: float64
      C
      2015-01-01    0.162524
      2015-01-02   -0.635805
      2015-01-03    0.090116
      Freq: D, Name: C, dtype: float64
      D
      2015-01-01   -0.768370
      2015-01-02   -0.271617
      2015-01-03   -1.822029
      Freq: D, Name: D, dtype: float64
       
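    2. DataFrame.iterrows(): iterate over the rows of a DataFrame as (index, Series) pairs. dtypes are consistent per column, not per row, so each row is cast to a single common dtype and values may change type during iteration. To keep the original dtypes, prefer itertuples, which is also faster than iterrows.
      # Each iteration yields an index label and that row as a Series.
      for index, rowSeries in df.iterrows():
          print(index)
          print(rowSeries)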
      2015-01-01 00:00:00
      A   -0.888495
      B   -0.983042
      C    0.162524
      D   -0.768370
      Name: 2015-01-01 00:00:00, dtype: float64
      2015-01-02 00:00:00
      A    0.954982
      B    0.777860
      C   -0.635805
      D   -0.271617
      Name: 2015-01-02 00:00:00, dtype: float64
      2015-01-03 00:00:00
      A    1.778827
      B    1.052819
      C    0.090116
      D   -1.822029
      Name: 2015-01-03 00:00:00, dtype: float64
       
    3. DataFrame.itertuples(index=True): iterate over the rows of a DataFrame as tuples, with the index value as the first element of each tuple.
      # rowTuple[0] is the index label; the remaining elements are the row values.
      for rowTuple in df.itertuples():
          print(rowTuple[0])
          print(rowTuple[1:])
      2015-01-01 00:00:00
      (-0.88849501182393553, -0.98304167749573845, 0.1625244406175089, -0.76836987403165646)
      2015-01-02 00:00:00
      (0.95498214900986345, 0.77786021238601544, -0.635805031818656, -0.27161684716624435)
      2015-01-03 00:00:00
      (1.7788269763069902, 1.0528194112440166, 0.09011643978723563, -1.82202928954011)
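
    The dtype difference between iterrows and itertuples is easiest to see on a frame that mixes integer and float columns. The sketch below uses hypothetical sample data with made-up column names n and ratio.

```python
import pandas as pd

# Hypothetical frame: one integer column and one float column.
df = pd.DataFrame({"n": [1, 2], "ratio": [1.5, 2.5]})

# iterrows builds one Series per row, so the integer is upcast to float.
_, first_row = next(df.iterrows())
print(first_row.dtype)

# itertuples keeps each value as it was stored in its own column.
first_tuple = next(df.itertuples(index=False))
print(type(first_tuple.n), type(first_tuple.ratio))
```

    Here the row Series holds both values as floats, while the tuple still carries the integer from column n as an integer.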

    Series

    1. Series.iteritems(): lazily iterate over (index, value) tuples. iteritems() was removed in pandas 2.0; Series.items() is the equivalent and is used here.
      s = pd.Series(['a', 'b', 'c', 'd', 'e'])
      s
      Out[51]:
      0    a
      1    b
      2    c
      3    d
      4    e
      dtype: object
      # Each iteration yields an index label and the value stored at it.
      for index, value in s.items():
          print(index, value)
      0 a
      1 b
      2 c
      3 d
      4 e
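
    Because the Series iterator yields plain (index, value) tuples, it can be consumed by anything that accepts an iterable of pairs. A minimal sketch, assuming a shortened version of the Series above:

```python
import pandas as pd

s = pd.Series(["a", "b", "c"])

# Collect the lazily produced pairs into a list and a dict.
pairs = list(s.items())
lookup = dict(s.items())
print(pairs)
print(lookup)
```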
  • Original source: https://www.cnblogs.com/vin-yuan/p/4780305.html