  • How to iterate over the rows of a pandas DataFrame

    from:https://blog.csdn.net/tanzuozhev/article/details/76713387

    How to iterate over rows in a DataFrame in Pandas (row-wise DataFrame iteration)

    https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas

    http://stackoverflow.com/questions/7837722/what-is-the-most-efficient-way-to-loop-through-dataframes-with-pandas

    When working with a DataFrame, we inevitably need to inspect or process the data row by row. What are the efficient, convenient ways to do so?

    Indexing by integer position

    import pandas as pd
    inp = [{'c1':10, 'c2':100}, {'c1':11,'c2':110}, {'c1':12,'c2':120}]
    df = pd.DataFrame(inp)
    for x in xrange(len(df.index)):
        print df['c1'].iloc[x]

    This appears to be the most conventional approach, and it lets you modify the DataFrame while iterating.
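    To illustrate the point about modifying the DataFrame during iteration, here is a minimal sketch. It assumes Python 3 (range and print() instead of xrange and the print statement), and the rule it applies (zero out c2 wherever c1 is greater than 10) is purely hypothetical, chosen only for the demonstration.

```python
import pandas as pd

inp = [{'c1': 10, 'c2': 100}, {'c1': 11, 'c2': 110}, {'c1': 12, 'c2': 120}]
df = pd.DataFrame(inp)

# Column position of 'c2', looked up once outside the loop.
c2_pos = df.columns.get_loc('c2')

for x in range(len(df.index)):
    # Read by position, then write back by position with a single
    # indexer so the assignment reaches the original DataFrame.
    if df['c1'].iloc[x] > 10:
        df.iloc[x, c2_pos] = 0

print(df['c2'].tolist())
```

    Writing through df.iloc[x, c2_pos] rather than through a chained expression such as df['c2'].iloc[x] keeps the assignment on the original DataFrame.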

    enumerate

    for i, row in enumerate(df.values):
        index = df.index[i]
        print row

    df.values is a numpy.ndarray.
    Here i is the positional number of the row (not its index label), and row is a numpy.ndarray.
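    The difference between the position i and the index label is easier to see when the index is not the default one. The sketch below assumes Python 3 and uses hypothetical string labels ('x', 'y', 'z') that do not appear in the original example.

```python
import numpy as np
import pandas as pd

# Hypothetical string labels, used only to make position and label differ.
df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]},
                  index=['x', 'y', 'z'])

pairs = []
for i, row in enumerate(df.values):
    label = df.index[i]  # i is the position; df.index[i] is the label
    assert isinstance(row, np.ndarray)
    pairs.append((i, label, int(row[0]), int(row[1])))

print(pairs)
# [(0, 'x', 10, 100), (1, 'y', 11, 110), (2, 'z', 12, 120)]
```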

    iterrows

    https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iterrows.html

    import pandas as pd
    inp = [{'c1':10, 'c2':100}, {'c1':11,'c2':110}, {'c1':12,'c2':120}]
    df = pd.DataFrame(inp)
    
    for index, row in df.iterrows():
        print row['c1'], row['c2']
    
    #10 100
    #11 110
    #12 120

    Each iteration of df.iterrows() yields a tuple containing the index label and the data of that row.

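    1. With iterrows, each row is returned as a Series, so the DataFrame's dtypes are not preserved across the row: a row that spans columns of different dtypes is converted to a single common dtype.
    2. The returned Series is generally a copy of the original data rather than a view, so writing to it should not be relied on to modify the original DataFrame.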
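    Both caveats can be observed directly. The following is a small sketch, assuming Python 3 and a DataFrame with one integer column and one float column (the float values are made up for the demonstration).

```python
import pandas as pd

# Mixed dtypes: 'c1' holds integers, 'c2' holds floats.
df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [1.5, 2.5, 3.5]})

row_types = []
for index, row in df.iterrows():
    # Each row is one Series with a single dtype, so the integer is upcast.
    row_types.append(type(row['c1']).__name__)
    # Writing to the row changes only this temporary Series.
    row['c2'] = -1.0

print(row_types)          # ['float64', 'float64', 'float64']
print(df['c2'].tolist())  # [1.5, 2.5, 3.5]
```

    The column c1 is still an integer column in df; only the per-row Series saw floats, and the writes to row['c2'] never reached df.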

    itertuples

    http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.itertuples.html

    import pandas as pd
    inp = [{'c1':10, 'c2':100}, {'c1':11,'c2':110}, {'c1':12,'c2':120}]
    df = pd.DataFrame(inp)
    
    for row in df.itertuples():
        # equivalent to: print row[0], row[1], row[2]
        print row.Index, row.c1, row.c2

    Each item yielded by itertuples is a namedtuple whose type is reported as pandas.core.frame.Pandas.

    itertuples is generally considered to be faster than iterrows.
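    As a brief sketch of what itertuples yields (assuming Python 3, with made-up float values in c2), the example below inspects the namedtuple fields and also uses the index and name parameters of DataFrame.itertuples to get plain tuples without the index.

```python
import pandas as pd

df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [1.5, 2.5, 3.5]})

# Default call: namedtuples with an Index field followed by the columns.
first = next(df.itertuples())
print(first._fields)  # ('Index', 'c1', 'c2')
print(first.c1, first.c2)

# index=False drops the index; name=None yields plain tuples.
plain = list(df.itertuples(index=False, name=None))
print(plain)
```

    Unlike iterrows, the integer column is not turned into floats here, because each field is taken from its own column rather than from a single row-wide Series.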

    zip / itertools.izip

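    zip and itertools.izip are used the same way, but in Python 2 zip builds and returns a full list, whereas izip returns an iterator. For large data, zip therefore performs worse than izip. (In Python 3, itertools.izip no longer exists and the built-in zip returns an iterator.)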

    from itertools import izip
    import pandas as pd
    inp = [{'c1':10, 'c2':100}, {'c1':11,'c2':110}, {'c1':12,'c2':120}]
    df = pd.DataFrame(inp)
    
    for row in izip(df.index, df['c1'], df['c2']):
        print row
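    The snippet above targets Python 2. For readers on Python 3, a rough equivalent might look like the following sketch; it relies only on the built-in zip, and the names rows, is_lazy and collected are introduced here for illustration.

```python
import pandas as pd

inp = [{'c1': 10, 'c2': 100}, {'c1': 11, 'c2': 110}, {'c1': 12, 'c2': 120}]
df = pd.DataFrame(inp)

# Python 3: the built-in zip is already lazy, so no itertools import is needed.
rows = zip(df.index, df['c1'], df['c2'])

# An iterator returns itself from iter(); a list would not.
is_lazy = iter(rows) is rows

collected = [tuple(row) for row in rows]
print(is_lazy)
print(collected)
```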

    Timing benchmark

    import time
    import pandas as pd
    from numpy.random import randn
    
    df = pd.DataFrame({'a': randn(100000), 'b': randn(100000)})
    
    time_stat = []
    
    # range(index)
    test_list = []
    t = time.time()
    for r in xrange(len(df)):
        test_list.append((df.index[r], df.iloc[r,0], df.iloc[r,1]))
    time_stat.append(time.time()-t)
    
    # enumerate
    test_list = []
    t = time.time()
    for i, r in enumerate(df.values):
        test_list.append((df.index[i], r[0], r[1]))
    time_stat.append(time.time()-t)
    
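    # iterrows
    # Note: i is already the index label; df.index[i] only works here
    # because the DataFrame uses the default integer index.
    test_list = []
    t = time.time()
    for i,r in df.iterrows():
        test_list.append((df.index[i], r['a'], r['b']))
    time_stat.append(time.time()-t)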
    
    # itertuples
    test_list = []
    t = time.time()
    for ir in df.itertuples():
        test_list.append((ir[0], ir[1], ir[2]))
    time_stat.append(time.time()-t)
    
    # zip
    test_list = []
    t = time.time()
    for r in zip(df.index, df['a'], df['b']):
        test_list.append((r[0], r[1], r[2]))
    time_stat.append(time.time()-t)
    
    # izip (import first so the import is not included in the timing)
    from itertools import izip
    test_list = []
    t = time.time()
    for r in izip(df.index, df['a'], df['b']):
        test_list.append((r[0], r[1], r[2]))
    time_stat.append(time.time()-t)
    
    time_df = pd.DataFrame({'items':['range(index)', 'enumerate',  'iterrows', 'itertuples' , 'zip', 'izip'], 'time':time_stat})
    
    time_df.sort_values('time')
    
    
              items       time
    5          izip   0.034869
    4           zip   0.040440
    3    itertuples   0.072604
    1     enumerate   0.174094
    2      iterrows   4.026293
    0  range(index)  21.921407

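    Ranked from fastest to slowest: izip > zip > itertuples > enumerate > iterrows > range(index). In this run izip took the least time (about 0.035 s) and positional indexing the most (about 21.9 s).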
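    The benchmark above is written for Python 2. A rough Python 3 adaptation is sketched below for readers who want to repeat the comparison; it is not the original author's script. It assumes a much smaller frame (n = 1000) so that it finishes quickly, uses time.perf_counter instead of time.time, drops izip (which Python 3 does not have), and the helper function names are invented for this sketch. Absolute timings will differ by machine and pandas version.

```python
import time

import numpy as np
import pandas as pd

n = 1000  # assumed small size so the sketch runs quickly; the article uses 100000
df = pd.DataFrame({'a': np.random.randn(n), 'b': np.random.randn(n)})

def by_position():
    return [(df.index[r], df.iloc[r, 0], df.iloc[r, 1]) for r in range(len(df))]

def by_enumerate():
    return [(df.index[i], r[0], r[1]) for i, r in enumerate(df.values)]

def by_iterrows():
    return [(i, r['a'], r['b']) for i, r in df.iterrows()]

def by_itertuples():
    return [(t[0], t[1], t[2]) for t in df.itertuples()]

def by_zip():
    return list(zip(df.index, df['a'], df['b']))

# Time one pass of each approach over the same DataFrame.
timings = {}
for func in (by_position, by_enumerate, by_iterrows, by_itertuples, by_zip):
    start = time.perf_counter()
    result = func()
    timings[func.__name__] = time.perf_counter() - start
    assert len(result) == n

time_df = pd.DataFrame({'items': list(timings), 'time': list(timings.values())})
print(time_df.sort_values('time'))
```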

  • Original URL: https://www.cnblogs.com/bonelee/p/9732761.html