  • Speed comparison of iteration methods in pandas

    An iteration test on an xlsx file with 20,667 rows

    
    import pandas as pd
    
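    # Decorator that measures execution time; its argument is the function or method being decorated
    def print_execute_time(func):
        from time import time
    
        # Nested function that prints the execution time of the decorated function
        def wrapper(*args, **kwargs):
            # Record the start and end times around the call to func and keep its return value
            start = time()
            func_return = func(*args, **kwargs)
            end = time()
            # Print the function name and its execution time
            print(f'{func.__name__}() execute time: {end - start}s')
            # Return func's return value
            return func_return
    
        # Return the nested function
        return wrapper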
    
    file_path = r"D:gitxxxxdevpd-xxx1.2合并.xlsx"
    data = pd.read_excel(file_path,sheet_name="xxxx",engine='openpyxl')
    # Handle missing values: replace NaN with None
    df = data.where(data.notnull(), None)
    
    
    @print_execute_time
    def iterrows():
        for index, row in df.iterrows():
            # print(index," = ",row['机号'])
            pass
    
    
    @print_execute_time
    def itertuples():
        for row in df.itertuples():
            # print(getattr(row, '机号'))  # rows are namedtuples, so use attribute access
            pass
    
    
    @print_execute_time
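    def iteritems():
        # items() iterates over columns, yielding (column label, Series) pairs;
        # iteritems() was its deprecated alias and was removed in pandas 2.0
        for label, column in df.items():
            # print(label, " = ", column.iloc[0])
            pass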
    
    @print_execute_time
    def index():
        for i in df.index:
            # print(i," = ",df['机号'].at[i])
            pass
    
    if __name__ == '__main__':
        print('begining ...')
        # Call each test in turn; they return None, so there is nothing to print
        iterrows()
        itertuples()
        iteritems()
        index()
        print('Done !')
    
    
    

    Test results

    begining ...
    iterrows() execute time: 2.003657817840576s
    itertuples() execute time: 0.04618692398071289s
    iteritems() execute time: 0.0009987354278564453s
    index() execute time: 0.0029909610748291016s    
    Done !
    
    iterrows() execute time: 2.2464449405670166s
    itertuples() execute time: 0.08178043365478516s
    iteritems() execute time: 0.000997781753540039s
    index() execute time: 0.0059833526611328125s
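
    The two runs above differ noticeably (itertuples took about 0.046s in one and about 0.082s in the other), so a single wall-clock measurement is fairly noisy. Below is a minimal sketch of a more repeatable measurement using the standard-library timeit module. It assumes a synthetic DataFrame with the same row count as the test file: the column names machine_id and value and the best_time helper are made up for illustration and are not part of the original test. The index loop here also reads one value per row with df.at, so that it does comparable work to the other loops.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for the workbook: same row count as the test file,
# but the column names and contents are invented for illustration.
N_ROWS = 20667
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "machine_id": rng.integers(0, 100, size=N_ROWS),
    "value": rng.random(N_ROWS),
})


def loop_iterrows():
    # One (index, Series) pair per row
    for _index, _row in df.iterrows():
        pass


def loop_itertuples():
    # One namedtuple per row
    for _row in df.itertuples():
        pass


def loop_index_at():
    # Walk the row labels and read one scalar per row
    for i in df.index:
        _ = df.at[i, "value"]


def best_time(func, repeat=3):
    """Return the fastest of `repeat` single runs, in seconds."""
    return min(timeit.repeat(func, number=1, repeat=repeat))


timings = {
    "iterrows": best_time(loop_iterrows),
    "itertuples": best_time(loop_itertuples),
    "index + at": best_time(loop_index_at),
}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f}s")
```

    Taking the minimum over several runs tends to filter out interference from other processes. The absolute numbers will depend on the machine and the pandas version.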
    

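    Judging by these timings alone, iteritems and index are the fastest ways to traverse the data. Note, however, that iteritems iterates over columns rather than rows (and was removed in pandas 2.0 in favor of items), and that the index loop here only walks the row labels without reading any values, so neither does the same work as iterrows or itertuples. For true row-by-row traversal, itertuples was roughly 27 to 43 times faster than iterrows in these runs.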
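
    To see what each method actually visits, here is a small sketch on a made-up three-row frame (the column names machine_id and status are illustrative assumptions, not columns from the original workbook). It shows that items() produces one entry per column, while itertuples() and a zip over the columns produce one entry per row.

```python
import pandas as pd

# Tiny invented frame; the column names are illustrative only.
df = pd.DataFrame({
    "machine_id": [101, 102, 103],
    "status": ["ok", "ok", "down"],
})

# items() walks the columns: one (label, Series) pair per column.
column_labels = [label for label, _series in df.items()]

# itertuples() walks the rows: one namedtuple per row.
rows_via_tuples = [(row.machine_id, row.status)
                   for row in df.itertuples(index=False)]

# Zipping the columns of interest also visits the rows one at a time.
rows_via_zip = list(zip(df["machine_id"], df["status"]))

print(column_labels)
print(rows_via_tuples)
print(rows_via_zip)
```

    The number of iterations for items() equals the number of columns, which is why its timing does not grow with the row count.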

  • Original article: https://www.cnblogs.com/dapenson/p/14369952.html