  • A small trick for speeding up Python dictionary lookups

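    Consider this problem: a Python dictionary already holds 10 million key-value pairs, and another 1,000 pairs need to be merged into it. What is the fastest way to do that?

    In my own tests the slow version took about 300 seconds and the fast version only about 0.3 seconds. The slow version spends almost all of its time on the membership check that runs once per key:

    if k in C14.keys()

    This line is the likely culprit. In Python 2, keys() builds a new list of every key on each call and "in" then scans that list linearly, so each check costs time proportional to the size of the dictionary. (In Python 3, keys() returns a view and the same check is a hash lookup, so the gap should be far smaller there.)

    The improved version initializes the dictionary with defaultdict(int) instead of dict(), so the loop needs no membership check at all.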
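    To make the cost of the membership check concrete, here is a minimal timing sketch. It is not the author's benchmark: the dictionary size, the probe key, and the function names are assumptions chosen for illustration, and list(d.keys()) is used only to imitate the Python 2 behavior of building a key list on every call.

```python
import timeit

# Hypothetical small-scale stand-in for the 10-million-entry dict in the post.
d = {i: i for i in range(100_000)}
probe = 99_999  # an existing key near the end: the worst case for a list scan


def hash_lookup():
    # Membership test directly on the dict: a single hash lookup.
    return probe in d


def list_scan():
    # Imitates Python 2, where d.keys() built a fresh list on every call
    # and "in" then scanned it element by element.
    return probe in list(d.keys())


t_hash = timeit.timeit(hash_lookup, number=100)
t_scan = timeit.timeit(list_scan, number=100)
print("hash lookup: {:.6f} s, list scan: {:.6f} s".format(t_hash, t_scan))
```

    Both functions return the same answer; only the amount of work per check differs, and the difference grows with the size of the dictionary.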

    Original code: extremely slow (especially when the original dictionary is large)

    #test slower code
    import pandas as pd
    import pickle
    from collections import Counter
    import os
    from tqdm import tqdm
    import time
    from collections import defaultdict
    
    C14 = dict()  # note: a plain dict here, not a defaultdict
    for i in tqdm(range(10000000)):
        C14[i] = i
    
    print("start processing test data:")
    s_time = time.time()
    
    
    data = pd.read_csv('../../test.gz')
    print("read test.gz over")
    
    print("start to process C14:")
    s_tt = time.time()
    
    C14_list = data['C14'].values  # data is a DataFrame; data['C14'].values behaves like a list, e.g. [42, 523, 23, 24, 3, 4, 1, 5, 3]
    for k, v in tqdm(Counter(C14_list).items()):
        if k in C14.keys():  # this membership check is where the time goes
            C14[k] += v
        else:
            C14[k] = v
            
    e_tt = time.time()
    print("C14 over,cost time:{} seconds".format(e_tt-s_tt))
                
        
    
    e_time = time.time()
    print("test data processing over, cost {} minutes".format((e_time-s_time)/60))
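    For comparison, the same merge can be written on a plain dict without calling keys() at all. The sketch below is an illustration rather than the author's code: merge_counts is a hypothetical helper name, and a short list stands in for the C14 column so the example runs without test.gz.

```python
from collections import Counter


def merge_counts(totals, values):
    """Add the occurrence counts of `values` into the dict `totals` in place."""
    for k, v in Counter(values).items():
        # dict.get returns 0 for a missing key, so no separate
        # membership check is needed before adding.
        totals[k] = totals.get(k, 0) + v
    return totals


totals = {1: 10, 2: 20}          # stand-in for the large existing dict
merge_counts(totals, [1, 1, 3])  # stand-in for data['C14'].values
print(totals)  # {1: 12, 2: 20, 3: 1}
```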

    Improved code: very fast

    #test code
    import pandas as pd
    import pickle
    from collections import Counter
    import os
    from tqdm import tqdm
    import time
    from collections import defaultdict
    
    C14 = defaultdict(int)  # defaultdict(int): a key that does not exist yet gets the default value int(), i.e. 0
    for i in tqdm(range(10000000)):
        C14[i] = i
    
    print("start processing test data:")
    s_time = time.time()
    
    data = pd.read_csv('../../test.gz')
    print("read test.gz over")
    
    print("start to process C14:")
    s_tt = time.time()
    
    C14_list = data['C14'].values
    for k,v in tqdm(Counter(C14_list).items()):
        C14[k] += v
        # The four lines below are no longer needed:
        # if k in C14.keys():
        #     C14[k] += v
        # else:
        #     C14[k] = v
            
    e_tt = time.time()
    print("C14 over,cost time:{} seconds".format(e_tt-s_tt))
                
        
    e_time = time.time()
    print("test data processing over, cost {} minutes".format((e_time-s_time)/60))
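    The behavior the improved loop relies on can be checked in isolation. The following is a small sketch with made-up values, not part of the original test; it also shows one caveat of defaultdict that is worth knowing, namely that reading a missing key inserts it.

```python
from collections import Counter, defaultdict

totals = defaultdict(int)
totals[1] = 10  # an entry that already exists

for k, v in Counter([1, 1, 3]).items():
    totals[k] += v  # a missing key starts from int(), i.e. 0

print(dict(totals))  # {1: 12, 3: 1}

# Caveat: merely reading a missing key inserts it with the default value.
_ = totals[99]
print(99 in totals)  # True

# Use "in" or .get() to probe a key without inserting it.
print(totals.get(100), 100 in totals)  # None False
```

    If the dictionary must not grow as a side effect of lookups, probe with "in" or get() rather than with square brackets.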
  • Original article: https://www.cnblogs.com/qiezi-online/p/14156967.html