  • How to quickly load data from MongoDB into numpy and pandas

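    MongoDB collections are often too large to load into memory all at once for analysis. If you store each document as a Python dict and collect the dicts in a list, memory fills up very quickly. Fortunately, we have numpy and pandas.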
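To make the memory claim concrete, here is a rough sketch that compares a list of dicts with preallocated numpy arrays holding the same synthetic records. The record count `n`, the field names, and the helper `shallow_size` are assumptions made for illustration, and the `sys.getsizeof` total is only a shallow estimate. No MongoDB connection is involved.

```python
import sys

import numpy

n = 10000                                # assumed number of records
fields = ["x1", "x2", "x3", "x4", "x5"]  # assumed field names

# One dict per record, all kept in a list (the memory-hungry layout).
records = [{name: float(i) for name in fields} for i in range(n)]


def shallow_size(records):
    """Rough estimate: the list, each dict, and each float value.

    Key strings are shared between the dicts, so they are not counted.
    """
    total = sys.getsizeof(records)
    for record in records:
        total += sys.getsizeof(record)
        total += sum(sys.getsizeof(value) for value in record.values())
    return total


dict_bytes = shallow_size(records)

# One preallocated float64 array per field (8 bytes per value).
arrays = [numpy.zeros(n) for _ in fields]
array_bytes = sum(array.nbytes for array in arrays)

print("list of dicts: ~%d bytes" % dict_bytes)
print("numpy arrays:   %d bytes" % array_bytes)
```

The exact figure for the list of dicts depends on the Python version, but the arrays always take 8 bytes per value, which is why the columnar layout scales so much better.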

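    import numpy
    import pymongo
    
    c = pymongo.MongoClient()
    collection = c.mydb.collection
    # count_documents({}) replaces collection.count(), which was removed in PyMongo 4
    num = collection.count_documents({})
    # preallocate one float64 array per field (x1..x5) instead of building a list of dicts
    arrays = [numpy.zeros(num) for i in range(5)]
    
    for i, record in enumerate(collection.find()):
        for x in range(5):
            # parentheses are required: "x%i" % x + 1 would try to add 1 to a string
            arrays[x][i] = record["x%i" % (x + 1)]
    
    for array in arrays:  # prove that we did something...
        print(numpy.mean(array))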

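    When the code above is run on a large amount of data, most of the time turns out to be spent iterating over the pymongo cursor. To address this there is Monary, a library written in C that performs this conversion directly and is therefore much faster.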

    from monary import Monary
    import numpy
    
    with Monary("127.0.0.1") as monary:
        arrays = monary.query(
            "mydb",                         # database name
            "collection",                   # collection name
            {},                             # query spec
            ["x1", "x2", "x3", "x4", "x5"], # field names (in Mongo record)
            ["float64"] * 5                 # Monary field types (see below)
        )
    
    for array in arrays:                    # prove that we did something...
        print(numpy.mean(array))

    What about converting to pandas?

    See here.
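One possible way to do the conversion, sketched independently of the linked reference: put each column array into a `pandas.DataFrame` under its field name. The helper `arrays_to_frame` is hypothetical, and the input here is synthetic stand-in data rather than the result of a real query. The sketch assumes float columns and assumes that some columns may arrive as numpy masked arrays (for missing values), which it converts to NaN.

```python
import numpy
import pandas


def arrays_to_frame(arrays, names):
    """Build a DataFrame from one column array per field (hypothetical helper)."""
    columns = {}
    for name, array in zip(names, arrays):
        # numpy.ma.filled replaces masked entries with NaN;
        # plain ndarrays pass through unchanged.
        columns[name] = numpy.ma.filled(array, numpy.nan)
    return pandas.DataFrame(columns)


# Synthetic stand-in for the per-field arrays produced by a query.
names = ["x1", "x2"]
arrays = [
    numpy.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False]),
    numpy.array([4.0, 5.0, 6.0]),
]

df = arrays_to_frame(arrays, names)
print(df.mean())  # per-column means; NaN entries are skipped by default
```

Building the frame column by column keeps each field in a single contiguous array, so no per-record Python objects are created along the way.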

  • Original article: https://www.cnblogs.com/wybert/p/5076173.html