  • Loading MongoDB data with nested documents into Pandas

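    Reading MongoDB data into Pandas

    We can connect to MongoDB with the pymongo package, query the data, and then store the results in a Pandas DataFrame.

    The students collection used in the examples has the following structure: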

    	{'_id': ObjectId('5c7138f4e0411eb39749fdff'), 'name': 'student1', 'id_no': 1, 'scores': {'math': 63, 'art': 72, 'music': 93}}
    	{'_id': ObjectId('5c7138f4e0411eb39749fe00'), 'name': 'student2', 'id_no': 2, 'scores': {'math': 58, 'art': 70, 'music': 67}}
    	{'_id': ObjectId('5c7138f4e0411eb39749fe01'), 'name': 'student3', 'id_no': 3, 'scores': {'math': 66, 'art': 80, 'music': 81}}
    

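    Without nested data

    If the data we read from MongoDB contains no nested documents (here a projection limits the query to the flat fields name and id_no), we can load it directly into a Pandas DataFrame: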

    import pymongo as pm
    import pandas as pd
    # Connect to the local MongoDB instance and select the database
    client = pm.MongoClient('mongodb://user1:user1@127.0.0.1:27017')
    db = client['my_db']
    # Return only the flat fields; _id is included by default
    projection = {'name': 1, 'id_no': 1}
    mongo_data = list(db['students'].find({}, projection))
    # Each returned document becomes one row of the DataFrame
    df = pd.DataFrame(mongo_data)
    

    The result is as follows:

    index  _id                       id_no  name
    0      5c7138f4e0411eb39749fdff  1      student1
    1      5c7138f4e0411eb39749fe00  2      student2
    2      5c7138f4e0411eb39749fe01  3      student3
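
    The DataFrame constructor treats each document as a row and each top-level key as a column. As a rough illustration of why sub-documents need extra handling, the sketch below builds a DataFrame from in-memory stand-ins for the query result (an assumption made so that no MongoDB server is required; the _id values are plain strings here rather than ObjectId instances) and leaves the scores field in place:

```python
import pandas as pd

# Hypothetical in-memory stand-ins for the documents returned by find();
# _id is a plain string here instead of a bson ObjectId.
docs = [
    {'_id': '5c7138f4e0411eb39749fdff', 'name': 'student1', 'id_no': 1,
     'scores': {'math': 63, 'art': 72, 'music': 93}},
    {'_id': '5c7138f4e0411eb39749fe00', 'name': 'student2', 'id_no': 2,
     'scores': {'math': 58, 'art': 70, 'music': 67}},
]

# Each top-level key becomes one column; the sub-document is not expanded.
raw = pd.DataFrame(docs)
print(raw.columns.tolist())

# Every cell of the scores column still holds a whole dict.
print(type(raw.loc[0, 'scores']))
```

    Because each scores cell is a dict, per-subject operations such as sorting or averaging by math score are not directly available on this frame.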

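    With nested data

    If the data we read from MongoDB contains nested documents, we first serialize the documents with the json_util tool from bson, which turns BSON types such as ObjectId into plain JSON, and then flatten the nested fields with json_normalize. The code is as follows: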

    import json
    import pymongo as pm
    import pandas as pd
    from bson import json_util
    client = pm.MongoClient('mongodb://user1:user1@127.0.0.1:27017')
    db = client['my_db']
    mongo_data = list(db['students'].find({}))
    # Round-trip through JSON so BSON types (e.g. ObjectId) become plain dicts and strings
    sanitized = json.loads(json_util.dumps(mongo_data))
    # Flatten nested documents into dotted columns such as scores.math.
    # pd.json_normalize (pandas >= 1.0) replaces the deprecated pandas.io.json import
    # and already returns a DataFrame.
    df = pd.json_normalize(sanitized)
    print(df)
    
    

    The result is as follows (column order may differ between pandas versions):

       _id.$oid                  id_no  name      scores.art  scores.math  scores.music
    0  5c7138f4e0411eb39749fdff  1      student1  72          63           93
    1  5c7138f4e0411eb39749fe00  2      student2  70          58           67
    2  5c7138f4e0411eb39749fe01  3      student3  80          66           81
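
    The dotted column names in this result, such as scores.math and _id.$oid, come from joining each nested key to its parent key. The standard-library sketch below shows the idea on a single document. The flatten helper is hypothetical and is not how pandas implements json_normalize; the input is written in the shape a document has after the json_util round trip, where the ObjectId has become a {"$oid": ...} sub-document.

```python
def flatten(doc, parent_key='', sep='.'):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in doc.items():
        full_key = f'{parent_key}{sep}{key}' if parent_key else key
        if isinstance(value, dict):
            # Recurse into sub-documents, carrying the dotted prefix along.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# One document as it looks after json.loads(json_util.dumps(...)).
doc = {
    '_id': {'$oid': '5c7138f4e0411eb39749fdff'},
    'name': 'student1',
    'id_no': 1,
    'scores': {'math': 63, 'art': 72, 'music': 93},
}

row = flatten(doc)
print(row)
```

    This also explains why the id column is named _id.$oid rather than _id: the serialized ObjectId is itself a one-key nested document, so it is flattened like any other.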
  • Original article: https://www.cnblogs.com/lestatzhang/p/10611335.html