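This post introduces pymongo, the third-party Python library for working with MongoDB, and uses the requests library to build a simple crawler. The warmth and coldness of human relationships are like flowers blooming and fading; better to think of them as a season that is bound to come.
Installing pymongo and preparing the environment
1. Installing and starting MongoDB
Test machine: Windows 10, MongoDB v3.4.0, Python 3.6.3.
MongoDB installation directory: D:\Database\DataBase\Mongo. Data directory: E:\data\database\mango\data. First, start the MongoDB server; the log shows that the server started successfully on local port 27017.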
D:\Database\DataBase\Mongo\Server\3.4\bin>mongod --dbpath E:\data\database\mango\data
2017-11-21T20:48:38.458+0800 I CONTROL [initandlisten] MongoDB starting : pid=20484 port=27017 dbpath=E:\data\database\mango\data 64-bit host=Linux
2017-11-21T20:48:38.461+0800 I CONTROL [initandlisten] targetMinOS: Windows 7/Windows Server 2008 R2
2017-11-21T20:48:38.462+0800 I CONTROL [initandlisten] db version v3.4.0
2017-11-21T20:48:38.463+0800 I CONTROL [initandlisten] git version: f4240c60f005be757399042dc12f6addbc3170c1
2017-11-21T20:48:38.464+0800 I CONTROL [initandlisten] OpenSSL version: OpenSSL 1.0.1t-fips 3 May 2016
2017-11-21T20:48:38.465+0800 I CONTROL [initandlisten] allocator: tcmalloc
2017-11-21T20:48:38.466+0800 I CONTROL [initandlisten] modules: none
2017-11-21T20:48:38.466+0800 I CONTROL [initandlisten] build environment:
2017-11-21T20:48:38.467+0800 I CONTROL [initandlisten] distmod: 2008plus-ssl
2017-11-21T20:48:38.468+0800 I CONTROL [initandlisten] distarch: x86_64
2017-11-21T20:48:38.469+0800 I CONTROL [initandlisten] target_arch: x86_64
2017-11-21T20:48:38.469+0800 I CONTROL [initandlisten] options: { storage: { dbPath: "E:\data\database\mango\data" } }
2017-11-21T20:48:38.491+0800 I - [initandlisten] Detected data files in E:\data\database\mango\data created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2017-11-21T20:48:38.493+0800 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=5573M,session_max=20000,eviction=(threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2017-11-21T20:48:39.931+0800 I CONTROL [initandlisten]
2017-11-21T20:48:39.933+0800 I CONTROL [initandlisten] ** WARNING: Access control is not enabled for the database.
2017-11-21T20:48:39.936+0800 I CONTROL [initandlisten] ** Read and write access to data and configuration is unrestricted.
2017-11-21T20:48:39.940+0800 I CONTROL [initandlisten]
2017-11-21T20:48:41.253+0800 I FTDC [initandlisten] Initializing full-time diagnostic data capture with directory 'E:/data/database/mango/data/diagnostic.data'
2017-11-21T20:48:41.259+0800 I NETWORK [thread1] waiting for connections on port 27017
To start the MongoDB client, double-click D:\Database\DataBase\Mongo\Server\3.4\bin\mongo.exe:
MongoDB shell version v3.4.0
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 3.4.0
Server has startup warnings:
2017-11-21T20:48:39.931+0800 I CONTROL [initandlisten]
2017-11-21T20:48:39.933+0800 I CONTROL [initandlisten] ** WARNING: Access control is not enabled for the database.
2017-11-21T20:48:39.936+0800 I CONTROL [initandlisten] ** Read and write access to data and configuration is unrestricted.
2017-11-21T20:48:39.940+0800 I CONTROL [initandlisten]
>
2. Installing pymongo for Python
pip install pymongo
Here is a brief introduction to using pymongo. The code is the getting-started example from pymongo's GitHub repository.
>>> import pymongo
>>> client = pymongo.MongoClient("localhost", 27017)
>>> db = client.test
>>> db.name
u'test'
>>> db.my_collection
Collection(Database(MongoClient('localhost', 27017), u'test'), u'my_collection')
>>> db.my_collection.insert_one({"x": 10}).inserted_id
ObjectId('4aba15ebe23f6b53b0000000')
>>> db.my_collection.insert_one({"x": 8}).inserted_id
ObjectId('4aba160ee23f6b543e000000')
>>> db.my_collection.insert_one({"x": 11}).inserted_id
ObjectId('4aba160ee23f6b543e000002')
>>> db.my_collection.find_one()
{u'x': 10, u'_id': ObjectId('4aba15ebe23f6b53b0000000')}
>>> for item in db.my_collection.find():
...     print(item["x"])
...
10
8
11
>>> db.my_collection.create_index("x")
u'x_1'
>>> for item in db.my_collection.find().sort("x", pymongo.ASCENDING):
...     print(item["x"])
...
8
10
11
>>> [item["x"] for item in db.my_collection.find().limit(2).skip(1)]
[8, 11]
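The last line of the session chains limit and skip on a cursor. Adding sort to the same chain gives simple pagination. The following is a minimal sketch: fetch_page is a hypothetical helper name, and ASCENDING is defined locally with the value 1 that pymongo.ASCENDING also has, so the function only needs a collection object such as db.my_collection. MongoDB applies skip before limit regardless of the order in which they are chained. That is why limit(2).skip(1) in the session returned the second and third documents.

```python
ASCENDING = 1  # same value as pymongo.ASCENDING


def fetch_page(collection, page, page_size, sort_key="x"):
    """Return one page (1-based) of documents, ordered by sort_key ascending."""
    # limit(0) means "no limit" in MongoDB, so reject non-positive sizes explicitly
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    cursor = (
        collection.find()
        .sort(sort_key, ASCENDING)      # order first so page contents are stable
        .skip((page - 1) * page_size)   # skip the documents of earlier pages
        .limit(page_size)               # return at most page_size documents
    )
    return list(cursor)
```

With the three documents inserted in the session, fetch_page(db.my_collection, page=1, page_size=2) returns the documents whose x is 8 and 10, and page=2 returns the one whose x is 11. The server still walks past every skipped document, so for deep pages a range query on the indexed key (for example {"x": {"$gt": last_x}}) usually scales better.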
pymongo usage examples
1. Crawling data with Python and storing it with pymongo
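import json

import pymongo
import requests


def requestData():
    # Masked endpoint from the original post; substitute the real task-list API
    url = 'http://****.com/*.do'
    data = {
        'projectId': 90,
        'myTaskFlag': 1,
        'userId': 40
    }
    # POST the parameters as a JSON string and parse the JSON response
    json_data = requests.post(url, data=json.dumps(data)).json()
    return json_data


def output_data(json_data):
    client = pymongo.MongoClient(host='localhost', port=27017)
    db = client.test          # MongoDB's default "test" database
    collection = db.tasks     # documents go into the "tasks" collection
    tasks_data = json_data.get('taskList')
    # insert() is deprecated in pymongo 3.x; insert_many() takes a list of
    # documents and raises on an empty list, so only insert when there is data
    if tasks_data:
        collection.insert_many(tasks_data)
    client.close()


if __name__ == '__main__':
    json_data = requestData()
    output_data(json_data)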
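Because output_data inserts the whole task list on every run, running the script twice stores each task twice. The following is a minimal sketch of an idempotent alternative. It assumes that taskId uniquely identifies a task; the sample output suggests this, but the post does not state it. Each task is upserted with update_one, matched on taskId. upsert_tasks is a hypothetical helper name, and it takes the collection as a parameter so it can be called with db.tasks.

```python
def upsert_tasks(collection, tasks, key="taskId"):
    """Insert or update each task, matched on `key` (assumed unique).

    Returns (matched, upserted): how many tasks already existed and
    how many were newly inserted.
    """
    matched = upserted = 0
    for task in tasks or []:              # tolerate a missing/None task list
        result = collection.update_one(
            {key: task[key]},             # filter: the task's identifier
            {"$set": task},               # overwrite fields with freshly crawled values
            upsert=True,                  # insert a new document when nothing matches
        )
        matched += result.matched_count
        if result.upserted_id is not None:
            upserted += 1
    return matched, upserted
```

Inside output_data, the insert call could then be replaced by upsert_tasks(collection, tasks_data). For long task lists, passing pymongo UpdateOne operations to collection.bulk_write would send the updates in batches instead of one round trip per task.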
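We store the fetched data in the tasks collection of MongoDB's default test database. After the program has run, we can inspect the data from the MongoDB shell: db.tasks.find().pretty() lists every document in the tasks collection.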
{
	"_id" : ObjectId("5a1427a2edc9f04be40bc02d"),
	"taskId" : 1,
	"summary" : "PC版“个人信息”页面优化",
	"status" : 8,
	"categoryId" : 3,
	"creatorId" : 7,
	"projectId" : 1,
	"dateSubmit" : NumberLong("1481105108000"),
	"level" : 1,
	"handlerId" : 2,
	"ViewState" : 2,
	"priority" : 2
}
{
	"_id" : ObjectId("5a1427a2edc9f04be40bc02e"),
	"taskId" : 2,
	"summary" : "PC版“添加新任务”界面字体太大",
	"status" : 8,
	"categoryId" : 3,
	"creatorId" : 7,
	"projectId" : 1,
	"dateSubmit" : NumberLong("1481105195000"),
	"level" : 1,
	"handlerId" : 2,
	"ViewState" : 2,
	"priority" : 1
}
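The same data can be read back from Python instead of the shell. The following is a minimal sketch: show_tasks is a hypothetical helper name, and the default field list is an assumption picked from the sample documents. It passes a filter and a projection to find (dropping _id and keeping only the listed fields), then pretty-prints each document with the standard pprint module. This roughly matches what db.tasks.find().pretty() does in the shell.

```python
from pprint import pprint


def show_tasks(collection, query=None, fields=("taskId", "summary", "status")):
    """Print and return the tasks matching `query`, limited to `fields`."""
    # Projection: exclude _id and include only the requested fields
    projection = {"_id": 0}
    projection.update({name: 1 for name in fields})
    docs = list(collection.find(query or {}, projection))
    for doc in docs:
        pprint(doc)
    return docs
```

For example, show_tasks(client.test.tasks, {"status": 8}) prints the taskId, summary and status of every task whose status is 8. Calling show_tasks(client.test.tasks) with no query lists all tasks.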