  • [Python] MongoDB

    Documentation:

    http://api.mongodb.com/python/current/tutorial.html

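    Installation:

    Download and install directly from the official website. Installing with brew on macOS downloaded too slowly, so I plan to install it manually.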

    Usage:

    Starting the server:

        mongod                      # start the server with the default configuration
        mongod --dbpath <db path>   # store the database files in the given directory
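
The same start-up command can be assembled from a script. This is a minimal sketch, assuming `mongod` is installed and on the PATH; `build_mongod_command` and `start_mongod` are hypothetical helper names, and `/tmp/mongo-data` is only a placeholder path.

```python
import subprocess


def build_mongod_command(dbpath=None, port=None):
    """Return the argument list for starting mongod (hypothetical helper)."""
    cmd = ["mongod"]
    if dbpath is not None:
        cmd += ["--dbpath", str(dbpath)]
    if port is not None:
        cmd += ["--port", str(port)]
    return cmd


def start_mongod(dbpath=None, port=None):
    """Launch mongod as a child process; assumes mongod is on the PATH."""
    return subprocess.Popen(build_mongod_command(dbpath, port))


if __name__ == "__main__":
    # Only print the command here; call start_mongod() to actually launch the server.
    print(" ".join(build_mongod_command("/tmp/mongo-data")))
```

Keeping the command construction separate from the launch makes it easy to inspect the arguments before any process is started.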

    Connecting to the server:

        mongo    # connect using the default settings
        mongo [options] [db address] [file names (ending in .js)]
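
From Python, the equivalent of connecting with the default settings might look like the sketch below. It assumes PyMongo is installed and a server is listening on localhost:27017 (the default port); `build_mongo_uri` and `connect` are hypothetical helper names.

```python
def build_mongo_uri(host="localhost", port=27017):
    """Build a standard mongodb:// connection string (hypothetical helper)."""
    return "mongodb://{0}:{1}/".format(host, port)


def connect(host="localhost", port=27017, timeout_ms=2000):
    """Connect with PyMongo and check that the server answers a ping."""
    # Imported lazily so build_mongo_uri() works without PyMongo installed.
    from pymongo import MongoClient

    client = MongoClient(build_mongo_uri(host, port),
                         serverSelectionTimeoutMS=timeout_ms)
    client.admin.command("ping")  # raises if no server is reachable
    return client


if __name__ == "__main__":
    print(build_mongo_uri())
```

`MongoClient` connects lazily, so the explicit `ping` is what surfaces a wrong host or a server that is not running.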

    GUI client:

    https://www.robomongo.org/

    shell:

        > help
            db.help()                    help on db methods
            db.mycoll.help()             help on collection methods
            sh.help()                    sharding helpers
            rs.help()                    replica set helpers
            help admin                   administrative help
            help connect                 connecting to a db help
            help keys                    key shortcuts
            help misc                    misc things to know
            help mr                      mapreduce

            show dbs                     show database names
            show collections             show collections in current database
            show users                   show users in current database
            show profile                 show most recent system.profile entries with time >= 1ms
            show logs                    show the accessible logger names
            show log [name]              prints out the last segment of log in memory, 'global' is default
            use <db_name>                set current database
            db.foo.find()                list objects in collection foo
            db.foo.find( { a : 1 } )     list objects in foo where a == 1
            it                           result of the last line evaluated; use to further iterate
            DBQuery.shellBatchSize = x   set default number of items to display on shell
            exit                         quit the mongo shell
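
Several of these shell helpers have direct PyMongo counterparts. The sketch below collects the results of `show dbs`, `show collections` and `db.foo.find({a: 1})` through a client object; `overview` is a hypothetical helper, and the database and collection names passed in are up to the caller.

```python
def overview(client, db_name, coll_name, query):
    """Gather what `show dbs`, `show collections` and `find(query)` would display."""
    db = client[db_name]
    return {
        "databases": client.list_database_names(),      # show dbs
        "collections": db.list_collection_names(),      # show collections
        "documents": list(db[coll_name].find(query)),   # db.<coll>.find(query)
    }
```

With PyMongo installed and a server running, a call such as `overview(pymongo.MongoClient(), "test", "foo", {"a": 1})` would return the three listings as plain Python lists.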

    More help output:

    DB methods:

        > db.help()
        DB methods:
            db.adminCommand(nameOrDocument) - switches to 'admin' db, and runs command [just calls db.runCommand(...)]
            db.aggregate([pipeline], {options}) - performs a collectionless aggregation on this database; returns a cursor
            db.auth(username, password)
            db.cloneDatabase(fromhost)
            db.commandHelp(name) returns the help for the command
            db.copyDatabase(fromdb, todb, fromhost)
            db.createCollection(name, {size: ..., capped: ..., max: ...})
            db.createView(name, viewOn, [{$operator: {...}}, ...], {viewOptions})
            db.createUser(userDocument)
            db.currentOp() displays currently executing operations in the db
            db.dropDatabase()
            db.eval() - deprecated
            db.fsyncLock() flush data to disk and lock server for backups
            db.fsyncUnlock() unlocks server following a db.fsyncLock()
            db.getCollection(cname) same as db['cname'] or db.cname
            db.getCollectionInfos([filter]) - returns a list that contains the names and options of the db's collections
            db.getCollectionNames()
            db.getLastError() - just returns the err msg string
            db.getLastErrorObj() - return full status object
            db.getLogComponents()
            db.getMongo() get the server connection object
            db.getMongo().setSlaveOk() allow queries on a replication slave server
            db.getName()
            db.getPrevError()
            db.getProfilingLevel() - deprecated
            db.getProfilingStatus() - returns if profiling is on and slow threshold
            db.getReplicationInfo()
            db.getSiblingDB(name) get the db at the same server as this one
            db.getWriteConcern() - returns the write concern used for any operations on this db, inherited from server object if set
            db.hostInfo() get details about the server's host
            db.isMaster() check replica primary status
            db.killOp(opid) kills the current operation in the db
            db.listCommands() lists all the db commands
            db.loadServerScripts() loads all the scripts in db.system.js
            db.logout()
            db.printCollectionStats()
            db.printReplicationInfo()
            db.printShardingStatus()
            db.printSlaveReplicationInfo()
            db.dropUser(username)
            db.repairDatabase()
            db.resetError()
            db.runCommand(cmdObj) run a database command.  if cmdObj is a string, turns it into {cmdObj: 1}
            db.serverStatus()
            db.setLogLevel(level,<component>)
            db.setProfilingLevel(level,slowms) 0=off 1=slow 2=all
            db.setWriteConcern(<write concern doc>) - sets the write concern for writes to the db
            db.unsetWriteConcern(<write concern doc>) - unsets the write concern for writes to the db
            db.setVerboseShell(flag) display extra information in shell output
            db.shutdownServer()
            db.stats()
            db.version() current version of the server
        >
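
The help text for `db.runCommand()` notes that a string argument is turned into `{cmdObj: 1}`. The sketch below mimics that rule in Python and passes the result to PyMongo's `Database.command()`; `normalize_command` and `run_command` are hypothetical helpers. PyMongo's `command()` also accepts a plain command name, so the normalization is shown only to make the rule explicit.

```python
def normalize_command(cmd):
    """Mimic the shell rule: a string command name becomes {name: 1}."""
    if isinstance(cmd, str):
        return {cmd: 1}
    return dict(cmd)


def run_command(db, cmd):
    """Send the command through a PyMongo Database object (needs a running server)."""
    return db.command(normalize_command(cmd))


if __name__ == "__main__":
    print(normalize_command("serverStatus"))
```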

    Collection methods:

        > db.mycoll.help()
        DBCollection help
            db.mycoll.find().help() - show DBCursor help
            db.mycoll.bulkWrite( operations, <optional params> ) - bulk execute write operations, optional parameters are: w, wtimeout, j
            db.mycoll.count( query = {}, <optional params> ) - count the number of documents that matches the query, optional parameters are: limit, skip, hint, maxTimeMS
            db.mycoll.copyTo(newColl) - duplicates collection by copying all documents to newColl; no indexes are copied.
            db.mycoll.convertToCapped(maxBytes) - calls {convertToCapped:'mycoll', size:maxBytes}} command
            db.mycoll.createIndex(keypattern[,options])
            db.mycoll.createIndexes([keypatterns], <options>)
            db.mycoll.dataSize()
            db.mycoll.deleteOne( filter, <optional params> ) - delete first matching document, optional parameters are: w, wtimeout, j
            db.mycoll.deleteMany( filter, <optional params> ) - delete all matching documents, optional parameters are: w, wtimeout, j
            db.mycoll.distinct( key, query, <optional params> ) - e.g. db.mycoll.distinct( 'x' ), optional parameters are: maxTimeMS
            db.mycoll.drop() drop the collection
            db.mycoll.dropIndex(index) - e.g. db.mycoll.dropIndex( "indexName" ) or db.mycoll.dropIndex( { "indexKey" : 1 } )
            db.mycoll.dropIndexes()
            db.mycoll.ensureIndex(keypattern[,options]) - DEPRECATED, use createIndex() instead
            db.mycoll.explain().help() - show explain help
            db.mycoll.reIndex()
            db.mycoll.find([query],[fields]) - query is an optional query filter. fields is optional set of fields to return.
                                                          e.g. db.mycoll.find( {x:77} , {name:1, x:1} )
            db.mycoll.find(...).count()
            db.mycoll.find(...).limit(n)
            db.mycoll.find(...).skip(n)
            db.mycoll.find(...).sort(...)
            db.mycoll.findOne([query], [fields], [options], [readConcern])
            db.mycoll.findOneAndDelete( filter, <optional params> ) - delete first matching document, optional parameters are: projection, sort, maxTimeMS
            db.mycoll.findOneAndReplace( filter, replacement, <optional params> ) - replace first matching document, optional parameters are: projection, sort, maxTimeMS, upsert, returnNewDocument
            db.mycoll.findOneAndUpdate( filter, update, <optional params> ) - update first matching document, optional parameters are: projection, sort, maxTimeMS, upsert, returnNewDocument
            db.mycoll.getDB() get DB object associated with collection
            db.mycoll.getPlanCache() get query plan cache associated with collection
            db.mycoll.getIndexes()
            db.mycoll.group( { key : ..., initial: ..., reduce : ...[, cond: ...] } )
            db.mycoll.insert(obj)
            db.mycoll.insertOne( obj, <optional params> ) - insert a document, optional parameters are: w, wtimeout, j
            db.mycoll.insertMany( [objects], <optional params> ) - insert multiple documents, optional parameters are: w, wtimeout, j
            db.mycoll.mapReduce( mapFunction , reduceFunction , <optional params> )
            db.mycoll.aggregate( [pipeline], <optional params> ) - performs an aggregation on a collection; returns a cursor
            db.mycoll.remove(query)
            db.mycoll.replaceOne( filter, replacement, <optional params> ) - replace the first matching document, optional parameters are: upsert, w, wtimeout, j
            db.mycoll.renameCollection( newName , <dropTarget> ) renames the collection.
            db.mycoll.runCommand( name , <options> ) runs a db command with the given name where the first param is the collection name
            db.mycoll.save(obj)
            db.mycoll.stats({scale: N, indexDetails: true/false, indexDetailsKey: <index key>, indexDetailsName: <index name>})
            db.mycoll.storageSize() - includes free space allocated to this collection
            db.mycoll.totalIndexSize() - size in bytes of all the indexes
            db.mycoll.totalSize() - storage allocated for all data and indexes
            db.mycoll.update( query, object[, upsert_bool, multi_bool] ) - instead of two flags, you can pass an object with fields: upsert, multi
            db.mycoll.updateOne( filter, update, <optional params> ) - update the first matching document, optional parameters are: upsert, w, wtimeout, j
            db.mycoll.updateMany( filter, update, <optional params> ) - update all matching documents, optional parameters are: upsert, w, wtimeout, j
            db.mycoll.validate( <full> ) - SLOW
            db.mycoll.getShardVersion() - only for use with sharding
            db.mycoll.getShardDistribution() - prints statistics about data distribution in the cluster
            db.mycoll.getSplitKeysForChunks( <maxChunkSize> ) - calculates split points over all chunks and returns splitter function
            db.mycoll.getWriteConcern() - returns the write concern used for any operations on this collection, inherited from server/db if set
            db.mycoll.setWriteConcern( <write concern doc> ) - sets the write concern for writes to the collection
            db.mycoll.unsetWriteConcern( <write concern doc> ) - unsets the write concern for writes to the collection
            db.mycoll.latencyStats() - display operation latency histograms for this collection
        >
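
Many of the camelCase collection methods listed above appear in PyMongo under snake_case names (`insertOne` becomes `insert_one`, and so on). The sketch below shows that naming rule and a few of the resulting calls. `shell_to_pymongo_name` and `crud_roundtrip` are hypothetical helpers, and the rule is only a rule of thumb: for example, PyMongo offers `count_documents()` and `list_indexes()` rather than literal translations of `count()` and `getIndexes()`.

```python
import re


def shell_to_pymongo_name(shell_name):
    """Convert a camelCase shell method name to snake_case (rule of thumb only)."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", shell_name).lower()


def crud_roundtrip(collection):
    """Exercise a few PyMongo counterparts; needs a real Collection and a running server."""
    collection.insert_one({"x": 77, "name": "demo"})               # insertOne
    docs = list(collection.find({"x": 77}, {"name": 1, "x": 1}))   # find(query, fields)
    collection.update_one({"x": 77}, {"$set": {"x": 78}})          # updateOne
    collection.delete_many({"x": 78})                              # deleteMany
    return docs


if __name__ == "__main__":
    for name in ("insertOne", "updateMany", "findOneAndUpdate", "createIndex"):
        print(name, "->", shell_to_pymongo_name(name))
```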

    Sharding helpers:

        > sh.help()
            sh.addShard( host )                       server:port OR setname/server:port
            sh.addShardToZone(shard,zone)             adds the shard to the zone
            sh.updateZoneKeyRange(fullName,min,max,zone)      assigns the specified range of the given collection to a zone
            sh.disableBalancing(coll)                 disable balancing on one collection
            sh.enableBalancing(coll)                  re-enable balancing on one collection
            sh.enableSharding(dbname)                 enables sharding on the database dbname
            sh.getBalancerState()                     returns whether the balancer is enabled
            sh.isBalancerRunning()                    return true if the balancer has work in progress on any mongos
            sh.moveChunk(fullName,find,to)            move the chunk where 'find' is to 'to' (name of shard)
            sh.removeShardFromZone(shard,zone)      removes the shard from zone
            sh.removeRangeFromZone(fullName,min,max)   removes the range of the given collection from any zone
            sh.shardCollection(fullName,key,unique,options)   shards the collection
            sh.splitAt(fullName,middle)               splits the chunk that middle is in at middle
            sh.splitFind(fullName,find)               splits the chunk that find is in at the median
            sh.startBalancer()                        starts the balancer so chunks are balanced automatically
            sh.status()                               prints a general overview of the cluster
            sh.stopBalancer()                         stops the balancer so chunks are not balanced automatically
            sh.disableAutoSplit()                   disable autoSplit on one collection
            sh.enableAutoSplit()                    re-enable autoSplit on one collection
            sh.getShouldAutoSplit()                 returns whether autosplit is enabled
        >

    Replica set helpers:

        > rs.help()
            rs.status()                                { replSetGetStatus : 1 } checks repl set status
            rs.initiate()                              { replSetInitiate : null } initiates set with default settings
            rs.initiate(cfg)                           { replSetInitiate : cfg } initiates set with configuration cfg
            rs.conf()                                  get the current configuration object from local.system.replset
            rs.reconfig(cfg)                           updates the configuration of a running replica set with cfg (disconnects)
            rs.add(hostportstr)                        add a new member to the set with default attributes (disconnects)
            rs.add(membercfgobj)                       add a new member to the set with extra attributes (disconnects)
            rs.addArb(hostportstr)                     add a new member which is arbiterOnly:true (disconnects)
            rs.stepDown([stepdownSecs, catchUpSecs])   step down as primary (disconnects)
            rs.syncFrom(hostportstr)                   make a secondary sync from the given member
            rs.freeze(secs)                            make a node ineligible to become primary for the time specified
            rs.remove(hostportstr)                     remove a host from the replica set (disconnects)
            rs.slaveOk()                               allow queries on secondary nodes

            rs.printReplicationInfo()                  check oplog size and time range
            rs.printSlaveReplicationInfo()             check replica set members and replication lag
            db.isMaster()                              check who is primary

            reconfiguration helpers disconnect from the database so the shell will display
            an error, even if the command succeeds.
        >
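
In the shell, `rs.slaveOk()` allows queries on secondary nodes. From a driver, the closest counterpart is a read preference, which can be set in the connection string. The sketch below builds such a URI; `build_replica_set_uri` is a hypothetical helper, and the host names and the set name `rs0` are placeholders.

```python
from urllib.parse import urlencode


def build_replica_set_uri(hosts, replica_set, read_preference="secondaryPreferred"):
    """Build a mongodb:// URI listing several members of one replica set."""
    options = urlencode({"replicaSet": replica_set,
                         "readPreference": read_preference})
    return "mongodb://{0}/?{1}".format(",".join(hosts), options)


if __name__ == "__main__":
    # Placeholder hosts; pass the resulting string to pymongo.MongoClient().
    print(build_replica_set_uri(["db1:27017", "db2:27017"], "rs0"))
```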

    Administrative help:

        > help admin
            ls([path])                      list files
            pwd()                           returns current directory
            listFiles([path])               returns file list
            hostname()                      returns name of this host
            cat(fname)                      returns contents of text file as a string
            removeFile(f)                   delete a file or directory
            load(jsfilename)                load and execute a .js file
            run(program[, args...])         spawn a program and wait for its completion
            runProgram(program[, args...])  same as run(), above
            sleep(m)                        sleep m milliseconds
            getMemInfo()                    diagnostic
        >

    Connecting to a db help:

        > help connect

        Normally one specifies the server on the mongo shell command line.  Run mongo --help to see those options.
        Additional connections may be opened:

            var x = new Mongo('host[:port]');
            var mydb = x.getDB('mydb');
          or
            var mydb = connect('host[:port]/mydb');

        Note: the REPL prompt only auto-reports getLastError() for the shell command line connection.

        >

    Shortcut keys:

        > help keys
        Tab completion and command history is available at the command prompt.

        Some emacs keystrokes are available too:
          Ctrl-A start of line
          Ctrl-E end of line
          Ctrl-K del to end of line

        Multi-line commands
        You can enter a multi line javascript expression.  If parens, braces, etc. are not closed, you will see a new line
        beginning with '...' characters.  Type the rest of your expression.  Press Ctrl-C to abort the data entry if you
        get stuck.

        >

    Misc:

        > help misc
            b = new BinData(subtype,base64str)  create a BSON BinData value
            b.subtype()                         the BinData subtype (0..255)
            b.length()                          length of the BinData data in bytes
            b.hex()                             the data as a hex encoded string
            b.base64()                          the data as a base 64 encoded string
            b.toString()

            b = HexData(subtype,hexstr)         create a BSON BinData value from a hex string
            b = UUID(hexstr)                    create a BSON BinData value of UUID subtype
            b = MD5(hexstr)                     create a BSON BinData value of MD5 subtype
            "hexstr"                            string, sequence of hex characters (no 0x prefix)

            o = new ObjectId()                  create a new ObjectId
            o.getTimestamp()                    return timestamp derived from first 32 bits of the OID
            o.isObjectId
            o.toString()
            o.equals(otherid)

            d = ISODate()                       like Date() but behaves more intuitively when used
            d = ISODate('YYYY-MM-DD hh:mm:ss')    without an explicit "new " prefix on construction
        >
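
The help text says `o.getTimestamp()` is derived from the first 32 bits of the ObjectId. The sketch below decodes that value by hand from the 24-character hex form, using only the standard library; `objectid_timestamp` is a hypothetical helper and the sample id is an arbitrary example. With PyMongo installed, the `generation_time` attribute of `bson.ObjectId` gives the same information.

```python
from datetime import datetime, timezone


def objectid_timestamp(oid_hex):
    """Decode the creation time stored in the first 4 bytes of an ObjectId."""
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId is 24 hex characters (12 bytes)")
    seconds = int(oid_hex[:8], 16)  # first 32 bits: big-endian seconds since the epoch
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


if __name__ == "__main__":
    print(objectid_timestamp("507f1f77bcf86cd799439011").isoformat())
```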

    Map-reduce:

        > help mr

        See also http://dochub.mongodb.org/core/mapreduce

        function mapf() {
          // 'this' holds current document to inspect
          emit(key, value);
        }

        function reducef(key,value_array) {
          return reduced_value;
        }

        db.mycollection.mapReduce(mapf, reducef[, options])

        options
        {[query : <query filter object>]
         [, sort : <sort the query.  useful for optimization>]
         [, limit : <number of objects to return from collection>]
         [, out : <output-collection name>]
         [, keeptemp: <true|false>]
         [, finalize : <finalizefunction>]
         [, scope : <object where fields go into javascript global scope >]
         [, verbose : true]}

        >
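
To make the roles of `mapf` and `reducef` concrete, here is a small in-memory illustration of the same pattern in plain Python. It does not talk to MongoDB: `map_reduce`, `map_tags` and `sum_values` are hypothetical names, and the `tags` field is made up for the example.

```python
from collections import defaultdict


def map_reduce(documents, mapf, reducef):
    """Group emitted values by key, then reduce each group (in-memory illustration)."""
    groups = defaultdict(list)
    for doc in documents:
        for key, value in mapf(doc):  # mapf yields (key, value) pairs, like emit()
            groups[key].append(value)
    return {key: reducef(key, values) for key, values in groups.items()}


def map_tags(doc):
    # Emit one (tag, 1) pair per tag in the document.
    for tag in doc.get("tags", []):
        yield tag, 1


def sum_values(key, values):
    # Collapse all values emitted for one key into a single count.
    return sum(values)


if __name__ == "__main__":
    docs = [{"tags": ["python", "mongodb"]}, {"tags": ["python"]}]
    print(map_reduce(docs, map_tags, sum_values))
```

The map step decides which key each document contributes to, and the reduce step collapses all values that share a key into one result.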

    Python driver:

        pip install pymongo

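    Scrapy:

    settings.py

        # Recent Scrapy versions expect a dict mapping each pipeline to a priority;
        # 300 is an arbitrary value in the usual 0-1000 range.
        ITEM_PIPELINES = {'stack.pipelines.MongoDBPipeline': 300}

        MONGODB_SERVER = "localhost"
        MONGODB_PORT = 27017
        MONGODB_DB = "stackoverflow"
        MONGODB_COLLECTION = "questions"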

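    pipelines.py

        import logging

        import pymongo
        from scrapy.exceptions import DropItem

        # scrapy.conf and scrapy.log are not available in recent Scrapy releases,
        # so settings come from the crawler and logging uses the standard library.
        logger = logging.getLogger(__name__)


        class MongoDBPipeline(object):

            def __init__(self, server, port, db_name, collection_name):
                connection = pymongo.MongoClient(server, port)
                db = connection[db_name]
                self.collection = db[collection_name]

            @classmethod
            def from_crawler(cls, crawler):
                # Read the MONGODB_* values defined in settings.py.
                settings = crawler.settings
                return cls(
                    settings.get('MONGODB_SERVER'),
                    settings.getint('MONGODB_PORT'),
                    settings.get('MONGODB_DB'),
                    settings.get('MONGODB_COLLECTION'),
                )

            def process_item(self, item, spider):
                # Iterating an item yields field names, so check each field's value.
                for field in item:
                    if not item[field]:
                        raise DropItem("Missing {0}!".format(field))
                # insert_one() replaces the deprecated Collection.insert().
                self.collection.insert_one(dict(item))
                logger.debug("Question added to MongoDB database!")
                return item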
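
The validation step in `process_item` is meant to reject items that have empty fields. Iterating over a dict-like item yields field names rather than values, so the check has to look up each value. The sketch below isolates that logic using plain dicts; `missing_fields` and `validate` are hypothetical helpers, and `ValueError` stands in for Scrapy's `DropItem` so the example runs without Scrapy.

```python
def missing_fields(item):
    """Return the names of fields whose values are empty."""
    return [field for field, value in dict(item).items() if not value]


def validate(item):
    """Raise ValueError for incomplete items; a pipeline would raise DropItem instead."""
    missing = missing_fields(item)
    if missing:
        raise ValueError("Missing {0}!".format(", ".join(missing)))
    return item


if __name__ == "__main__":
    print(missing_fields({"title": "How do I ...?", "url": ""}))
```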

    From the official Scrapy documentation (https://doc.scrapy.org/en/latest/topics/item-pipeline.html#write-items-to-mongodb):

    pipelines.py

        import pymongo

        class MongoPipeline(object):

            collection_name = 'scrapy_items'

            def __init__(self, mongo_uri, mongo_db):
                self.mongo_uri = mongo_uri
                self.mongo_db = mongo_db

            @classmethod
            def from_crawler(cls, crawler):
                return cls(
                    mongo_uri=crawler.settings.get('MONGO_URI'),
                    mongo_db=crawler.settings.get('MONGO_DATABASE', 'items')
                )

            def open_spider(self, spider):
                self.client = pymongo.MongoClient(self.mongo_uri)
                self.db = self.client[self.mongo_db]

            def close_spider(self, spider):
                self.client.close()

            def process_item(self, item, spider):
                self.db[self.collection_name].insert_one(dict(item))
                return item
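
This pipeline reads `MONGO_URI` and `MONGO_DATABASE` from the project settings. A matching settings.py fragment might look like the sketch below; the URI assumes a local server, `myproject` is a placeholder package name, and the priority 300 is an arbitrary choice.

```python
# settings.py (sketch): values read by MongoPipeline.from_crawler()
MONGO_URI = "mongodb://localhost:27017"   # assumed local server
MONGO_DATABASE = "items"                  # same as the pipeline's fallback default

ITEM_PIPELINES = {
    # 'myproject' is a placeholder; lower numbers run earlier (usual range 0-1000).
    "myproject.pipelines.MongoPipeline": 300,
}
```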
  • Original post: https://www.cnblogs.com/sigai/p/8417550.html