
                       Using MongoDB with Node.js
    1. Introduction
    Official documentation (in English): https://docs.mongodb.com/manual/  It covers almost everything.
    MongoDB is an open-source document database that provides high performance, high availability, and automatic scaling.
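    2. Installation
    For installation details, such as which operating systems are supported and how the 32-bit and 64-bit builds differ, see the official site.

    sudo apt-get install -y mongodb-org
    Installing mongodb-org pulls in the following packages:
    mongodb-org-server   contains the mongod daemon, its configuration file, and init scripts
    mongodb-org-mongos   contains the mongos daemon
    mongodb-org-shell    contains the mongo shell
    mongodb-org-tools    contains mongoimport, mongodump, mongorestore, mongofiles, and other tools
    What each command does is covered in the next section.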

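    Starting MongoDB
    sudo mongod
    Startup options are listed by mongod --help.
    The default port is 27017; you can change it with mongod --port=xxx.
    A common way to start the server is mongod -f /etc/mongodb.config; for how to write mongodb.config, see the official documentation.
    Stopping MongoDB
    The officially recommended way is to open the mongo shell and run:
    use admin             // switch to the admin database
    db.shutdownServer()   // stop mongod
    You can also signal the mongod process directly:
    ps -ef | grep mongod   find the PID of the mongod process
    kill <pid>             sends SIGTERM for a clean shutdown; avoid kill -9 (SIGKILL), which skips the clean shutdown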

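    3. Commands
    Starting the mongo shell
    cd mongodbHome
    ./bin/mongo   command-line options are listed by ./bin/mongo --help; inside the shell the following commands are available:
    help              show help
    show dbs          list databases
    use <db>          switch to a database
    show collections  list the collections in the current database (similar to tables in MySQL)
    show users        list the users of the current database
    ...
    There are many more commands that are not listed here.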
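    4. CRUD
    1. Insert
    db.collection.insert({x:"a",y:"b",z:"c"})
    This inserts one document into the collection named collection. The document {x:"a",y:"b",z:"c"} is JSON-like and its fields are up to you. For example, to insert {name:"Lao Wang",info:"Lao Wang is a mysterious creature who usually lives right next door"} into the user collection:
    db.user.insert({name:"Lao Wang",info:"Lao Wang is a mysterious creature who usually lives right next door"})
    MongoDB 3.2 also added two methods, db.collection.insertOne() and db.collection.insertMany():
    db.collection.insertMany([
       {x:"a",y:"b",z:"c"},
       {x:"1",y:"2",z:"3"},
       ...
    ])
    insertMany is efficient for bulk inserts. In the shell you can first assign the documents to a variable, A=[{…},{…},{…},{…}],
    and then call db.collection.insertMany(A).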
    2. Delete
    db.collection.remove({x:"a"})
    This deletes every document in collection whose x equals "a".
    3. Update
    db.collection.update(
        {x:"a"},
        {$set:{y:"1"}},
        {multi:true}     // true: update every matching document; false: update only the first match (the default)
    )
    Equivalent to  update collection set y="1" where x="a"
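
    To make the multi option concrete, here is a minimal sketch in plain JavaScript that mimics the update on an in-memory array. The function applyUpdate and the sample data are hypothetical; the sketch only models $set with an equality filter and is not part of the MongoDB API.

```javascript
// Hypothetical helper: mimics update(filter, {$set: ...}, {multi}) on an array.
function applyUpdate(collection, filter, setFields, options = {}) {
  const multi = options.multi === true;       // multi defaults to false
  let modified = 0;
  for (const doc of collection) {
    const matches = Object.keys(filter).every((k) => doc[k] === filter[k]);
    if (!matches) continue;
    Object.assign(doc, setFields);            // $set: overwrite or add fields
    modified++;
    if (!multi) break;                        // without multi, stop at the first match
  }
  return modified;
}

const single = [{ x: "a", y: "0" }, { x: "a", y: "0" }, { x: "b", y: "0" }];
const many = single.map((doc) => ({ ...doc }));   // independent copy

const singleCount = applyUpdate(single, { x: "a" }, { y: "1" });
const manyCount = applyUpdate(many, { x: "a" }, { y: "1" }, { multi: true });
console.log(singleCount, manyCount);
```

    With multi left at its default only the first matching document changes, which is an easy mistake to make when the SQL-style statement is what you had in mind.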
    4. Query
       MongoDB statement                                    Equivalent SQL
       db.collection.find()                                 select * from collection
       db.collection.find({x:"a"})                          select * from collection where x="a"
       db.collection.find({x:"a",z:{$lt:3}})                select * from collection where x="a" and z<3
       db.collection.find({$or:[{x:"a"},{z:{$lt:3}}]})      select * from collection where x="a" or z<3
       db.collection.find({x:"a"},{x:1,z:1,_id:0})          select x,z from collection where x="a"
     
       Cursors
       var myCursor=db.collection.find();
       while(myCursor.hasNext()){
          printjson(myCursor.next())
       }
    For more advanced usage, see db.collection.bulkWrite().
    For a comparison of MongoDB commands and SQL, see https://docs.mongodb.com/manual/reference/sql-comparison/
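    5. Data processing
    1. Aggregation
    Aggregation operations process data records and return computed results.
    They play a role similar to functions and stored procedures in a relational database. One limit to keep in mind: each result document must stay within the 16 MB BSON document size limit.
    Suppose the documents in collection have the fields x, y, and z:
    db.collection.aggregate([
        {$match:{x:"a"}},
        {$group:{_id:"$y",total:{$sum:"$z"}}}
    ])
    Note: $group must specify an _id, which here refers to a field of the collection. The pipeline is equivalent to
    select y as _id,sum(z) as total from collection where x="a" group by y
    For $match, $group, $limit, $sort, and the other stages, see https://docs.mongodb.com/manual/meta/aggregation-quick-reference/
    For the mapping between aggregation and SQL, see
    https://docs.mongodb.com/manual/reference/sql-aggregation-comparison/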
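
    What the $match and $group stages compute can be sketched in plain JavaScript on an in-memory array. The sample documents and the function matchThenGroup are made up for illustration; a real pipeline runs inside the server.

```javascript
// Plain-JavaScript emulation of: $match {x:"a"}, then $group by y summing z.
const docs = [
  { x: "a", y: "b", z: 1 },
  { x: "a", y: "b", z: 2 },
  { x: "a", y: "c", z: 5 },
  { x: "q", y: "b", z: 9 },   // dropped by the match step
];

function matchThenGroup(input) {
  const totals = new Map();                               // y value -> running sum of z
  for (const doc of input) {
    if (doc.x !== "a") continue;                          // $match: keep x == "a"
    totals.set(doc.y, (totals.get(doc.y) || 0) + doc.z);  // $group: sum z per y
  }
  // Shape the output like aggregation results: one document per group.
  return [...totals].map(([_id, total]) => ({ _id, total }));
}

const grouped = matchThenGroup(docs);
console.log(grouped);
```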

    2. Map-Reduce
    Map-reduce is a data processing paradigm for condensing large volumes of data into useful aggregated results.
    db.collection.mapReduce(
        function(){
            emit(this.x,this.z);            // map
        },
        function(key,values){
            return Array.sum(values);       // reduce
        },
        {
            query:{y:"b"},                  // only documents matching this query are mapped
            out:"test"                      // write the output to the collection named test
        }
    )
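
    The data flow of the call above (filter by query, map each document to a key and value, reduce the values per key) can be sketched in plain JavaScript. The helper mapReduce and the sample records are hypothetical, and unlike the shell version the map function here receives the document and emit as explicit arguments.

```javascript
// In-memory emulation of the query -> map -> reduce flow.
const records = [
  { x: "a", y: "b", z: 1 },
  { x: "a", y: "b", z: 4 },
  { x: "c", y: "b", z: 2 },
  { x: "a", y: "n", z: 7 },   // filtered out by the query
];

function mapReduce(input, query, map, reduce) {
  const emitted = new Map();                 // key -> list of emitted values
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  input
    .filter((doc) => Object.keys(query).every((k) => doc[k] === query[k]))
    .forEach((doc) => map(doc, emit));       // map phase
  const out = {};
  for (const [key, values] of emitted) {
    out[key] = reduce(key, values);          // reduce phase, once per key
  }
  return out;
}

const mrResult = mapReduce(
  records,
  { y: "b" },
  (doc, emit) => emit(doc.x, doc.z),
  (key, values) => values.reduce((sum, v) => sum + v, 0)
);
console.log(mrResult);
```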
    6. Data model
    Operations on a single document are atomic.
    Operations that touch multiple documents are not atomic as a whole; in the 3.2 release covered here there are no multi-document transactions.
    Setting up document validation
    db.createCollection("contacts",
      {  validator:{$or:[
          {phone:{$type:"string"}},
          {email:{$regex:/@qq\.com$/}}
         ]}
    })
    db.runCommand(
      {
        collMod:"contacts",
        validator:{$or:[
             {phone:{$type:"string"}},
             {email:{$regex:/@qq\.com$/}}
           ]},
        validationLevel:"moderate"
      }
    )
    Note: You cannot specify a validator for collections in the admin, local, and config databases.
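
    As a rough illustration of what such a validator accepts, the sketch below checks documents in plain JavaScript. It assumes the two conditions are combined with $or, as in the official documentation's example; the function name passesContactsValidator is hypothetical.

```javascript
// A document passes if phone is a string OR email ends in "@qq.com".
// Assumption: the conditions are joined with $or.
function passesContactsValidator(doc) {
  const phoneOk = typeof doc.phone === "string";
  const emailOk = typeof doc.email === "string" && /@qq\.com$/.test(doc.email);
  return phoneOk || emailOk;
}

console.log(passesContactsValidator({ phone: "13800000000" }));
console.log(passesContactsValidator({ phone: 13800000000, email: "a@example.com" }));
```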
    7. Administration
    1. Concurrency
    MMAPv1 provides collection-level locking.
    That is, at any moment a collection admits either several readers or a single writer.
    WiredTiger supports concurrent access by readers and writers to the documents in a collection. Clients can read documents while write operations are in progress, and multiple threads can modify different documents in a collection at the same time.
    Put simply, WiredTiger uses document-level concurrency control, and reads and writes can proceed at the same time.
    2. Durability
    MongoDB uses write ahead logging to an on-disk journal. Journaling guarantees that MongoDB can quickly recover write operations that were written to the journal but not written to data files in cases where mongod terminated due to a crash or other serious failure.
    In other words, every change is recorded in the journal before it is applied to the data files, which is what makes fast recovery possible.
    3. Hardware requirements
    WiredTiger benefits from more CPU cores; use at least 2.
    In MongoDB 3.2 the WiredTiger cache defaults to the larger of 60% of RAM minus 1 GB, or 1 GB; about 10 GB of RAM per machine is recommended.
    The cache size can be set at startup with mongod --wiredTigerCacheSizeGB=xxx
    wiredTigerCacheSizeGB: Defines the maximum size of the internal cache that WiredTiger will use for all data.

    With the WiredTiger storage engine, use of XFS is strongly recommended to avoid performance issues that may occur when using EXT4 with WiredTiger.

    In a cluster, keep the clocks of all machines synchronized (NTP).
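    4. Performance tuning
    For read-heavy applications, increase the size of your replica set and distribute read operations to secondary members.

    For write-heavy applications, deploy sharding and add one or more shards to a sharded cluster to distribute load among mongod instances.
    In short: scale reads with replication, and scale writes with sharding.

    // Enable the slow-operation log (the profiler) for one database
    db.setProfilingLevel(1,100)   after this call, slow operations are recorded and can be inspected
    0: off (the default)
    1: on, records slow operations only
    2: on, records all operations
    The second argument defines slow as taking longer than 100 milliseconds.
    View slow operations:         db.system.profile.find( { millis : { $gt : 100 } } )
    View the profiler settings:   db.getProfilingStatus()
    // Enable the slow-operation log for all databases
    mongod --profile 1 --slowms 15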
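
    The three profiling levels boil down to a small decision rule, sketched here in plain JavaScript. The function shouldProfile is hypothetical, and treating "slow" as strictly greater than the threshold is an assumption of this sketch.

```javascript
// Decide whether an operation would be recorded by the profiler.
// level: 0 = off, 1 = slow operations only, 2 = all operations.
function shouldProfile(level, slowms, millis) {
  if (level === 0) return false;   // profiler off
  if (level === 2) return true;    // record every operation
  return millis > slowms;          // level 1: only operations slower than slowms
}

console.log(shouldProfile(1, 100, 150));
console.log(shouldProfile(1, 100, 20));
```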
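    5. Configuration
    Limit the run time of a query
        db.collection.find().maxTimeMS(30)
    Limit the run time of a command
        db.runCommand(
            {   distinct:"collection",
                key:"a",
                maxTimeMS:40
            }
        )
    List the operations in progress   db.currentOp()
    Kill an operation in progress     db.killOp(<opid>)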
    6. Backup
    The simplest way to back up is MongoDB Cloud Manager or Ops Manager.

    Back Up with Filesystem Snapshots
    You can create a backup of a MongoDB deployment by making a copy of MongoDB's underlying data files.
    Requirement: To get a correct snapshot of a running mongod process, you must have journaling enabled.
    Steps  1 db.fsyncLock()     // flush pending writes and block new ones
           2 copy the data files
           3 db.fsyncUnlock()   // release the lock

    Back Up with mongodump   (suited to a single machine; for clusters use a dedicated backup tool)
    mongodump and mongorestore are simple and efficient tools for backing up and restoring small MongoDB deployments, but are not ideal for capturing backups of larger systems.
    mongodump only captures the documents in the database. Index data is not part of the dump, so indexes are rebuilt when the backup is restored.
    Back up:  mongodump --collection myCollection --db test --out /backup/dir
    For the full option list, run mongodump --help
    Restore:  mongorestore <path to backup>
    7. Monitoring
    mongostat captures and returns the counts of database operations by type (insert, query, update, delete, and so on).

    mongotop tracks and reports the current read and write activity of a MongoDB instance, and reports these statistics on a per collection basis.

    MongoDB web console: http://localhost:port, where port is the mongod port plus 1000 (28017 by default).

    db.serverStatus() from the shell, returns a general overview of the status of the database.

    db.stats() from the shell, returns a document that addresses storage use and data volumes for the current database.

    db.collection.stats() provides statistics that resemble dbStats on the collection level.

    rs.status() returns an overview of your replica set's status.

    8. Indexes
    Format: db.collection.createIndex( { <key>: <type> }, { <option>: <value> } )
    key: the field to index   type: 1 | -1 for an ascending or descending index
    option: name of an index option   value: the value for that option
    Building an index over a large amount of data hurts performance, so it is usually built in the background: db.collection.createIndex({<key>:<type>},{background:true})
    Naming an index
    db.collection.createIndex({<key>:<type>},{name:"yourName"})

    Compound indexes  db.collection.createIndex( { <field1>: <type>, <field2>: <type2>, ... } )
    There can be no more than 31 fields in a compound index.
    Example: db.products.createIndex( { "item": 1, "stock": 1 } )
    In any one document, item and stock cannot both be arrays.
    The index will contain references to documents sorted first by the values of the item field and, within each value of the item field, sorted by values of the stock field.
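
    The ordering of a compound index on item and stock can be pictured as a two-level sort. The sketch below reproduces that ordering in plain JavaScript on made-up sample data; it illustrates the sort order only, not how the index is stored.

```javascript
// Sort by item first, then by stock within each item (both ascending).
const products = [
  { item: "pen", stock: 30 },
  { item: "ink", stock: 5 },
  { item: "pen", stock: 10 },
  { item: "ink", stock: 8 },
];

function compareByItemThenStock(p, q) {
  if (p.item < q.item) return -1;
  if (p.item > q.item) return 1;
  return p.stock - q.stock;        // tie on item: fall back to stock
}

const indexOrder = [...products].sort(compareByItemThenStock);
console.log(indexOrder);
```

    Because stock is only ordered within each item, a query on stock alone cannot use this ordering, while a query on item (or on item and stock together) can.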

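    Text indexes  (avoid where possible; they are slow to build and memory-hungry)
    db.reviews.createIndex( { comments: "text" } )
    db.reviews.createIndex( { title: "text", comments: "text" } )   // text index over several fields
    db.reviews.createIndex( { title: "text", comments: "text" }, { weights: { title: 2, comments: 5 } } )   // per-field weights
    db.collection.dropIndex(<index>)   // drop an index
    db.collection.getIndexes()         // list indexes

    Hashed indexes  (limitation: cannot be part of a compound index)
    db.collection.createIndex( { _id: "hashed" } )
    db.collection.createIndex( { _id: "hashed", name: "hashed" } )   // note: this is an error

    TTL indexes  (Time To Live)
    Limitations: cannot be created on _id, compound indexes are not supported, the indexed field must hold a date, and the collection (eventlog below) cannot be a capped collection.
    db.eventlog.createIndex( { "lastModifiedDate": 1 }, { expireAfterSeconds: 3600 } )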
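
    The expiry rule of a TTL index can be sketched as follows in plain JavaScript. The function isExpired and the field name createdAt are hypothetical; the sketch assumes the indexed field holds a Date and ignores that the server removes expired documents in a periodic background pass rather than at the exact moment of expiry.

```javascript
// A document is eligible for removal once the indexed date plus
// expireAfterSeconds lies in the past. Non-date values never expire.
function isExpired(doc, field, expireAfterSeconds, now) {
  const value = doc[field];
  if (!(value instanceof Date)) return false;
  return now.getTime() > value.getTime() + expireAfterSeconds * 1000;
}

const event = { name: "login", createdAt: new Date("2020-01-01T00:00:00Z") };
const halfHourLater = new Date("2020-01-01T00:30:00Z");
const twoHoursLater = new Date("2020-01-01T02:00:00Z");

console.log(isExpired(event, "createdAt", 3600, halfHourLater));
console.log(isExpired(event, "createdAt", 3600, twoHoursLater));
```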

    Unique indexes
    db.members.createIndex( { "user_id": 1 }, { unique: true } )

    Sparse indexes
    Sparse indexes only contain entries for documents that have the indexed field, even if the index field contains a null value.
    Documents that lack the field are skipped, which matters because documents in a collection do not share a fixed schema.
    db.addresses.createIndex( { "xmpp_id": 1 }, { sparse: true } )

    Building indexes on a replica set
    1. Stop one secondary and restart it as a standalone on a different port       mongod --port 47017
    2. Build the index on the standalone                                            db.records.createIndex( { username: 1 } )
    3. Stop the standalone and restart it with its original replica set options    mongod --port 27017 --replSet rs0

    Note what the official documentation says:
    When building an index on a collection, the database that holds the collection is unavailable for read or write operations until the index build completes.
    This applies to foreground builds; passing background:true keeps the database available during the build.
    For replica sets, secondaries will begin building indexes after the primary finishes building the index. In sharded clusters, the mongos will send createIndex() to the primary members of the replica set for each shard, which then replicate to the secondaries after the primary finishes building the index.
    Ensure that your oplog is large enough to permit the indexing or re-indexing operation to complete without falling too far behind to catch up.
    In practice, size the oplog generously before building indexes on a cluster.
    Measuring index performance: report how the query performs when forced to use the zipcode index
    db.people.find(
     { name: "John Doe", zipcode: { $gt: "63000" } }
    ).hint( { zipcode: 1 } ).explain("executionStats")

    db.collection.totalIndexSize()  returns the total size of the indexes; ideally physical RAM is larger than this

    9. Storage engines
    WiredTiger is the default storage engine starting in MongoDB 3.2. It is well-suited for most workloads and is recommended for new deployments. WiredTiger provides a document-level concurrency model, checkpointing, and compression, among other features. In MongoDB Enterprise, WiredTiger also supports Encryption at Rest.

    MMAPv1 is the original MongoDB storage engine and is the default storage engine for MongoDB versions before 3.2. It performs well on workloads with high volumes of reads and writes, as well as in-place updates.
    Even so, MMAPv1 is not recommended: its crash recovery is weaker, relying only on the journal and backups, and it makes poor use of multi-core CPUs.

    The In-Memory Storage Engine is available in MongoDB Enterprise. Rather than storing documents on-disk, it retains them in-memory for more predictable data latencies.
    It is fast, but it demands a lot of RAM.

    WiredTiger engine
    Document Level Concurrency              document-level concurrency (MMAPv1 is collection-level)
    Snapshots and Checkpoints               allow fast recovery after mongod crashes
    Journal (write-ahead transaction log)   also helps fast recovery
    Compression                             compresses collections and indexes
    Memory Use                              the larger of 60% of RAM minus 1 GB, or 1 GB; makes efficient use of multiple CPU cores

    In-Memory engine
    The in-memory storage engine does not maintain any on-disk data, including configuration data, indexes, user credentials, etc.
    Because everything lives in memory the risk is high. The engine suits data that needs very fast access and can be reloaded quickly after an outage, such as province, city, and district lookup tables that are rarely written.
    mongod --storageEngine inMemory --dbpath <path>   starts mongod with the In-Memory engine
    Warning: The in-memory storage engine does not persist data after process shutdown. Recovery of in-memory data is impossible.
    Memory Use      50% of RAM, or at least 1 GB
       
    GridFS  (for files larger than 16 MB; for smaller files the official documentation recommends storing them in a single document)
    When to use
    If your filesystem limits the number of files in a directory.
    When you want to access information from portions of large files without having to load whole files into memory.
    When you want to keep your files and metadata automatically synced and deployed across a number of systems and facilities.

    GridFS uses two collections:
    chunks stores the binary chunks.
    files stores the file's metadata.
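
    How a file maps onto chunk documents can be sketched with a little arithmetic. The sketch assumes the documented default GridFS chunk size of 255 kB, taken here as 255 * 1024 bytes; the function names are hypothetical.

```javascript
// Assumption: default GridFS chunk size of 255 kB (255 * 1024 bytes).
const DEFAULT_CHUNK_BYTES = 255 * 1024;

// Number of chunk documents needed to hold a file of the given size.
function chunkCount(fileBytes, chunkBytes = DEFAULT_CHUNK_BYTES) {
  return Math.ceil(fileBytes / chunkBytes);
}

// Index of the chunk that contains a given byte offset, which is what
// lets a reader fetch part of a file without loading all of it.
function chunkIndexForOffset(offset, chunkBytes = DEFAULT_CHUNK_BYTES) {
  return Math.floor(offset / chunkBytes);
}

const twentyMB = 20 * 1024 * 1024;
console.log(chunkCount(twentyMB), chunkIndexForOffset(10 * 1024 * 1024));
```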

    10. Security
    Creating a user
    use reporting
    db.createUser(
      {
        user: "reportsUser",
        pwd: "12345678",
        roles: [
           { role: "read", db: "reporting" },     // read access to reporting
           { role: "read", db: "products" },      // read access to products
           { role: "read", db: "sales" },         // read access to sales
           { role: "readWrite", db: "accounts" }  // read and write access to accounts
        ]
      }
    )
    Creating a role
    use admin
    db.createRole(
       {
         role: "mongostatRole",    // role name
         privileges: [             // privileges granted
           { resource: { cluster: true }, actions: [ "serverStatus" ] }
         ],
         roles: []    // roles to inherit from
       }
    )
    Granting roles
    use reporting
    db.grantRolesToUser(
        "reportsUser",
        [
          { role: "read", db: "accounts" }
        ]
    )

    db.getRole( "read", { showPrivileges: true } )   shows the privileges of a role
    Built-in roles
    read       provides the ability to read data on all non-system collections
    readWrite  provides all the privileges of the read role and the ability to modify data on all non-system collections
    For details, see https://docs.mongodb.com/manual/core/security-built-in-roles/
    One point to note:
    Except for roles created in the admin database, a role can only include privileges that apply to its database and can only inherit from other roles in its database.
    That is, a role is scoped to the database it was created in, whereas roles created in admin can also act on other databases.
    Ensure that the HTTP status interface, the REST API, and the JSON API are all disabled in production environments to prevent potential data exposure and vulnerability to attackers.

    Use bind_ip to restrict the addresses on which mongod accepts connections.
    For the security-related methods, see https://docs.mongodb.com/manual/reference/security/
    11. Replica sets
    A replica set in MongoDB is a group of mongod processes that maintain the same data set. The primary node receives all write operations.
    The secondaries replicate the primary's oplog and apply the operations to their data sets such that the secondaries' data sets reflect the primary's data set.
    Replica set members send heartbeats (pings) to each other every two seconds. If a heartbeat does not return within 10 seconds, the other members mark the delinquent member as inaccessible.
    The purpose of an arbiter is to maintain a quorum in a replica set by responding to heartbeat and election requests by other replica set members.
    An arbiter holds no data; it only answers heartbeats and votes.
    If your replica set has an even number of members, add an arbiter to obtain a majority of votes in an election for primary.
    The arbiter is best run on a machine of its own.
    When a primary does not communicate with the other members of the set for more than 10 seconds, an eligible secondary will hold an election to elect itself the new primary.
    Primary: receives all write operations.
    Secondaries: replicate operations from the primary to maintain an identical data set.
    A replica set can have up to 50 members but only 7 voting members.
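
    Why an even number of members calls for an arbiter follows from the majority rule, sketched here in plain JavaScript. The function names are hypothetical; the arithmetic is simply "more than half of the voting members".

```javascript
// Votes needed to elect a primary: strictly more than half of the voters.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many voting members can be lost while an election can still succeed.
function toleratedFailures(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const n of [3, 4, 5]) {
  console.log(n, "voters: majority", majority(n), "tolerates", toleratedFailures(n));
}
```

    Four voters tolerate no more failures than three, so an even-sized set gains nothing until one more vote (for example an arbiter) makes the count odd.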

    You can configure a secondary member for a specific purpose:
    Prevent it from becoming a primary in an election, which allows it to reside in a secondary data center or to serve as a cold standby. See Priority 0 Replica Set Members.
    Set priority: 0 to do this; the member can still vote but is never elected primary.
    Prevent applications from reading from it, which allows it to run applications that require separation from normal traffic. See Hidden Replica Set Members.
    Keep a running "historical" snapshot for use in recovery from certain errors, such as unintentionally deleted databases. See Delayed Replica Set Members.
    A delayed member lets you go back to the state of, say, half an hour or an hour ago after an operator mistake; the delay is configurable through slaveDelay.

    If your deployment requires more than 50 members, you'll need to use master-slave replication. However, master-slave replication lacks the automatic failover capabilities.

    Priority 0 Replica Set Members
      A priority 0 member is a secondary that cannot become primary.
      A priority 0 member can function as a standby.
      In sets with varied hardware or geographic distribution, a priority 0 standby ensures that only qualified members become primary.
    Hidden Replica Set Members
      A hidden member maintains a copy of the primary's data set but is invisible to client applications.
      Use hidden members for dedicated tasks such as reporting and backups.
    Delayed Replica Set Members
      A delayed member's data set reflects an earlier, or delayed, state of the set.
      Must be priority 0 members. Set the priority to 0 to prevent a delayed member from becoming primary.
      Should be hidden members. Always prevent applications from seeing and querying delayed members.
      
    Oplog
      The oplog (operations log) is a special capped collection that keeps a rolling record of all operations that modify the data stored in your databases.
      Secondaries replay it to stay in sync, and it can be used to recover data.
      The default oplog size depends on the storage engine:
      Engine                      Default Size            Lower Bound   Upper Bound
      In-Memory Storage Engine    5% of physical memory   50 MB         50 GB
      WiredTiger Storage Engine   5% of free disk space   990 MB        50 GB
      MMAPv1 Storage Engine       5% of free disk space   990 MB        50 GB
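
    The default sizing rule for the disk-based engines (5% of free disk space, bounded below by 990 MB and above by 50 GB) can be written as a small clamp. The function name defaultOplogBytes is hypothetical and the sketch covers only the WiredTiger and MMAPv1 rows of the table.

```javascript
const MB = 1024 * 1024;
const GB = 1024 * MB;

// 5% of free disk space, clamped to the range [990 MB, 50 GB].
function defaultOplogBytes(freeDiskBytes) {
  const fivePercent = freeDiskBytes / 20;
  return Math.min(Math.max(fivePercent, 990 * MB), 50 * GB);
}

console.log(defaultOplogBytes(100 * GB) / GB, "GB");
```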

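    Master-Slave   (a simple form of replication without automatic failover; not covered in depth here)
    mongod --master --dbpath /data/masterdb/ --oplogSize 1024                   starts the master
    mongod --slave --source <masterhostname><:<port>> --dbpath /data/slavedb/   starts a slave
    Note: make the oplog large enough. The master accepts writes and records them in the oplog, and each slave reads the oplog and replays its entries to stay in sync. Under heavy writes, an oplog that is too small can wrap around before a slave has caught up, leaving master and slave out of sync.
    rs.printReplicationInfo(), rs.printSlaveReplicationInfo()   show the oplog status of the primary and the replication lag of the secondaries
    For more methods, see https://docs.mongodb.com/manual/reference/replication/

    mongod --replSet "rs0"
    rs.initiate()                    // initialize the replica set
    rs.conf()                        // show the configuration
    rs.add("mongodb1.example.net")   // add a member
    rs.status()                      // show the status
    Run all of the above on the member that is (or becomes) the primary.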

    Changing a member's priority
    cfg = rs.conf()
    cfg.members[2].priority = 0.5   // change the priority of the third member
    rs.reconfig(cfg)

    Adding an arbiter
    rs.addArb("m1.example.net:30000")

    Available options
    rs.add({_id: 1, host: "mongodb3.example.net:27017", priority: 0, hidden: true})
    rs.add({_id: 1, host: "mongodb3.example.net:27017", priority: 0, hidden: true, slaveDelay:3600})
    {
      _id: <string>,
      version: <int>,
      protocolVersion: <number>,
      members: [
        {
          _id: <int>,
          host: <string>,
          arbiterOnly: <boolean>,
          buildIndexes: <boolean>,
          hidden: <boolean>,
          priority: <number>,
          tags: <document>,
          slaveDelay: <int>,
          votes: <number>
        },
        ...
      ],
      settings: {
        chainingAllowed : <boolean>,
        heartbeatIntervalMillis : <int>,
        heartbeatTimeoutSecs: <int>,
        electionTimeoutMillis : <int>,
        getLastErrorModes : <document>,
        getLastErrorDefaults : <document>
      }
    }

    Removing a member
    rs.remove("mongod3.example.net:27017")
    On the member being removed, shut down mongod: db.shutdownServer()

    12. Sharding
    Sharding is a method for distributing data across multiple machines. MongoDB uses sharding to support deployments with very large data sets and high throughput operations.
    The data set is split into shards, and each shard is served by its own mongod process.

    A sharded cluster consists of the following components:
    Shard: Each shard contains a subset of the sharded data.
    Mongos: The mongos acts as a query router.
    Config Server: Config servers store metadata and configuration settings for the cluster.

    Benefits of sharding: distributed reads and writes, large storage capacity, high availability.

    For queries that include the shard key or the prefix of a compound shard key, mongos can target the query at a specific shard or set of shards.
    This points to a key performance rule: CRUD operations on a sharded cluster should include the shard key whenever possible.
    For queries that do not include the shard key or the prefix of a compound shard key, mongos performs a broadcast operation, querying all shards in the sharded cluster.
    Such a query has to visit every shard and merge the results, which slows it down.
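
    The targeted-versus-broadcast rule can be sketched as a prefix check in plain JavaScript. The function classifyQuery and the field names a, b, c are hypothetical; the sketch only looks at which fields a query mentions, not at the operators used on them.

```javascript
// A query is "targeted" if it constrains a non-empty prefix of the
// (possibly compound) shard key; otherwise mongos must broadcast it.
function classifyQuery(shardKeyFields, queryFields) {
  let prefixLength = 0;
  for (const field of shardKeyFields) {
    if (!queryFields.includes(field)) break;   // prefix ends at the first missing field
    prefixLength++;
  }
  return prefixLength > 0 ? "targeted" : "broadcast";
}

const shardKey = ["a", "b", "c"];              // compound shard key {a:1, b:1, c:1}
console.log(classifyQuery(shardKey, ["a", "b"]));
console.log(classifyQuery(shardKey, ["b", "c"]));
```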

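    In production environments, individual shards should be deployed as replica sets.

    You cannot change the shard key after sharding, nor can you unshard a sharded collection.
    Note: once a collection is sharded on key x it cannot be re-sharded on key y, and a sharded collection cannot be turned back into an unsharded one.

    Strategy   Pros                             Cons                             Suited to
    Hashed     data is distributed evenly       range queries are slow           most workloads
    Ranged     data can be split by key range   data tends to pile up unevenly   range-based queries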

    [Figure: hashed shard key]

    [Figure: ranged shard key]

    Shard key limitations
    You cannot select a different shard key for that collection.
    You cannot update the values of the shard key fields.
    You cannot shard a collection that has unique indexes on other fields.
    You cannot create unique indexes on other fields for a sharded collection.

    sh.shardCollection( "database.collection", { <field> : "hashed" } )   shards a collection on a hashed shard key

    Creating a hashed sharded cluster
    1. Create the Config Server Replica Set
       1) Start each member of the config server replica set.
          mongod --configsvr --replSet <setname> --dbpath <path>
       2) Connect to one of the config servers.
          mongo --host <hostname> --port <port>
       3) Initiate the replica set.
          rs.initiate(
            {
              _id: "<replSetName>",
              configsvr: true,           // required for a config server replica set
              members: [
                { _id : 0, host : "cfg1.example.net:27017" },
                { _id : 1, host : "cfg2.example.net:27017" },
                { _id : 2, host : "cfg3.example.net:27017" }
              ]
            }
          )
    2. Create the Shard Replica Sets
       1) Start each member of the shard replica set.
          mongod --shardsvr --replSet <replSetname>
       2) Connect to a member of the shard replica set.
          mongo --host <hostname> --port <port>
       3) Initiate the replica set. Unlike the config servers, a shard replica set does not set configsvr.
          rs.initiate(
            {
              _id: "<replSetName>",
              members: [
                { _id : 0, host : "s1-mongo1.example.net:27017" },
                { _id : 1, host : "s1-mongo2.example.net:27017" },
                { _id : 2, host : "s1-mongo3.example.net:27017" }
              ]
            }
          )
    3. Connect a mongos to the Sharded Cluster
       1) Connect a mongos to the cluster.
          mongos --configdb <configReplSetName>/host1:port,host2:port,...
       2) Connect to the mongos.
          mongo --host <hostname> --port <port>
    4. Add Shards to the Cluster
          sh.addShard( "<replSetName>/host:port" )
    5. Enable Sharding for a Database
          sh.enableSharding("<database>")
    6. Shard a Collection using Hashed Sharding
       If the collection already contains data, you must create a hashed index on the shard key using the db.collection.createIndex() method before using shardCollection().

          sh.shardCollection("<database>.<collection>", { <key> : "hashed" } )

    Creating a ranged sharded cluster
       The steps are the same as for a hashed sharded cluster except for step 6: ranged sharding uses an ordinary index and { <key> : <direction> } in shardCollection(), where hashed sharding uses a hashed index.

    For the sharding commands, see https://docs.mongodb.com/manual/reference/sharding/
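    13. Performance testing
    Preparation for the tests is complete, but the machines are currently unreachable because of a network problem. This section will be filled in once the network is back.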

  • Original article: https://www.cnblogs.com/burningmyself/p/7451406.html