MongoDB Indexes

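MongoDB indexes are B-trees, which have two useful properties: (1) they support many kinds of queries (exact match, range conditions, sorting, prefix matching, and index-only queries); (2) they stay balanced after inserts, updates, and deletes on the index.
Single-key index: each index entry corresponds to a single value of the indexed field in a document.
Compound index: the order of the fields matters (prefix rule), similar to compound indexes in MySQL.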
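To make the B-tree properties above more concrete, here is a minimal sketch that stands in a sorted array for the index. It is an illustration only: a real B-tree is a balanced tree of pages, and the names `indexEntries`, `lowerBound`, and `rangeScan` are made up for this example. It shows why one sorted structure can serve both exact-match and range queries.

```javascript
// Simplified stand-in for a single-key index: entries kept sorted by key.
// (Assumption: a flat sorted array instead of a real balanced B-tree.)
const indexEntries = [
  { key: 3, docId: "c" },
  { key: 7, docId: "a" },
  { key: 7, docId: "e" },
  { key: 12, docId: "b" },
  { key: 20, docId: "d" },
];

// Binary search: position of the first entry whose key is >= target.
function lowerBound(entries, target) {
  let lo = 0;
  let hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (entries[mid].key < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Range scan over [min, max]: seek to the first match, then walk forward.
function rangeScan(entries, min, max) {
  const ids = [];
  for (let i = lowerBound(entries, min); i < entries.length && entries[i].key <= max; i++) {
    ids.push(entries[i].docId);
  }
  return ids;
}

// Roughly what find({score: 7}) needs from the index.
const exactMatch = rangeScan(indexEntries, 7, 7);
// Roughly what find({score: {$gte: 5, $lte: 12}}) needs from the index.
const rangeMatch = rangeScan(indexEntries, 5, 12);
console.log(exactMatch, rangeMatch);
```

Because the entries are already in key order, a sort on the indexed field can also be answered by walking the entries without a separate sort step.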
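Indexes and storage engines
MMAPv1: uses mmap(); MongoDB asks the operating system to map all data files into memory.
WiredTiger: the data files containing all documents, collections, and indexes are paged in and out of memory by the operating system in 4 KB units (memory pages), somewhat like MySQL, which moves 16 KB pages.
Whenever a DML operation runs, the data page in memory is modified first and is then written to disk asynchronously by the OS.
Indexes should fit in memory, so more indexes is not always better.
Indexes are stored separately from the data and are not clustered. In a clustered index the order of the index matches the order of the data; the MySQL primary key is a clustered index.
B-tree nodes are typically kept about 60% full.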

db.tt.getIndexes() // list the indexes
db.tt.totalIndexSize() // total size of the indexes
db.tt.count() // number of documents
db.tt.dataSize() // size of the data
db.tt.getDB() // database the collection belongs to
db.tt.stats() // collection statistics

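Unique index {unique:true} // insert does not check beforehand whether the document already exists; inserting a document with a duplicate unique key fails, so watch for the error message.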
MyMongo:PRIMARY> db.tt.createIndex({name:1},{unique:true})
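Sparse index: indexes are dense by default, so a query such as db.products.find({category:null}) can also use the index.
Use case: an e-commerce site that lets anonymous users browse and post reviews may have many records with no user_id. If about half of the index entries on that field would be null, a sparse index avoids storing them.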
MyMongo:PRIMARY> db.tt.createIndex({name:1},{unique:true,sparse:true})
MyMongo:PRIMARY> db.tt.createIndex({user_id:1},{sparse:true,unique:false})
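The difference between a dense and a sparse index can be sketched in plain JavaScript. This is only a model of which documents get an index entry, not MongoDB's implementation; the `reviews` data and the `buildEntries` helper are hypothetical.

```javascript
// Hypothetical sample documents: two reviews have no user_id field.
const reviews = [
  { _id: 1, user_id: "u1" },
  { _id: 2 },
  { _id: 3, user_id: "u2" },
  { _id: 4 },
];

// Build index entries for one field. A dense index records a null key
// for documents that lack the field; a sparse index skips them.
function buildEntries(docs, field, sparse) {
  const entries = [];
  for (const doc of docs) {
    const hasField = Object.prototype.hasOwnProperty.call(doc, field);
    if (!hasField && sparse) continue;
    entries.push({ key: hasField ? doc[field] : null, docId: doc._id });
  }
  return entries;
}

const denseEntries = buildEntries(reviews, "user_id", false);
const sparseEntries = buildEntries(reviews, "user_id", true);
console.log(denseEntries.length, sparseEntries.length);
```

In this model the dense index holds two null keys, which is also why a unique dense index would reject the second document without a user_id, while a unique sparse index would not.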
Multikey index: an example where the indexed field is an array
{
name:"wheelbarrow",
tags:["tools","gardening","soil"]}
With an index on tags, every value in each document's tags array appears in the index.
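As a rough illustration of the multikey behaviour described above, the sketch below expands each array element into its own index entry. The second product and the `multikeyEntries` helper are invented for the example.

```javascript
// The first document is the one from the notes; the second is hypothetical.
const products = [
  { _id: 1, name: "wheelbarrow", tags: ["tools", "gardening", "soil"] },
  { _id: 2, name: "rake", tags: ["tools", "gardening"] },
];

// One index entry per array element, sorted by key and then by document id.
function multikeyEntries(docs, field) {
  const entries = [];
  for (const doc of docs) {
    const value = doc[field];
    const keys = Array.isArray(value) ? value : [value];
    for (const key of keys) entries.push({ key, docId: doc._id });
  }
  entries.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : a.docId - b.docId));
  return entries;
}

const tagEntries = multikeyEntries(products, "tags");
// Documents that a query like find({tags: "gardening"}) would reach via the index.
const gardeningDocs = tagEntries.filter((e) => e.key === "gardening").map((e) => e.docId);
console.log(tagEntries.length, gardeningDocs);
```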
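Hashed index: the position of an entry is determined by a hash function
MyMongo:PRIMARY> db.tt.createIndex({name:'hashed'})
// Query restrictions: equality matches only, no range queries, no multikey (array) fields; floating-point numbers are truncated to integers before hashing
Geospatial index: finds documents near a given location, based on the longitude and latitude stored in each document
db.map.createIndex({"gps":"2d"}) // "2d" index type; the gps value must be a coordinate pair, e.g. {"gps":[1,100]} or {"gps":{"x":-20,"y":30}}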
db.map.find({"gps":{"$near":[40,-73]}}).limit(10)
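db.runCommand({geoNear:"map",near:[40,-73],num:10}) // the geoNear command was removed in MongoDB 4.2
MongoDB can find not only documents near a point but also documents inside a given shape, using "$within" (superseded by "$geoWithin")
Compound geospatial index
db.map.createIndex({"location":"2d","desc":1})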
db.map.find({"location":{"$near":[-70,30]},"desc":"coffeeshop"}).limit(1)
MongoDB's 2d geospatial index assumes the indexed points lie on a flat plane.
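The flat-plane assumption means that "near" is ordinary Euclidean distance between coordinate pairs. The brute-force sketch below illustrates only that distance model; it does not show how the 2d index locates points, and the `places` data and helper names are hypothetical.

```javascript
// Hypothetical points stored as [x, y] pairs, as a 2d index expects.
const places = [
  { name: "A", gps: [40, -73] },
  { name: "B", gps: [41, -74] },
  { name: "C", gps: [10, 10] },
  { name: "D", gps: [40.5, -73] },
];

// Planar (Euclidean) distance between two coordinate pairs.
function planarDistance([x1, y1], [x2, y2]) {
  return Math.hypot(x1 - x2, y1 - y2);
}

// Brute-force equivalent of a $near query followed by limit(n).
function near(docs, point, limit) {
  return docs
    .map((doc) => ({ doc, dist: planarDistance(doc.gps, point) }))
    .sort((a, b) => a.dist - b.dist)
    .slice(0, limit)
    .map((entry) => entry.doc.name);
}

const nearest = near(places, [40, -73], 3);
console.log(nearest);
```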

Index on a field of an embedded document
db.blog.createIndex({"comments.date":-1})
Creating an index for sorting
When sort() is used on a key that has no index, MongoDB pulls all of the matching data into memory to sort it.

An index is a special data structure stored in a form that is easy to traverse; it keeps the values of one or more fields of a collection in sorted order.
MyMongo:PRIMARY> db.tt.createIndex({name:1},{unique:true})
MyMongo:PRIMARY> db.tt.createIndex({name:1})
MyMongo:PRIMARY> db.tt.getIndexes()
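MyMongo:PRIMARY> db.tt.createIndex({name:1},{unique:true,dropDups:true}) // dropDups is not available in 3.x and later
"ok" : 0,
"errmsg" : "E11000 duplicate key error collection: test.tt index: name_1 dup key: { : \"test0\" }"
// With dropDups, MongoDB deleted documents with duplicate values itself. The option was removed in 3.0; instead, create a new collection with the unique index, import the data into it, and ignore the duplicate-key errors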
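The workaround described above (copy into a new collection that has the unique index and ignore duplicate-key errors) can be modelled as follows. `UniqueCollection` is a toy in-memory class written for this sketch, not a MongoDB API, and its error text merely imitates the server message.

```javascript
// Toy collection that enforces uniqueness on one field (hypothetical helper).
class UniqueCollection {
  constructor(keyField) {
    this.keyField = keyField;
    this.docs = [];
    this.seen = new Set();
  }

  insert(doc) {
    const key = doc[this.keyField];
    if (this.seen.has(key)) {
      // Simulated duplicate-key failure.
      throw new Error(`E11000 duplicate key: ${key}`);
    }
    this.seen.add(key);
    this.docs.push(doc);
  }
}

// Source data containing one duplicate name.
const sourceDocs = [{ name: "test0" }, { name: "test1" }, { name: "test0" }, { name: "test2" }];

const target = new UniqueCollection("name");
let skipped = 0;
for (const doc of sourceDocs) {
  try {
    target.insert(doc);
  } catch (err) {
    skipped++; // ignore the duplicate and keep importing
  }
}
console.log(target.docs.length, skipped);
```

The first document seen for each key wins, which matches what an import that ignores duplicate-key errors would leave behind.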

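The ensureIndex() method was replaced by createIndex() in 3.x. In older releases, creating an index writes a document describing the new index into the special system.indexes collection.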
>db.test_collection.ensureIndex({"age":1})
>db.col.ensureIndex({"title":1,"description":-1})
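ensureIndex() accepts optional parameters, listed below:
Parameter Type Description
background Boolean Building an index blocks other database operations; background: true builds the index in the background instead. Default is false.
unique Boolean Whether the index is unique. Set to true to create a unique index. Default is false.
name string Name of the index. If omitted, MongoDB generates a name by concatenating the indexed field names and their sort orders.
dropDups Boolean Whether to delete duplicate records when building a unique index. Default is false. (Removed in 3.0.)
sparse Boolean Do not index documents that lack the indexed field. Use with care: when set to true, queries that use the index will not return documents that lack the field. Default is false.
weights document Index weight, a number from 1 to 99,999, giving the field's significance relative to the other indexed fields when scoring.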
db.values.ensureIndex({open: 1, close: 1}, {background: true})
MyMongo:PRIMARY> db.tt.dropIndex("name_1") // drop the index
Building indexes
db.values.createIndex({open:1,close:1})
Background index build: the build may be slower, but the database stays accessible
db.values.createIndex({open:1,close:1},{background:true})
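Offline index build: take a replica node offline, build the index on it, then let it catch up by replicating from the primary; once it is up to date, promote it to primary
Backing up indexes: not every backup method includes the indexes. mongodump/mongorestore save only the collections and the index definitions; on mongorestore, every index definition that was backed up is rebuilt
Defragmentation: heavy updates and deletes can leave indexes fragmented. The B-tree reclaims some space on its own, but rebuilding the index is worth considering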

MyMongo:PRIMARY> db.tt.reIndex() // holds a write lock for the whole rebuild, so the instance is unavailable; rebuild offline if possible

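Index metadata is stored in each database's system.indexes collection. It is a reserved collection: you cannot insert into or delete from it directly, only create or drop indexes
The system.namespaces collection also holds the index names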
-----mongodb index

-- From the official documentation
file:///H:/word/mongodb/html/indexes.html

MongoDB defines indexes at the collection level, much like an RDBMS

Default _id Index
MongoDB creates a unique index on the _id field during the creation of a collection.
Create an Index//MongoDB indexes use a B-tree data structure.
　　db.collection.createIndex( <key and index type specification>, <options> )
　　db.collection.createIndex( { name: -1 } )
　　db.myColl.createIndex( { category: 1 }, { collation: { locale: "fr" } } )
　　db.myColl.find( { category: "cafe" } ).collation( { locale: "fr" } )
　　db.myColl.find( { category: "cafe" } )// cannot use the index
　　db.myColl.createIndex(
　　{ score: 1, price: 1, category: 1 },
　　{ collation: { locale: "fr" } } )
　　db.myColl.find( { score: 5 } ).sort( { price: 1 } )
　　db.myColl.find( { score: 5, price: { $gt: NumberDecimal( "10" ) } } ).sort( { price: 1 } )
　　db.myColl.find( { score: 5, category: "cafe" } )
　　The following indexes only support simple binary comparison and do not support collation:
text indexes,
2d indexes, and
geoHaystack indexes.
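Covered queries: when the query criteria and the projection include only indexed fields, MongoDB returns results directly from the index without scanning any documents or bringing them into memory. Such covered queries can be very efficient.
Index Intersection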

Index Types
Single Field
db.records.createIndex( { score: 1 } )
db.records.find( { score: 2 } )
db.records.find( { score: { $gt: 10 } } )
Create an Index on an Embedded Field
{
"_id": ObjectId("570c04a4ad233577f97dc459"),
"score": 1034,
"location": { state: "NY", city: "New York" }
}
db.records.createIndex( { "location.state": 1 } ) // index on an embedded field
db.records.find( { "location.state": "CA" } )
db.records.find( { "location.city": "Albany", "location.state": "NY" } )
Create an Index on Embedded Document
db.records.createIndex( { location: 1 } )
Compound Index
{ userid: 1, score: -1 }
{
"_id": ObjectId(...),
"item": "Banana",
"category": ["food", "produce", "grocery"],
"location": "4th Street Store",
"stock": 4,
"type": "cases"
}
db.products.createIndex( { "item": 1, "stock": 1 } )
db.products.find( { item: "Banana" } )
db.products.find( { item: "Banana", stock: { $gt: 5 } } )
db.events.find().sort( { username: 1, date: -1 } )
db.events.find().sort( { username: -1, date: 1 } )
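db.events.createIndex( { "username" : 1, "date" : -1 } ) // supports both sorts above: the sort directions must either all match the index or all be reversed
db.events.find().sort( { username: 1, date: 1 } ) // not supported by the index created above
{ "item": 1, "location": 1, "stock": 1 } // index prefixes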
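The sort-direction rule and the notion of index prefixes can be checked with a small helper. This is a sketch of the rule as stated in these notes for sorts with no query predicate; the function names `indexSupportsSort` and `indexPrefixes` are made up.

```javascript
// True when the sort keys are a prefix of the index keys and the directions
// either all match the index or are all reversed.
function indexSupportsSort(indexSpec, sortSpec) {
  const indexFields = Object.keys(indexSpec);
  const sortFields = Object.keys(sortSpec);
  if (sortFields.length === 0 || sortFields.length > indexFields.length) return false;

  let sameDirection = true;
  let reversed = true;
  for (let i = 0; i < sortFields.length; i++) {
    if (sortFields[i] !== indexFields[i]) return false;
    const indexDir = indexSpec[indexFields[i]];
    const sortDir = sortSpec[sortFields[i]];
    if (sortDir !== indexDir) sameDirection = false;
    if (sortDir !== -indexDir) reversed = false;
  }
  return sameDirection || reversed;
}

// All leading subsets of the index keys, e.g. [item], [item, location], ...
function indexPrefixes(indexSpec) {
  const fields = Object.keys(indexSpec);
  return fields.map((_, i) => fields.slice(0, i + 1));
}

const eventsIndex = { username: 1, date: -1 };
console.log(indexSupportsSort(eventsIndex, { username: 1, date: -1 }));
console.log(indexSupportsSort(eventsIndex, { username: -1, date: 1 }));
console.log(indexSupportsSort(eventsIndex, { username: 1, date: 1 }));
console.log(indexPrefixes({ item: 1, location: 1, stock: 1 }));
```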
Multikey Index
{"addr.zip":1}
Geospatial Index // supports indexing geospatial coordinate data
2d indexes use planar geometry when returning results, and 2dsphere indexes use spherical geometry.
Text Indexes
Hashed Indexes
Index Properties
Unique Indexes
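Partial Indexes // index only the documents in a collection that satisfy a specified filter expression.
Sparse Indexes // the sparse property ensures the index contains entries only for documents that have the indexed field; documents without the field are skipped.
You can combine the sparse option with the unique option to reject documents that have duplicate values for a field while ignoring documents that lack the indexed key.
TTL Indexes // special indexes that MongoDB can use to automatically remove documents from a collection after a certain amount of time.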
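As a rough model of the TTL behaviour in the last item, the sketch below removes documents whose date field is older than a given number of seconds. It assumes expiry is measured from the indexed date value; the `sessions` data and the `ttlSweep` helper are hypothetical, and in MongoDB the removal is done by the server, not by client code.

```javascript
// Hypothetical documents; the third has no date field and so never expires here.
const sessions = [
  { _id: 1, createdAt: new Date("2018-10-18T09:00:00Z") },
  { _id: 2, createdAt: new Date("2018-10-18T09:59:30Z") },
  { _id: 3 },
];

// Keep only the documents that have not yet expired at time `now`.
function ttlSweep(docs, field, expireAfterSeconds, now) {
  return docs.filter((doc) => {
    const value = doc[field];
    if (!(value instanceof Date)) return true; // no date value: never expires
    return value.getTime() + expireAfterSeconds * 1000 > now.getTime();
  });
}

const remaining = ttlSweep(sessions, "createdAt", 3600, new Date("2018-10-18T10:00:00Z"));
console.log(remaining.map((doc) => doc._id));
```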

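MongoDB covered index queries
All fields in the query are part of an index
All fields returned by the query are in the same index
Because every field in the query is part of the index, MongoDB can match the query conditions and return the results from that same index without looking inside the documents.
Because indexes reside in RAM, fetching data from the index is much faster than reading it by scanning documents.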
{
"_id": ObjectId("53402597d852426020000002"),
"contact": "987654321",
"dob": "01-01-1991",
"gender": "M",
"name": "Tom Benzamin",
"user_name": "tombenzamin"
}
>db.users.ensureIndex({gender:1,user_name:1})
>db.users.find({gender:"M"},{user_name:1,_id:0})
Finally, an index cannot cover a query in the following cases:
Any of the indexed fields is an array
Any of the indexed fields is a subdocument
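The two conditions for a covered query can be expressed as a small check. This is a simplified sketch that ignores the array and subdocument exceptions just listed; `isCovered` is a made-up helper, and it treats `_id` as returned unless the projection explicitly excludes it.

```javascript
// indexSpec: e.g. {gender: 1, user_name: 1}
// queryFields: field names used in the query filter
// projection: e.g. {user_name: 1, _id: 0}
function isCovered(indexSpec, queryFields, projection) {
  const indexed = new Set(Object.keys(indexSpec));

  // Condition 1: every queried field is part of the index.
  if (!queryFields.every((field) => indexed.has(field))) return false;

  // _id is returned by default, so it must be excluded unless it is indexed.
  if (projection._id !== 0 && !indexed.has("_id")) return false;

  // Condition 2: every returned field is part of the same index.
  const returned = Object.keys(projection).filter((f) => f !== "_id" && projection[f] === 1);
  if (returned.length === 0) return false; // no inclusion list means whole documents are returned
  return returned.every((field) => indexed.has(field));
}

const usersIndex = { gender: 1, user_name: 1 };
console.log(isCovered(usersIndex, ["gender"], { user_name: 1, _id: 0 }));
console.log(isCovered(usersIndex, ["gender"], { user_name: 1 }));
```

The second call returns false in this model because `_id` is not part of the index and was not excluded from the projection.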

Finding slow queries
[root@mysqlt1 soft]# /usr/local/mongodb/bin/mongorestore -d stocks dump/stocks/ --host=10.15.7.114 --port=28004
MyMongo:PRIMARY> use stocks
MyMongo:PRIMARY> db.values.find({"stock_symbol":"GOOG"}).sort({date:-1}).limit(1)
Find them in the log
[root@mysqlt1 ~]# tail -f -n 200 /data/mongodb/log/28004.log
2018-10-18T09:03:50.276+0800 I COMMAND [conn27] command stocks.values appName: "MongoDB Shell" command: find { find: "values", filter: { stock_symbol: "GOOG" }, limit: 1.0,
singleBatch: false, sort: { date: -1.0 }, $clusterTime: { clusterTime: Timestamp(1539824574, 1), signature: { hash: BinData(0, 0000000000000000000000000000000000000000),
keyId: 0 } }, $readPreference: { mode: "secondaryPreferred" }, $db: "stocks" } planSummary: COLLSCAN keysExamined:0 docsExamined:4308303 hasSortStage:1 cursorExhausted:
1 numYields:33723 nreturned:1 reslen:388 locks:{ Global: { acquireCount: { r: 67448 } }, Database: { acquireCount: { r: 33724 } }, Collection: { acquireCount: { r: 33724 }
} } protocol:op_msg 5824ms
[root@mysqlt1 ~]# grep -E '[0-9]+ms' /data/mongodb/log/28004.log
MyMongo:PRIMARY> db.setProfilingLevel(2) // enable the built-in profiler; level 2 is the most verbose and records every read and write
MyMongo:PRIMARY> db.setProfilingLevel(1, 50) // level 1: record operations that take longer than 50 ms
MyMongo:PRIMARY> db.values.find({}).sort({close:-1}).limit(1)
MyMongo:PRIMARY> db.system.profile.find({millis:{$gt:150}}) // find operations that took more than 150 ms
MyMongo:PRIMARY> db.system.profile.find().sort({$natural:-1}).limit(5).pretty() // slow operations are stored in the system.profile collection; sort by $natural to see the most recent first
MyMongo:PRIMARY> db.setProfilingLevel(0) // level 0 disables the profiler
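When scanning the log for slow operations, the fields that matter most are the plan summary, the number of documents examined, the number returned, and the duration. The sketch below pulls those out of a log line with regular expressions. The `logLine` value is an abbreviated copy of the log entry shown earlier, and `summarizeSlowOp` is a hypothetical helper that assumes the 3.6-style text log format.

```javascript
// Abbreviated copy of the slow-query log entry shown above.
const logLine =
  "planSummary: COLLSCAN keysExamined:0 docsExamined:4308303 hasSortStage:1 " +
  "nreturned:1 reslen:388 protocol:op_msg 5824ms";

// Extract the key figures from one log line; missing fields become null.
function summarizeSlowOp(line) {
  const plan = /planSummary: (\S+)/.exec(line);
  const docs = /docsExamined:(\d+)/.exec(line);
  const returned = /nreturned:(\d+)/.exec(line);
  const millis = /(\d+)ms\s*$/.exec(line);
  return {
    planSummary: plan ? plan[1] : null,
    docsExamined: docs ? Number(docs[1]) : null,
    nreturned: returned ? Number(returned[1]) : null,
    millis: millis ? Number(millis[1]) : null,
  };
}

const slowOp = summarizeSlowOp(logLine);
console.log(slowOp);
```

A COLLSCAN that examines millions of documents to return one, as here, is the typical signature of a missing index.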

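Fixing slow queries: add an index; you may also need to reorganize the indexes, restructure the data model, or upgrade the hardware
In the simplest cases the cause is a missing index, an inappropriate index, or a query that performs below expectations.
MongoDB query analysis confirms whether the indexes we created are effective; it is an important tool for analyzing query performance.
The commonly used query analysis methods are explain() and hint().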
MyMongo:PRIMARY> db.values.find({}).sort({close:-1}).limit(1).explain()
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "stocks.values",
"indexFilterSet" : false,
"parsedQuery" : {

},
"winningPlan" : {
"stage" : "SORT",
"sortPattern" : {
"close" : -1
},
"limitAmount" : 1,
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"inputStage" : {
"stage" : "COLLSCAN",
"direction" : "forward"
}
}
},
"rejectedPlans" : [ ]
},
"serverInfo" : {
"host" : "mysqlt1",
"port" : 28004,
"version" : "3.6.3",
"gitVersion" : "9586e557d54ef70f9ca4b43c26892cd55257e1a5"
},
"ok" : 1,
"operationTime" : Timestamp(1539887460, 1),
"$clusterTime" : {
"clusterTime" : Timestamp(1539887460, 1),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
}
}
MyMongo:PRIMARY> db.values.count()
4308303
MyMongo:PRIMARY> db.values.getIndexes()
MyMongo:PRIMARY> db.values.createIndex({close:1}) // add an index
MyMongo:PRIMARY> db.values.find({}).sort({close:1}).limit(1).explain()
MyMongo:PRIMARY> db.values.find({close:{$gt:500}}).explain()
MyMongo:PRIMARY> db.values.createIndex({stock_symbol:1,close:1}) // add an index; no error is raised even if the key does not exist in any document
MyMongo:PRIMARY> db.values.find({stock_symbol:"GOOG",close:{$gt:200}}).explain(true)
MyMongo:PRIMARY> db.values.find({stock_symbol:"GOOG",close:{$gt:200}}).count()
MyMongo:PRIMARY> db.values.find({stock_symbol:"GOOG",close:{$gt:200}}).hint({close:1}).limit(10)
MyMongo:PRIMARY> db.values.find({stock_symbol:"GOOG",close:{$gt:200}}).limit(10)
Compound indexes: the prefix rule
MyMongo:PRIMARY> db.values.createIndex({close:1,open:1,date:1})
MyMongo:PRIMARY> db.values.find({close:1,open:1,date:"1985-01-08"}).limit(10).explain() // exact match
MyMongo:PRIMARY> db.values.find({close:1,open:1}).limit(10).explain()
MyMongo:PRIMARY> db.values.find({close:1}).limit(10).explain()
MyMongo:PRIMARY> db.values.find({}).sort({close:1}).limit(10)
MyMongo:PRIMARY> db.values.find({close:{$gt:1}}).limit(10)
MyMongo:PRIMARY> db.values.find({close:100}).sort({open:1}).limit(10)
MyMongo:PRIMARY> db.values.find({close:100,open:{$gt:1}}).limit(10)
MyMongo:PRIMARY> db.values.find({close:1,open:1.01,date:{$gte:"2005-01-01"}}).limit(10)
MyMongo:PRIMARY> db.values.find({close:1,open:1.01}).sort({date:1}).limit(10)
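Several of the queries above combine equality conditions on the leading index fields with a sort on the next field. A simplified way to reason about whether the compound index can provide the sort order is sketched below. It ignores sort direction and range predicates, and `sortUsesIndex` is a made-up helper, not something MongoDB exposes.

```javascript
// indexFields: index keys in order, e.g. ["close", "open", "date"]
// equalityFields: fields the query pins to a single value
// sortFields: fields in the sort() specification, in order
function sortUsesIndex(indexFields, equalityFields, sortFields) {
  const equality = new Set(equalityFields);

  // Skip the leading index fields that are pinned by equality conditions.
  let position = 0;
  while (position < indexFields.length && equality.has(indexFields[position])) position++;

  // Sorting on a field pinned to one value is a no-op, so ignore it.
  const remainingSort = sortFields.filter((field) => !equality.has(field));

  // The remaining sort fields must follow immediately in index order.
  return remainingSort.every((field, i) => indexFields[position + i] === field);
}

const valuesIndex = ["close", "open", "date"];
console.log(sortUsesIndex(valuesIndex, ["close"], ["open"]));         // find({close:100}).sort({open:1})
console.log(sortUsesIndex(valuesIndex, ["close", "open"], ["date"])); // find({close:1,open:1.01}).sort({date:1})
console.log(sortUsesIndex(valuesIndex, [], ["open"]));                // sort on a non-prefix field
```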

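Using hint()
The MongoDB query optimizer generally works well, but hint() can force MongoDB to use a specific index.
In some situations this improves performance, for example on an indexed collection when running a multi-field query in which only some of the fields are indexed.
The following query forces use of the index on the gender and user_name fields: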
>db.users.find({gender:"M"},{user_name:1,_id:0}).hint({gender:1,user_name:1})
The explain() function can be used to analyze the query above:
>db.users.find({gender:"M"},{user_name:1,_id:0}).hint({gender:1,user_name:1}).explain()
MyMongo:PRIMARY> db.values.find({stock_symbol:"GOOG",close:{$gt:200}}).hint({close:1}).explain()
MyMongo:PRIMARY> db.values.find({close:100}).limit(10)
MyMongo:PRIMARY> db.values.find({close:{$gte:100}}).limit(10)
MyMongo:PRIMARY> db.values.find({close:{$gte:100}}).sort({close:1}).limit(10)

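Query plan cache: how the query optimizer caches and expires query plans
Once a winning plan is found, the query pattern, the nscanned value, and the chosen index are recorded
A cached plan expires when: the collection has received 1,000 writes; an index is created or dropped; or a query using the cached plan does much more work than expected (nscanned exceeds the cached value by a factor of 10)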
MyMongo:PRIMARY> db.values.find({}).sort({close:1}).limit(10)
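The three expiry conditions listed above can be written down as a single check. This is only a restatement of the rule as given in these notes, not the server's actual cache code; the `shouldEvictPlan` function and its field names are hypothetical.

```javascript
// cached: what was recorded when the plan won, e.g. {nscanned: 50}
// current: what has happened since, plus the nscanned of the latest run
function shouldEvictPlan(cached, current) {
  if (current.writesSinceCached >= 1000) return "1000 writes";
  if (current.indexesChanged) return "index created or dropped";
  if (current.nscanned >= cached.nscanned * 10) return "more work than expected";
  return null; // keep using the cached plan
}

const cachedPlan = { nscanned: 50 };
console.log(shouldEvictPlan(cachedPlan, { writesSinceCached: 10, indexesChanged: false, nscanned: 60 }));
console.log(shouldEvictPlan(cachedPlan, { writesSinceCached: 10, indexesChanged: false, nscanned: 500 }));
console.log(shouldEvictPlan(cachedPlan, { writesSinceCached: 1000, indexesChanged: false, nscanned: 60 }));
```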

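explain() reports information about a query, including the index used and query statistics, which helps when tuning indexes.
Current versions of explain() have three modes: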
queryPlanner
executionStats
allPlansExecution

Next, create an index on gender and user_name in the users collection:
>db.users.ensureIndex({gender:1,user_name:1})
>db.users.find({gender:"M"},{user_name:1,_id:0}).explain()
The explain() call above returns the following (shown in the older, pre-3.0 output format):
"cursor" : "BtreeCursor gender_1_user_name_1",
"indexOnly" : true,
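cursor: because this query used an index, and MongoDB stores indexes in a B-tree structure, a cursor of type BtreeCursor was used.
If no index is used, the cursor type is BasicCursor. This key also gives the name of the index used; with that name you can look up the index details in the current database's system.indexes collection (created automatically by the system to store index information).
n: number of documents returned by the query
nscanned/nscannedObjects: number of index entries and documents scanned by the query; the goal is to get these as close as possible to the number of documents returned.
millis: time the query took, in milliseconds
indexBounds: the index range used by the query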

db.mycol.insert({
"item": "Banana",
"category": ["food", "produce", "grocery"],
"location": "4th Street Store",
"stock": 4,
"type": "cases"
})
db.mycol.createIndex( { location: 1 } )
db.mycol.dropIndex( { location: 1 } )
db.mycol.createIndex( { "item": 1, "stock": 1 } )
db.mycol.find( { item: "Banana" } )
db.mycol.find( { item: "Banana", stock: { $gt: 5 } } )
MyMongo:PRIMARY> db.mycol.find( { item: "Banana" } ).explain()
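{
"queryPlanner" : {
"plannerVersion" : 1, // query planner version
"namespace" : "test.mycol", // namespace being queried
"indexFilterSet" : false, // whether an index filter is set for this query shape
"parsedQuery" : { // the parsed query, i.e. the filter conditions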
"item" : {
"$eq" : "Banana"
}
},
"winningPlan" : { // the plan chosen by the optimizer
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"item" : 1,
"stock" : 1
},
"indexName" : "item_1_stock_1", // index name
"isMultiKey" : false, // whether this is a multikey index
"multiKeyPaths" : {
"item" : [ ],
"stock" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"item" : [
"[\"Banana\", \"Banana\"]"
],
"stock" : [
"[MinKey, MaxKey]"
]
}
}
},
"rejectedPlans" : [ ] // plans that were considered and rejected
},
"serverInfo" : {
"host" : "hongquan1",
"port" : 28001,
"version" : "3.6.3",
"gitVersion" : "9586e557d54ef70f9ca4b43c26892cd55257e1a5"
},
"ok" : 1,
"operationTime" : Timestamp(1521573084, 1),
"$clusterTime" : {
"clusterTime" : Timestamp(1521573084, 1),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
}
}
Viewing execution statistics
db.mycol.find( { item: "Banana" } ).explain("executionStats")
},
"executionStats" : { // execution statistics
"executionSuccess" : true, // whether execution succeeded
"nReturned" : 1, // number of documents returned
"executionTimeMillis" : 0,
"totalKeysExamined" : 1,
"totalDocsExamined" : 1,
"executionStages" : {
"stage" : "FETCH",
"nReturned" : 1,
"executionTimeMillisEstimate" : 0,
"works" : 2,
"advanced" : 1,
"needTime" : 0,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"docsExamined" : 1,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 1,
"executionTimeMillisEstimate" : 0,
"works" : 2,
"advanced" : 1,
"needTime" : 0,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"item" : 1,
"stock" : 1
},
"indexName" : "item_1_stock_1",
Getting all execution plans
db.mycol.find( { item: "Banana" } ).explain("allPlansExecution")

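explain analysis

Stage types that appear in executionStages:

COLLSCAN: full collection scan
IXSCAN: index scan
FETCH: retrieve the documents identified by the index
SHARD_MERGE: merge the results returned by the shards
SORT: the sort was performed in memory
LIMIT: limit the number of documents returned
SKIP: skip documents
IDHACK: query on _id
SHARDING_FILTER: filter out orphaned documents from the shards
COUNT: stage reported for count operations such as db.coll.explain().count()
COUNTSCAN: stage reported when count does not use an index
COUNT_SCAN: stage reported when count uses an index
SUBPLAN: stage reported for $or queries that do not use an index
TEXT: stage reported when a text index is used
PROJECTION: stage reported when the returned fields are restricted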
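A quick way to use the stage list above is to walk the winningPlan tree from an explain() result and flag the stages that usually indicate trouble. The sketch below does that for the plan shown earlier for the unindexed sort on close; `collectStages` and `planWarnings` are hypothetical helpers, and the choice of which stages to warn about is an assumption of this example.

```javascript
// winningPlan copied from the explain() output of the unindexed sort above.
const winningPlan = {
  stage: "SORT",
  sortPattern: { close: -1 },
  limitAmount: 1,
  inputStage: {
    stage: "SORT_KEY_GENERATOR",
    inputStage: { stage: "COLLSCAN", direction: "forward" },
  },
};

// Collect stage names from the plan tree, following inputStage / inputStages.
function collectStages(plan) {
  if (!plan) return [];
  const children = plan.inputStages || (plan.inputStage ? [plan.inputStage] : []);
  return [plan.stage, ...children.flatMap((child) => collectStages(child))];
}

// Flag the stages that usually mean an index is missing or unused.
function planWarnings(plan) {
  const stages = collectStages(plan);
  const warnings = [];
  if (stages.includes("COLLSCAN")) warnings.push("full collection scan");
  if (stages.includes("SORT")) warnings.push("in-memory sort");
  return warnings;
}

console.log(collectStages(winningPlan));
console.log(planWarnings(winningPlan));
```

For the indexed query on item shown earlier, the same walk would yield FETCH and IXSCAN and produce no warnings.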


Original article: https://www.cnblogs.com/yhq1314/p/10007874.html