MongoDB 聚合Group(二)

zoukankan html css js c++ java

MongoDB 聚合Group(二)
转自：https://blog.csdn.net/congcong68/article/details/51620040

一、简介：

    上一篇我们对   db.collection.aggregate(pipeline, options)介绍，我们接下来介绍pipeline 参数和options参数的基础认识

  【pipeline 参数】

pipeline 类型是Array 语法：db.collection.aggregate( [ { <stage> }, ... ] )

$project：可以对输入文档进行添加新字段或删除现有的字段，可以自定哪些字段显示与不显示。

$match ：根据条件用于过滤数据，只输出符合条件的文档，如果放在pipeline前面，根据条件过滤数据，传输到下一个阶段管道，可以提高后续的数据处理效率。还可以放在out之前，对结果进行再一次过滤。

$redact ：字段所处的document结构的级别.

$limit ：用来限制MongoDB聚合管道返回的文档数

$skip :在聚合管道中跳过指定数量的文档，并返回余下的文档。

$unwind :将文档中的某一个数组类型字段拆分成多条，每条包含数组中的一个值。

$sample :随机选择从其输入指定数量的文档。如果是大于或等于5%的collection的文档，$sample进行收集扫描，进行排序，然后选择顶部的文件。因此，$sample 在收集阶段是受排序的内存限制。

语法： { $sample: { size: <positive integer> } }

$sort :将输入文档排序后输出。

$geoNear:用于地理位置数据分析。

$out :必须为pipeline最后一个阶段管道，因为是将最后计算结果写入到指定的collection中。

$indexStats :返回数据集合的每个索引的使用情况。

语法：{ $indexStats: { } }

$group : 将集合中的文档分组，可用于统计结果，$group首先将数据根据key进行分组。

我们可以通过Aggregation pipeline一些操作跟sql用法一样，我们能很清晰的怎么去使用

pipeline sql

  $project select

  $match where

  $match(先$group阶段管道然后在$match对统计的结果进行再一次的过滤) having

$sort order by

$limit limit

二、基础的操作

    【pipeline简单例子】

数据：
[sql] view plain copy

db.items.insert( [

  {

   "quantity" : 2,

   "price" : 5.0,

   "pnumber" : "p003",

  },{

   "quantity" : 2,

   "price" : 8.0,

   "pnumber" : "p002"

  },{

   "quantity" : 1,

   "price" : 4.0,

   "pnumber" : "p002"

  },{

   "quantity" : 2,

   "price" : 4.0,

   "pnumber" : "p001"

  },{

   "quantity" : 4,

   "price" : 10.0,

   "pnumber" : "p003"

  },{

   "quantity" : 10,

   "price" : 20.0,

   "pnumber" : "p001"

  },{

   "quantity" : 10,

   "price" : 20.0,

   "pnumber" : "p003"

  },{

   "quantity" : 5,

   "price" : 10.0,

   "pnumber" : "p002"

  }

])
【 $project】

1、我们对上面的统计结果，只显示count,不想_id ,可以通过$project来操作，相当SQL的select 显示我们想要的字段：
[sql] view plain copy

> db.items.aggregate([{$group:{_id:null,count:{$sum:1}}},{$project:{"_id":0,"count":1}}])

{ "count" : 8 }
【 $match】

1、我们想通过滤订单中，想知道卖出的数量大于8的产品有哪些产品，相当于SQL:select sum(quantity) as total from items group by pnumber having total>8
[sql] view plain copy

> db.items.aggregate([{$group:{_id:"$pnumber",total:{$sum:"$quantity"}}},{$match:{total:{$gt:8}}}])

{ "_id" : "p001", "total" : 12 }

{ "_id" : "p003", "total" : 16 }
2、如果是放在$group之前就是当做where来使用，我们只统计pnumber =p001 产品卖出了多少个 select sum(quantity) as total from items where pnumber='p001'
[sql] view plain copy

> db.items.aggregate([{$match:{"pnumber":"p001"}},{$group:{_id:null,total:{$sum:"$quantity"}}}])

{ "_id" : null, "total" : 12 }
【$skip、 $limit、$sort】

db.items.aggregate([{ $skip: 2 },{ $limit: 4 }]) 与db.items.aggregate([{ $limit: 4 },{ $skip: 2 }]) 这样结果是不一样的
[sql] view plain copy

> db.items.aggregate([{ $skip: 2 },{ $limit: 4 }])

{ "_id" : ObjectId("574d937cfafef57ee4427ac4"), "quantity" : 1, "price" : 4, "pnumber" : "p002" }

{ "_id" : ObjectId("574d937cfafef57ee4427ac5"), "quantity" : 2, "price" : 4, "pnumber" : "p001" }

{ "_id" : ObjectId("574d937cfafef57ee4427ac6"), "quantity" : 4, "price" : 10, "pnumber" : "p003" }

{ "_id" : ObjectId("574d937cfafef57ee4427ac7"), "quantity" : 10, "price" : 20, "pnumber" : "p001" }

> db.items.aggregate([{ $limit: 4 },{ $skip: 2 }])

{ "_id" : ObjectId("574d937cfafef57ee4427ac4"), "quantity" : 1, "price" : 4, "pnumber" : "p002" }

{ "_id" : ObjectId("574d937cfafef57ee4427ac5"), "quantity" : 2, "price" : 4, "pnumber" : "p001" }
$limit、$skip、$sort、$match可以使用在阶段管道，如果使用在$group之前可以过滤掉一些数据，提高性能。

【$unwind】

将文档中的某一个数组类型字段拆分成多条，每条包含数组中的一个值。
[sql] view plain copy

> db.items.aggregate([{$group:{_id:"$pnumber",quantitys:{$push:"$quantity"}}}])

{ "_id" : "p001", "quantitys" : [ 2, 10 ] }

{ "_id" : "p002", "quantitys" : [ 2, 1, 5 ] }

{ "_id" : "p003", "quantitys" : [ 2, 4, 10 ] }

> db.items.aggregate([{$group:{_id:"$pnumber",quantitys:{$push:"$quantity"}}},{$unwind:"$quantitys"}])

{ "_id" : "p001", "quantitys" : 2 }

{ "_id" : "p001", "quantitys" : 10 }

{ "_id" : "p002", "quantitys" : 2 }

{ "_id" : "p002", "quantitys" : 1 }

{ "_id" : "p002", "quantitys" : 5 }

{ "_id" : "p003", "quantitys" : 2 }

{ "_id" : "p003", "quantitys" : 4 }

{ "_id" : "p003", "quantitys" : 10 }
【$out】

必须为pipeline最后一个阶段管道，因为是将最后计算结果写入到指定的collection中。
[sql] view plain copy

{ "_id" : "p001", "quantitys" : 2 }

{ "_id" : "p001", "quantitys" : 10 }

{ "_id" : "p002", "quantitys" : 2 }

{ "_id" : "p002", "quantitys" : 1 }

{ "_id" : "p002", "quantitys" : 5 }

{ "_id" : "p003", "quantitys" : 2 }

{ "_id" : "p003", "quantitys" : 4 }

{ "_id" : "p003", "quantitys" : 10 }
把结果放在指定的集合中result
[sql] view plain copy

> db.items.aggregate([{$group:{_id:"$pnumber",quantitys:{$push:"$quantity"}}},{$unwind:"$quantitys"},{$project:{"_id":0,"quantitys":1}},{$out:"result"}])

> db.result.find()

{ "_id" : ObjectId("57529143746e15e8aa207a29"), "quantitys" : 2 }

{ "_id" : ObjectId("57529143746e15e8aa207a2a"), "quantitys" : 10 }

{ "_id" : ObjectId("57529143746e15e8aa207a2b"), "quantitys" : 2 }

{ "_id" : ObjectId("57529143746e15e8aa207a2c"), "quantitys" : 1 }

{ "_id" : ObjectId("57529143746e15e8aa207a2d"), "quantitys" : 5 }

{ "_id" : ObjectId("57529143746e15e8aa207a2e"), "quantitys" : 2 }

{ "_id" : ObjectId("57529143746e15e8aa207a2f"), "quantitys" : 4 }

{ "_id" : ObjectId("57529143746e15e8aa207a30"), "quantitys" : 10 }
【$redact】

语法：{ $redact: <expression> }

$redact 跟$cond结合使用，并在$cond里面使用了if 、then、else表达式，if-else缺一不可，$redact还有三个重要的参数：

1）$$DESCEND：返回包含当前document级别的所有字段，并且会继续判字段包含内嵌文档，内嵌文档的字段也会去判断是否符合条件。

2）$$PRUNE：返回不包含当前文档或者内嵌文档级别的所有字段，不会继续检测此级别的其他字段，即使这些字段的内嵌文档持有相同的访问级别。

3）$$KEEP：返回包含当前文档或内嵌文档级别的所有字段，不再继续检测此级别的其他字段，即使这些字段的内嵌文档中持有不同的访问级别。

1、 level=1则值为为$$DESCEND，否则为$$PRUNE，从顶部开始扫描下去，执行$$DESCEND包含当前document级别的所有fields。当前级别字段的内嵌文档将会被继续检测。
[sql] view plain copy

db.redact.insert(

  {

  _id: 1,

  level: 1,

  status: "A",

  acct_id: "xyz123",

  cc: [{

    level: 1,

    type: "yy",

    num: 000000000000,

    exp_date: ISODate("2015-11-01T00:00:00.000Z"),

    billing_addr: {

      level: 5,

      addr1: "123 ABC Street",

      city: "Some City"

    }

  },{

     level: 3,

    type: "yy",

    num: 000000000000,

    exp_date: ISODate("2015-11-01T00:00:00.000Z"),

    billing_addr: {

      level: 1,

      addr1: "123 ABC Street",

      city: "Some City"

    }

}]

})



db.redact.aggregate(

  [

    { $match: { status: "A" } },

    {

      $redact: {

        $cond: {

          if: { $eq: [ "$level", 1] },

          then: "$$DESCEND",

          else: "$$PRUNE"

        }

      }

    }

  ]

);



{

  "_id" : 1,

  "level" : 1,

  "status" : "A",

  "acct_id" : "xyz123",

  "cc" : [

           { "level" : 1,

     "type" : "yy",

"num" : 0,

"exp_date" : ISODate("2015-11-01T00:00:00Z")

}

   ]

}
2、$$PRUNE：不包含当前文档或者内嵌文档级别的所有字段，不会继续检测此级别的其他字段，即使这些字段的内嵌文档持有相同的访问级别。连等级的字段都不显示，也不会去扫描等级字段包含下级。
[sql] view plain copy

db.redact.insert(

  {

  _id: 1,

  level: 1,

  status: "A",

  acct_id: "xyz123",

  cc: {

    level: 3,

    type: "yy",

    num: 000000000000,

    exp_date: ISODate("2015-11-01T00:00:00.000Z"),

    billing_addr: {

      level: 1,

      addr1: "123 ABC Street",

      city: "Some City"

    }

}

}

)

db.redact.aggregate(

  [

    { $match: { status: "A" } },

    {

      $redact: {

        $cond: {

          if: { $eq: [ "$level", 3] },

          then: "$$PRUNE",

          else: "$$DESCEND"

        }

      }

    }

  ]

);

{ "_id" : 1, "level" : 1, "status" : "A", "acct_id" : "xyz123" }
3、$$KEEP：返回包含当前文档或内嵌文档级别的所有字段，不再继续检测此级别的其他字段，即使这些字段的内嵌文档中持有不同的访问级别。
[sql] view plain copy

db.redact.insert(

  {

  _id: 1,

  level: 1,

  status: "A",

  acct_id: "xyz123",

  cc: {

    level: 2,

    type: "yy",

    num: 000000000000,

    exp_date: ISODate("2015-11-01T00:00:00.000Z"),

    billing_addr: {

      level:3,

      addr1: "123 ABC Street",

      city: "Some City"

    }

}

}

)



db.redact.aggregate(

  [

    { $match: { status: "A" } },

    {

      $redact: {

        $cond: {

          if: { $eq: [ "$level", 1] },

          then: "$$KEEP",

          else: "$$PRUNE"

        }

      }

    }

  ]

);



{ "_id" : 1, "level" : 1, "status" : "A", "acct_id" : "xyz123", "cc" : { "level" : 2, "type" : "yy", "num" : 0, "exp_date" : ISODate("2015-11-01T00:00:00Z"), "billing_addr" : { "level" : 3, "addr1" : "123 ABC Street", "city" : "Some City" } } }
【 options参数】

explain:返回指定aggregate各个阶段管道的执行计划信息。

他操作返回一个游标，包含aggregate执行计划详细信息。
[sql] view plain copy

db.items.aggregate([{$group:{_id:"$pnumber",total:{$sum:"$quantity"}}},{$group:{_id:null,max:{$max:"$total"}}}],{explain:true})





    "stages" : [

            {

                    "$cursor" : {

                            "query" : {



                            },

                            "fields" : {

                                    "pnumber" : 1,

                                    "quantity" : 1,

                                    "_id" : 0

                            },

                            "plan" : {

                                    "cursor" : "BasicCursor",

                                    "isMultiKey" : false,

                                    "scanAndOrder" : false,

                                    "allPlans" : [

                                            {

                                                    "cursor" : "BasicCursor"



                                                    "isMultiKey" : false,

                                                    "scanAndOrder" : false

                                            }

                                    ]

                            }

                    }

            },

            {

                    "$group" : {

                            "_id" : "$pnumber",

                            "total" : {

                                    "$sum" : "$quantity"

                            }

                    }

            },

            {

                    "$group" : {

                            "_id" : {

                                    "$const" : null

                            },

                            "max" : {

                                    "$max" : "$total"

                            }

                    }

            }

    ],

    "ok" : 1
allowDiskUse:每个阶段管道限制为100MB的内存，如果大于100MB的数据可以先写入临时文件。设置为true时，aggregate操作可时可以先将数据写入对应数据目录的子目录中

的唯一并以_tmp结尾的文档中。

cursor:指定游标的初始批批大小。光标的字段的值是一个与场batchSize文件。

语法cursor: { batchSize: <int> }

var cursor=db.items.aggregate([{$group:{_id:"$pnumber",total:{$sum:"$quantity"}}},{ $limit: 2 }],{cursor: { batchSize: 1 }})

版本2.6之后：DB.collect.aggregate()方法返回一个指针，可以返回任何结果集的大小。没有指针时返回所有的结果在一个单一的集合，并将结果集限制为16字节大小的

mongodb shell 设置游标大小cursor.batchSize(size) 一次返回多少条,游标提供了很多方法：

cursor.hasNext()

cursor.next()

cursor.toArray()

cursor.forEach()

cursor.map()

cursor.objsLeftInBatch()

cursor.itcount()

cursor.pretty()

bypassDocumentValidation:只有当你指定了$out操作符，使db.collection.aggregate绕过文档验证操作过程中。这让您插入不符合验证要求的文档。
查看全文

相关阅读:
PIE SDK 基于Dot net bar实现比例尺控件
 PIE SDK 监督分类对话框类（SupervisedClassificaitonDialog）使用经验
 图层树右键菜单结合Command操作过程
 PIE 插件式开发小笔记__PIESDK学习体会
 [转]sqlserver收缩文件没效果的解决办法
 efcore 输出显示sql语句
 Linux 常见的进程调度算法
 Linux 配置 vimrc
排序选择排序&&堆排序
 C/C++ 内存管理 (《高质量C++》-- 整理笔记)

原文地址：https://www.cnblogs.com/yanwei-wang/p/8628431.html

MongoDB 聚合Group(二)

转自：https://blog.csdn.net/congcong68/article/details/51620040

一、简介：

二、基础的操作