zoukankan      html  css  js  c++  java
  • MongoDB Map Reduce

    MongoDB - Map Reduce


    Advertisements

     Previous Page

    Next Page  

    As per the MongoDB documentation, Map-reduce is a data processing paradigm for condensing large volumes of data into useful aggregated results. MongoDB uses mapReduce command for map-reduce operations. MapReduce is generally used for processing large data sets.

    MapReduce Command

    Following is the syntax of the basic mapReduce command −

    >db.collection.mapReduce(
       function() {emit(key,value);},  //map function
       function(key,values) {return reduceFunction}, {   //reduce function
          out: collection,
          query: document,
          sort: document,
          limit: number
       }
    )

    The map-reduce function first queries the collection, then maps the result documents to emit key-value pairs, which is then reduced based on the keys that have multiple values.

    In the above syntax −

    • map is a javascript function that maps a value with a key and emits a key-value pair

    • reduce is a javascript function that reduces or groups all the documents having the same key

    • out specifies the location of the map-reduce query result

    • query specifies the optional selection criteria for selecting documents

    • sort specifies the optional sort criteria

    • limit specifies the optional maximum number of documents to be returned

    Using MapReduce

    Consider the following document structure storing user posts. The document stores user_name of the user and the status of post.

    {
       "post_text": "tutorialspoint is an awesome website for tutorials",
       "user_name": "mark",
       "status":"active"
    }

    Now, we will use a mapReduce function on our posts collection to select all the active posts, group them on the basis of user_name and then count the number of posts by each user using the following code −

    >db.posts.mapReduce( 
       function() { emit(this.user_id,1); }, 
    	
       function(key, values) {return Array.sum(values)}, {  
          query:{status:"active"},  
          out:"post_total" 
       }
    )

    The above mapReduce query outputs the following result −

    {
       "result" : "post_total",
       "timeMillis" : 9,
       "counts" : {
          "input" : 4,
          "emit" : 4,
          "reduce" : 2,
          "output" : 2
       },
       "ok" : 1,
    }
    

    The result shows that a total of 4 documents matched the query (status:"active"), the map function emitted 4 documents with key-value pairs and finally the reduce function grouped mapped documents having the same keys into 2.

    To see the result of this mapReduce query, use the find operator −

    >db.posts.mapReduce( 
       function() { emit(this.user_id,1); }, 
       function(key, values) {return Array.sum(values)}, {  
          query:{status:"active"},  
          out:"post_total" 
       }
    	
    ).find()

    The above query gives the following result which indicates that both users tom and mark have two posts in active states −

    { "_id" : "tom", "value" : 2 }
    { "_id" : "mark", "value" : 2 }

    In a similar manner, MapReduce queries can be used to construct large complex aggregation queries. The use of custom Javascript functions make use of MapReduce which is very flexible and powerful.

  • 相关阅读:
    HBase原理和架构
    Hive UDF作业
    Hive性能调优
    hive
    Netty4.0学习笔记系列之一:Server与Client的通讯
    JAVA NIO 简介(转)
    设计模式之观察者模式(Observer Pattern)
    设计模式之装饰者模式(Decorator Pattern)
    mysql存储过程写法—动态参数运用
    hashCode() 和equals() 区别和作用
  • 原文地址:https://www.cnblogs.com/grj001/p/12223570.html
Copyright © 2011-2022 走看看