  • MapReduce patterns

    Having modified and run a job in the last post, we can now examine the patterns we encounter most often in MapReduce programming.
    Although there are many of them, I think the most important ones are:

    • Summarization
    • Filtering
    • Structural

    Let's examine them in detail. 

    Summarization 
    By summarization we mean all the jobs that compute an aggregate over a set of data, such as:

    • indexing
    • mean (or other statistical functions) computation
    • min/max computation
    • count (we've seen the WordCount example)
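As a rough illustration of the summarization pattern, here is a minimal in-memory sketch in plain Python (not Hadoop code). The record layout `(user, value)` and the names `map_record`, `reduce_stats` and `summarize` are assumptions made up for this example; the dictionary only stands in for the shuffle phase a real framework would perform.

```python
def map_record(record):
    """Map phase: emit (key, (min, max, sum, count)) for one (user, value) record."""
    user, value = record
    yield user, (value, value, value, 1)


def reduce_stats(a, b):
    """Reduce phase: merge two partial summaries for the same key."""
    return (min(a[0], b[0]), max(a[1], b[1]), a[2] + b[2], a[3] + b[3])


def summarize(records):
    """Stand-in for the shuffle: group by key and fold values with reduce_stats."""
    partial = {}
    for record in records:
        for key, stats in map_record(record):
            partial[key] = reduce_stats(partial[key], stats) if key in partial else stats
    # The mean is derived only at the very end, from the merged sum and count.
    return {
        key: {"min": lo, "max": hi, "mean": total / count, "count": count}
        for key, (lo, hi, total, count) in partial.items()
    }


# Hypothetical sample data: (user, value) pairs.
records = [("alice", 4), ("bob", 10), ("alice", 8), ("bob", 2), ("alice", 6)]
summary = summarize(records)
```

The sketch carries a running sum and count instead of a running mean because the merge step must give the same result regardless of how the records are grouped; averaging partial averages would not.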


    Filtering 
    Filtering is the act of retrieving only a subset of a bigger dataset. The most common use cases are retrieving all the data belonging to a single user, or the top-N elements of the dataset (by some criterion). Another frequent use of filtering is sampling a dataset: when we're dealing with a lot of data, it is usually a good idea to pick a random subset of the original data and use it to verify the behaviour of our job.
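The three filtering uses mentioned above could be sketched as follows, again in plain Python rather than on a real cluster. The function names, the `(user, score)` record layout and the idea of passing the input as a list of splits are all assumptions made for illustration.

```python
import heapq
import random


def filter_by_user(records, user):
    """Map-only filter: keep the records that belong to one user."""
    return [r for r in records if r[0] == user]


def top_n(splits, n):
    """Top-N by score: each 'mapper' keeps a local top-N, one 'reducer' merges them."""
    local_tops = [heapq.nlargest(n, split, key=lambda r: r[1]) for split in splits]
    merged = [r for top in local_tops for r in top]
    return heapq.nlargest(n, merged, key=lambda r: r[1])


def sample(records, fraction, seed=0):
    """Random sampling: keep each record independently with probability `fraction`."""
    rng = random.Random(seed)  # fixed seed so that a test run is repeatable
    return [r for r in records if rng.random() < fraction]


# Hypothetical input, already divided into two splits of (user, score) records.
splits = [[("a", 3), ("b", 9)], [("c", 7), ("d", 1)]]
all_records = [r for split in splits for r in split]

best_two = top_n(splits, 2)
only_b = filter_by_user(all_records, "b")
subset = sample(all_records, 0.5)
```

Keeping only a local top-N in each split means the merging step sees at most N records per split, which is the usual reason this pattern scales.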

    Structural 
    Structural patterns apply when you need to operate on the structure of the data; the most common case is a join across different datasets, like the ones we're used to in an RDBMS.
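One common way to express such a join is a reduce-side join: every mapper tags its records with the dataset they came from, and the reducer pairs them up per key. The sketch below simulates that in memory; the `users` and `orders` datasets, the tags `"U"` and `"O"`, and all function names are hypothetical.

```python
from collections import defaultdict


def map_users(user):
    """Tag each user record so the reducer can tell the two sources apart."""
    user_id, name = user
    yield user_id, ("U", name)


def map_orders(order):
    """Tag each order record with its source dataset."""
    user_id, item = order
    yield user_id, ("O", item)


def reduce_join(key, tagged_values):
    """Inner join: pair every user name with every order sharing the same key."""
    names = [value for tag, value in tagged_values if tag == "U"]
    items = [value for tag, value in tagged_values if tag == "O"]
    for name in names:
        for item in items:
            yield key, name, item


def join(users, orders):
    """Stand-in for the shuffle: group tagged values by key, then reduce each group."""
    groups = defaultdict(list)
    for user in users:
        for key, value in map_users(user):
            groups[key].append(value)
    for order in orders:
        for key, value in map_orders(order):
            groups[key].append(value)
    result = []
    for key in sorted(groups):
        result.extend(reduce_join(key, groups[key]))
    return result


# Hypothetical datasets keyed by user id.
users = [(1, "alice"), (2, "bob"), (3, "carol")]
orders = [(1, "book"), (2, "pen"), (1, "lamp")]
joined = join(users, orders)
```

Because `reduce_join` emits nothing when one side is empty, this behaves like an SQL inner join: the user with id 3 has no orders and does not appear in the result.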

    In the next posts, we'll see in more detail how to deal with these patterns.

    from: http://andreaiacono.blogspot.com/2014/03/mapreduce-patterns.html

  • Original article: https://www.cnblogs.com/GarfieldEr007/p/5281211.html