  • MapReduce(3): Partitioner, Combiner and Shuffling

    Partitioner:

    Partitioning and combining take place between the Map and Reduce phases. Partitioning groups the intermediate data by key so that all records destined for the same reducer end up together. The number of partitions is equal to the number of reducers: the partitioner divides each mapper's output into that many partitions, and each partition is processed by exactly one reducer. HashPartitioner is the default Partitioner in Hadoop.

    A partitioner partitions the key-value pairs of the intermediate map output. It assigns each pair to a partition using a function of the key: a hash of the key by default, or a user-defined condition in a custom partitioner. The total number of partitions is the same as the number of reduce tasks for the job. Records with the same key go into the same partition within each mapper, and therefore reach the same reducer.

    Partitioning is done locally, on the machine that runs the map task.
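
    The hash-based assignment described above can be sketched in a few lines of Python. This is a simulation, not Hadoop code: `string_hash`, `get_partition` and `partition_map_output` are hypothetical names, and the hash is a stand-in modeled on Java's `String.hashCode()`, so the partition numbers will not necessarily match what Hadoop computes for its own key types. The formula in `get_partition` (clear the sign bit, then take the remainder by the number of reduce tasks) follows the one used by HashPartitioner.

```python
def string_hash(key: str) -> int:
    """Stand-in hash modeled on Java's String.hashCode(), kept to 32 bits.

    Assumption: used only to get a deterministic hash for the sketch.
    """
    h = 0
    for ch in key:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h


def get_partition(key: str, num_reduce_tasks: int) -> int:
    """Clear the sign bit, then take the remainder by the number of reducers."""
    return (string_hash(key) & 0x7FFFFFFF) % num_reduce_tasks


def partition_map_output(pairs, num_reduce_tasks):
    """Split one mapper's (key, value) pairs into one list per reducer."""
    partitions = [[] for _ in range(num_reduce_tasks)]
    for key, value in pairs:
        partitions[get_partition(key, num_reduce_tasks)].append((key, value))
    return partitions


if __name__ == "__main__":
    map_output = [("to", 1), ("be", 1), ("or", 1), ("not", 1), ("to", 1), ("be", 1)]
    for index, partition in enumerate(partition_map_output(map_output, 3)):
        print(index, partition)
```

    Every pair with the same key lands in the same list, and the number of lists always equals the number of reduce tasks, which is the property the reducers rely on.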

    Combiner:

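    A combiner is a 'mini-reducer' (semi-reducer) that performs part of the reducer's work on the map side, before the data is transferred to the reducers. Because less data crosses the network, it can reduce network congestion. For example, in a word count job a combiner can sum the counts for each word within one mapper's output, so that a single (word, partial count) pair per word is sent instead of one pair per occurrence.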
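
    A minimal Python sketch of that idea, assuming a word count job: `map_words`, `combine` and `reduce_counts` are hypothetical helper names, and the code only simulates the data flow on one machine rather than using the Hadoop API.

```python
from collections import defaultdict


def map_words(line: str):
    """Map step: emit one (word, 1) pair per occurrence."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Combiner: sum the counts per word within a single mapper's output."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())


def reduce_counts(pairs):
    """Reducer: sum the (possibly pre-combined) counts from all mappers."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


if __name__ == "__main__":
    map_output = map_words("to be or not to be")
    combined = combine(map_output)
    print(len(map_output), "pairs before combining")
    print(len(combined), "pairs after combining:", combined)
```

    The reducer produces the same totals with or without the combiner, because addition can be applied to partial sums in any grouping. Only the number of pairs sent over the network changes.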

    Shuffle:

    Shuffling transfers map output partitions to the corresponding reduce tasks. The final output of a map task can contain multiple partitions, and each partition must go to a different reduce task. When a map task completes, it notifies the application master, and the application master lets the corresponding reducers know that the map output is ready to be copied to the reduce machines. Because shuffling can start before the whole map phase has finished, the job completes in less time.
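
    The data movement in the shuffle can be illustrated with a small Python simulation. It makes several assumptions: each finished map task is represented as a dictionary from partition index to a list of (key, value) pairs, `shuffle` is a hypothetical function name, and the notifications through the application master and the network copy are left out entirely. It shows only which data ends up at which reducer.

```python
def shuffle(map_outputs, num_reduce_tasks):
    """Route partition i of every finished map task to reducer i.

    map_outputs: list with one dict per finished map task,
                 {partition_index: [(key, value), ...]}
    Returns one dict per reducer, mapping each key to its list of values.
    """
    # "Copy" step: each reducer collects its own partition from every mapper.
    reducer_inputs = [[] for _ in range(num_reduce_tasks)]
    for partitions in map_outputs:
        for index, pairs in partitions.items():
            reducer_inputs[index].extend(pairs)

    # Group step: sort by key so values for the same key sit together.
    grouped = []
    for pairs in reducer_inputs:
        groups = {}
        for key, value in sorted(pairs, key=lambda kv: kv[0]):
            groups.setdefault(key, []).append(value)
        grouped.append(groups)
    return grouped


if __name__ == "__main__":
    finished_maps = [
        {0: [("be", 2)], 1: [("to", 2)]},
        {0: [("be", 1)], 1: [("or", 1), ("to", 1)]},
    ]
    for reducer, groups in enumerate(shuffle(finished_maps, 2)):
        print("reducer", reducer, groups)
```

    Since the function consumes one finished map task at a time, it could be called as soon as the first map tasks complete, which mirrors the point that shuffling does not have to wait for the entire map phase.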

    References:

    https://www.cnblogs.com/hadoop-dev/p/5910459.html

    https://blog.csdn.net/bitcarmanlee/article/details/60137837

    http://geekdirt.com/blog/map-reduce-in-detail/

    Using a hash function to map intermediate (K, V) pairs:

    https://en.wikipedia.org/wiki/Hash_function

    https://www.tutorialspoint.com/map_reduce/map_reduce_partitioner.htm

    https://data-flair.training/blogs/hadoop-partitioner-tutorial/

  • Original article: https://www.cnblogs.com/rhyswang/p/10946833.html