zoukankan      html  css  js  c++  java
  • Partitioning, Shuffle and sort

    Partitioning, Shuffle and sort  what happened?

    - Partitioning
    Partitioning is the process of determining which reducer instance will receive which intermediate keys and values. Each mapper must determine for all of its output (key, value) pairs which reducer will receive them. It is necessary that for any key, regardless of which mapper instance generated it, the destination partition is the same
    Problem: how dose the hadoop make it? Use a hash function ? what is the function? 
    here is code~
    1 public class HashPartitioner<K, V> extends Partitioner<K, V> {
    2     public int getPartition(K key, V value, int numReduceTasks) {
    3         return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    4     }
    5 }

    解释:将key均匀分布在ReduceTasks上,举例如果Key为Text的话,Text的hashcode方法跟String的基本一致,都是采用的Horner公式计算,得到一个int,string太大的话这个int值可能会溢出变成负数,所以与上Integer.MAX_VALUE(即0111111111111111),然后再对reduce个数取余,这样就可以让相同key分布在一个节点上,并且较为均匀的分布在reduce上

    Horner规则:算法导论上有介绍这个,百度之

    think about BloomFilter~ 保证这个任务任务分发的均匀是关键,所以要设计优秀的hash函数是关键

    - Shuffle
    After the first map tasks have completed, the nodes may still be performing several more map tasks each. But they also begin exchanging the intermediate outputs from the map tasks to where they are required by the reducers. This process of moving map outputs to the reducers is known as shuffling.
     
    - Sort
    Each reduce task is responsible for reducing the values associated with several intermediate keys. The set of intermediate keys on a single node is automatically sorted by Hadoop before they are presented to the Reducer 
  • 相关阅读:
    OpenShift提供的免费.net空间 数据库 申请流程图文
    javascript实现全选全取消功能
    编写一个方法来获取页面url对应key的值
    面试题目产生一个int数组,长度为100,并向其中随机插入1-100,并且不能重复。
    面试宝典
    HDU 4619 Warm up 2 贪心或者二分图匹配
    HDU 4669 Mutiples on a circle 数位DP
    HDU 4666 最远曼哈顿距离
    HDU 4035 Maze 概率DP 搜索
    HDU 4089 Activation 概率DP
  • 原文地址:https://www.cnblogs.com/cherri/p/3286294.html
Copyright © 2011-2022 走看看