Partitioner: redirecting output from the Mapper

A common misconception for first-time MapReduce programmers is to use only a single reducer.

After all, a single reducer sorts all of your data before processing, and who doesn't like sorted data? Our discussions regarding MapReduce expose the folly of such thinking: we would have ignored the benefits of parallel computation. With one reducer, our compute cloud has been demoted to a compute raindrop.

With multiple reducers, we need some way to determine the appropriate one to send a (key/value) pair outputted by a mapper. The default behavior is to hash the key to determine the reducer. Hadoop enforces this strategy through the HashPartitioner class. Sometimes the HashPartitioner will steer you awry. Let's return to the Edge class introduced in section 3.2.1.
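
A reader without section 3.2.1 at hand may want a concrete picture of that class, so here is a minimal, hypothetical sketch of what such an Edge type could look like. Only getDepartureNode() is taken from the text; the String fields and the compareTo()/hashCode() details are assumptions:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

// Hypothetical sketch: a flight leg from one airport to another.
public class Edge implements WritableComparable<Edge> {
    private String departureNode; // airport the flight leaves from
    private String arrivalNode;   // airport the flight arrives at

    public String getDepartureNode() { return departureNode; }

    @Override
    public void readFields(DataInput in) throws IOException {
        departureNode = in.readUTF();
        arrivalNode = in.readUTF();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(departureNode);
        out.writeUTF(arrivalNode);
    }

    @Override
    public int compareTo(Edge other) {
        int cmp = departureNode.compareTo(other.departureNode);
        return cmp == 0 ? arrivalNode.compareTo(other.arrivalNode) : cmp;
    }

    @Override
    public int hashCode() {
        // Both fields feed the hash; this is exactly what trips up
        // the default HashPartitioner in the example that follows.
        return departureNode.hashCode() * 31 + arrivalNode.hashCode();
    }
}
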
Suppose you used the Edge class to analyze flight information data to determine the number of passengers departing from each airport. Such data may be

(San Francisco, Los Angeles) Chuck Lam
(San Francisco, Dallas) James Warren

If you used HashPartitioner, the two rows could be sent to different reducers. The number of departures would be processed twice, and both times erroneously.
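
To see why, look at what the default partitioner does. In the old org.apache.hadoop.mapred API, HashPartitioner amounts to the sketch below: it hashes the entire key, so two Edge keys that share a departure airport but differ in arrival airport can hash to different reducers.

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

public class HashPartitioner<K2, V2> implements Partitioner<K2, V2> {
    public void configure(JobConf job) { }

    // The whole key's hashCode() decides the reducer; the mask keeps
    // the partition index non-negative when hashCode() is negative.
    public int getPartition(K2 key, V2 value, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}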

How do we customize the partitioner for our applications? In this situation, we want all edges with a common departure point to be sent to the same reducer. This is done easily enough by hashing the departureNode member of the Edge, as in the EdgePartitioner below:

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

public class EdgePartitioner implements Partitioner<Edge, Writable> {
    @Override
    public int getPartition(Edge key, Writable value, int numPartitions) {
        // Hash only the departure node so that all edges leaving the same
        // airport go to the same reducer. Masking with Integer.MAX_VALUE
        // keeps the result non-negative even when hashCode() is negative.
        return (key.getDepartureNode().hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    @Override
    public void configure(JobConf conf) { }
}

A custom partitioner only needs to implement two methods: configure() and getPartition(). The former uses the Hadoop job configuration to configure the partitioner, and the latter returns an integer between 0 and the number of reduce tasks, indicating which reducer the (key/value) pair will be sent to.
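
To put EdgePartitioner into effect, register it on the job configuration. A minimal sketch using the old mapred API; the driver class name and reducer count here are placeholders, not from the source:

// Inside a hypothetical driver's main() or run() method:
JobConf conf = new JobConf(MyDriver.class); // MyDriver is a placeholder driver class
conf.setJobName("departures");
conf.setPartitionerClass(EdgePartitioner.class);
conf.setNumReduceTasks(4); // several reducers, so partitioning actually matters
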
The exact mechanics of the partitioner may be difficult to follow; figure 3.2 illustrates the process. Between the map and reduce stages, a MapReduce application must take the output from the mapper tasks and distribute the results among the reducer tasks. This process is typically called shuffling, because the output of a mapper on a single node may be sent to reducers across multiple nodes in the cluster.
