  • Hadoop quiz questions: 5 per day, 35 in total, day 2

    Source: http://www.cnblogs.com/jarlean/archive/2013/04/09/3009855.html

    Q6. What is the purpose of RecordReader in Hadoop?
    The InputSplit defines a slice of work, but does not describe how to access it. The RecordReader class actually loads the data from its source and converts it into (key, value) pairs suitable for reading by the Mapper. The RecordReader instance is defined by the InputFormat. (In short: the RecordReader converts the input file into key/value form.)
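    To make this concrete, here is a minimal sketch in plain Java, with no Hadoop dependency, of what a line-oriented record reader does. It is loosely modelled on the (byte offset, line) pairs that Hadoop's text input hands to the Mapper. The names LineRecordSketch and readRecords are hypothetical, and the sketch assumes UTF-8 text with "\n" line endings held entirely in memory, which a real RecordReader would not require.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a RecordReader: it turns the raw bytes of one
// split into (key, value) records, here (byte offset of the line, line text).
class LineRecordSketch {

    static List<Map.Entry<Long, String>> readRecords(byte[] split) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        int lineStart = 0;
        for (int i = 0; i <= split.length; i++) {
            boolean atEnd = (i == split.length);
            if (atEnd || split[i] == '\n') {
                // Emit a record for every terminated line, and for a final
                // unterminated line only if it is non-empty.
                if (i > lineStart || !atEnd) {
                    String line = new String(split, lineStart, i - lineStart,
                                             StandardCharsets.UTF_8);
                    records.add(new SimpleEntry<>((long) lineStart, line));
                }
                lineStart = i + 1;
            }
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] split = "hello world\nfoo bar\n".getBytes(StandardCharsets.UTF_8);
        for (Map.Entry<Long, String> record : readRecords(split)) {
            System.out.println(record.getKey() + "\t" + record.getValue());
        }
    }
}
```

    For the sample input this yields two records, keyed 0 and 12, which is the shape of input a Mapper would then consume one pair at a time.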
    Q7. After the Map phase finishes, the Hadoop framework performs "partitioning, shuffle and sort". Explain what happens in this phase.
    - Partitioning (decides which reduce process receives which key/value pairs of the map output)
    Partitioning is the process of determining which reducer instance will receive which intermediate keys and values. Each mapper must determine, for all of its output (key, value) pairs, which reducer will receive them. For any given key, the destination partition must be the same regardless of which mapper instance generated it.
    - Shuffle (map output is moved to the reducers and the values for each key are brought together, giving data of the form (1, (a, b, c)))
    - Sort (each node sorts its data once before the reduce operation runs)
    Each reduce task is responsible for reducing the values associated with several intermediate keys. The set of intermediate keys on a single node is automatically sorted by Hadoop before they are presented to the Reducer.
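    The combined effect of shuffle and sort, as seen by one reducer, can be imitated on a single machine with a sorted map. This is only a sketch of the result, not of the mechanism: real Hadoop moves serialized map output between nodes and merges it. The names ShuffleSortSketch and shuffleAndSort are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ShuffleSortSketch {

    // Groups intermediate (key, value) pairs by key. A TreeMap keeps the keys
    // in sorted order, mirroring the sort step before the reducer runs.
    static TreeMap<String, List<Integer>> shuffleAndSort(
            List<Map.Entry<String, Integer>> mapOutput) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapOutput) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = List.of(
            Map.entry("pear", 1), Map.entry("apple", 1), Map.entry("pear", 1));
        System.out.println(shuffleAndSort(mapOutput)); // prints {apple=[1], pear=[1, 1]}
    }
}
```

    Each entry of the resulting map corresponds to one call of the reduce function: a key together with all of the values that were emitted for it.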
    Q9. If no custom partitioner is defined in Hadoop, how is data partitioned before it is sent to the reducer?
    The default partitioner computes a hash value for the key and assigns the partition based on this result. (Hadoop ships with a default partitioner that hashes the key and derives the partition from that hash.)
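    As far as I know, Hadoop's stock HashPartitioner boils down to one expression: clear the sign bit of key.hashCode() and take the remainder modulo the number of reduce tasks. The standalone sketch below reproduces that arithmetic without any Hadoop classes. The class name HashPartitionSketch is hypothetical, and plain Java objects stand in for Hadoop's Writable key types.

```java
class HashPartitionSketch {

    // Masking with Integer.MAX_VALUE clears the sign bit, so a negative
    // hashCode can never produce a negative partition number.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 4;
        for (String key : new String[] {"apple", "pear", "apple"}) {
            System.out.println(key + " -> partition " + partitionFor(key, reducers));
        }
    }
}
```

    Because the partition depends only on the key and the reducer count, every mapper sends a given key to the same reducer, which is the requirement described under Q7.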
    Q10. What is a Combiner? (A combining function: map output is written to the Combiner, which then passes its data on to the reducers.)
    The Combiner is a "mini-reduce" process that operates only on data generated by a mapper. It receives as input the data emitted by the Mapper instances on a given node. The output from the Combiner, rather than the raw output from the Mappers, is then sent to the Reducers.
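    A word-count style example shows why this helps. The sketch below, in plain Java with hypothetical names (CombinerSketch, combine), sums the counts that one mapper emitted for each word, so that a single (word, total) pair per word leaves the node instead of one pair per occurrence.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CombinerSketch {

    // Local "mini-reduce": sums the counts emitted for each word so that
    // fewer (word, count) pairs have to cross the network.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapperOutput) {
        Map<String, Integer> combined = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapperOutput) {
            combined.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapperOutput = List.of(
            Map.entry("the", 1), Map.entry("cat", 1),
            Map.entry("the", 1), Map.entry("the", 1));
        System.out.println(combine(mapperOutput)); // prints {cat=1, the=3}
    }
}
```

    This works because summing is associative and commutative, so adding partial sums on the reducer gives the same total. An operation without that property, such as a plain average, cannot be reused as a combiner unchanged.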

  • Original article: https://www.cnblogs.com/jarlean/p/3009855.html