  • Hadoop quiz: 5 questions a day, 35 in total (day 5)

    Source: http://www.cnblogs.com/jarlean/archive/2013/04/12/3015911.html

    Q21. Which characteristic of the Streaming API makes it flexible enough to run MapReduce jobs in languages like Perl, Ruby, Awk, etc.?
    Hadoop Streaming allows arbitrary programs to be used for the Mapper and Reducer phases of a MapReduce job: both Mappers and Reducers receive their input on stdin and emit output (key, value) pairs on stdout. (An MR task only needs to read and write through the standard streams.)
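As a rough illustration of the stdin/stdout contract described above, here is a minimal word-count sketch in Python. The function names `map_lines` and `reduce_lines` are hypothetical, and the tab-separated `key<TAB>value` line format is the usual Streaming convention rather than something stated in the original post. The reducer assumes its input arrives sorted by key.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' line per word found in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts for each key; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Local dry run: sorted() stands in for the shuffle/sort step between the phases.
mapped = list(map_lines(["to be or not to be\n"]))
reduced = list(reduce_lines(sorted(mapped)))
```

In an actual streaming script, each function would be fed `sys.stdin` and each yielded line would be printed to stdout; the same pattern works in any language that can read and write the standard streams.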
    Q22. What is the Distributed Cache in Hadoop?
    The Distributed Cache is a facility provided by the Map/Reduce framework to cache files (text, archives, jars and so on) needed by applications during execution of the job. The framework copies the necessary files to a slave node before any tasks for the job are executed on that node. (The distributed cache copies files to the slave nodes in a broadcast fashion, which reduces the time cost of join operations.)
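The join use case mentioned above can be sketched as follows, assuming the cached file has already been placed on the task's local disk by the framework. The file name `countries.tsv`, its tab-separated layout, and the helper names are all hypothetical; the temporary directory only simulates the task's working directory for a local run.

```python
import os
import tempfile


def load_lookup(path):
    """Read a small 'key<TAB>value' file from local disk into a dict, once per task."""
    lookup = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            key, _, value = line.rstrip("\n").partition("\t")
            lookup[key] = value
    return lookup


def join_records(lines, lookup):
    """Map-side join: append the looked-up value to each 'key<TAB>payload' record."""
    for line in lines:
        key, _, payload = line.rstrip("\n").partition("\t")
        yield f"{key}\t{payload}\t{lookup.get(key, 'UNKNOWN')}"


# Simulate the locally cached file, then join two records against it.
with tempfile.TemporaryDirectory() as workdir:
    cached_path = os.path.join(workdir, "countries.tsv")
    with open(cached_path, "w", encoding="utf-8") as handle:
        handle.write("us\tUnited States\nfr\tFrance\n")
    lookup = load_lookup(cached_path)

joined = list(join_records(["us\t42\n", "de\t7\n"], lookup))
```

Because the small table is held in memory on the map side, the large input never has to be shuffled to a reducer just to be joined.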
    Q23. What is the benefit of the Distributed Cache? Why can't we just keep the file in HDFS and have the application read it?
    Because the distributed cache is much faster. It copies the file to each task tracker once, at the start of the job. Whether a task tracker then runs 10 or 100 mappers or reducers, they all use the same local copy. If, on the other hand, you write code in the MR job to read the file from HDFS, every mapper will access HDFS itself, so a task tracker that runs 100 map tasks will read the file from HDFS 100 times. HDFS is also not very efficient when used like this. (The distributed cache copies the file to every node before the job runs, which improves efficiency. It also multiplies the copy work by the number of nodes, so it is not always practical.)
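The arithmetic behind this answer can be made concrete with a small sketch. The cluster size of 10 nodes is an assumed figure for illustration; the 100 tasks per node comes from the answer above, and the model simply assumes one cache copy per node per job versus one HDFS read per task.

```python
def hdfs_reads_without_cache(nodes, tasks_per_node):
    """Every task opens the side file in HDFS itself: one read per task."""
    return nodes * tasks_per_node


def copies_with_cache(nodes):
    """The framework copies the file once to each node that runs tasks."""
    return nodes


# Assumed cluster: 10 nodes, each running 100 map tasks of the same job.
direct_reads = hdfs_reads_without_cache(nodes=10, tasks_per_node=100)
cache_copies = copies_with_cache(nodes=10)
savings_factor = direct_reads // cache_copies
```

Under these assumptions the cache replaces 1000 HDFS reads with 10 copies, so the saving grows with the number of tasks each node runs.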
    Q24. What mechanism does the Hadoop framework provide to synchronize changes made to the Distributed Cache while the application is running?
    This is a trick question. There is no such mechanism. The Distributed Cache is by design read-only during job execution. (The distributed cache is designed only to be read, so there is no way to keep changes in sync across tasks.)
    Q25. Have you ever used Counters in Hadoop? Give us an example scenario.
    Anybody who claims to have worked on a Hadoop project is expected to have used counters. (Heh, anyone who has done a Hadoop project should know about counters.)
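The original answer gives no scenario, so here is one possible example, offered as a sketch: counting malformed input records while a mapper parses them. It assumes a Hadoop Streaming job, where a task updates a counter by writing a line of the form `reporter:counter:<group>,<counter>,<amount>` to stderr. The group name `Quality`, the counter name `MALFORMED_RECORDS`, and the `user,value` record layout are hypothetical.

```python
import io
import sys


def counter_line(group, name, amount=1):
    """Format a counter update the way Hadoop Streaming expects it on stderr."""
    return f"reporter:counter:{group},{name},{amount}"


def parse_records(lines, err=sys.stderr):
    """Yield (user, value) for well-formed 'user,value' lines and count the rest."""
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) != 2 or not fields[1].isdigit():
            # Skip the bad record but leave a trace in the job's counters.
            print(counter_line("Quality", "MALFORMED_RECORDS"), file=err)
            continue
        yield fields[0], int(fields[1])


# Local dry run: capture the counter updates in memory instead of real stderr.
log = io.StringIO()
good = list(parse_records(["ann,3\n", "broken line\n", "bob,x\n"], err=log))
```

After the job finishes, the framework sums the updates from all tasks, so the counter shows how many records were skipped across the whole input without anyone having to scan task logs.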

  • Original article: https://www.cnblogs.com/jarlean/p/3015911.html