  • Hadoop quiz questions, 5 per day, 35 in total: Day 5

    Source: http://www.cnblogs.com/jarlean/archive/2013/04/12/3015911.html

    Q21. What characteristic of the streaming API makes it flexible enough to run Map Reduce jobs in languages like Perl, Ruby, awk etc.?
    Hadoop Streaming allows arbitrary programs to be used for the Mapper and Reducer phases of a Map Reduce job: both Mappers and Reducers receive their input on stdin and emit output (key, value) pairs on stdout. (Note: the MR programs only need to read standard input and write standard output.)
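    The stdin/stdout contract described above can be sketched as a word count in Python. This is a minimal illustration, not code from the original post: the script layout, the function names, and the "map"/"reduce" command-line argument are assumptions made for the example. It relies on the default streaming convention of tab-separated key and value, and on the reducer receiving its input sorted by key.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' line for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts per word; the input must be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Hypothetical convention: invoke as "wordcount.py map" or "wordcount.py reduce".
if __name__ == "__main__" and len(sys.argv) == 2:
    step = {"map": map_lines, "reduce": reduce_lines}[sys.argv[1]]
    for out in step(sys.stdin):
        print(out)
```

    A script like this would typically be handed to the streaming jar through its `-mapper` and `-reducer` options; the same pattern works for Perl, Ruby or awk because only stdin and stdout are involved.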
    Q22. What is the Distributed Cache in Hadoop?
    The Distributed Cache is a facility provided by the Map/Reduce framework to cache files (text, archives, jars and so on) that applications need while the job executes. The framework copies the necessary files to a slave node before any tasks for the job are executed on that node. (Note: the files are copied to the slave nodes in a broadcast fashion, which reduces the time cost of join operations.)
    Q23. What is the benefit of the Distributed Cache? Why can't we just keep the file in HDFS and have the application read it?
    Because the Distributed Cache is much faster. The framework copies the file once to each node that runs tasks for the job, before those tasks start. If a task tracker then runs 10 or 100 mappers or reducers, they all use the same local copy. If you instead put code in the MR job to read the file from HDFS, every mapper accesses HDFS itself, so a task tracker that runs 100 map tasks reads the file from HDFS 100 times. HDFS is also not very efficient when used this way. (Note: the distributed cache copies the file to every node before the job runs, which improves efficiency. However, it also leads to a number of processes that is a multiple of the node count, so it is not very practical.)
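    The "one local copy, many lookups" pattern can be sketched as a map-side join in a Python streaming mapper. This is an illustration under assumptions, not the framework's API: the file name `lookup.csv`, its `id,name` layout, and the function names are hypothetical. The only point is that the side file is read once per task from local disk and then reused for every record.

```python
import csv


def load_lookup(lines):
    """Parse 'id,name' rows once into an in-memory dict."""
    return {row[0]: row[1] for row in csv.reader(lines) if len(row) >= 2}


def join_records(lines, lookup):
    """Map-side join: replace each record's id with the cached name."""
    for line in lines:
        key, _, value = line.rstrip("\n").partition("\t")
        yield f"{lookup.get(key, 'UNKNOWN')}\t{value}"


# Hypothetical usage inside a streaming mapper, assuming the cached file
# has been made available in the task's working directory as lookup.csv:
#
#     import sys
#     with open("lookup.csv", newline="") as f:
#         lookup = load_lookup(f)
#     for out in join_records(sys.stdin, lookup):
#         print(out)
```

    Because the lookup table lives in memory after the first read, the per-record cost is a dictionary lookup rather than a remote HDFS read.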
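    Q24. What mechanism does the Hadoop framework provide to synchronize changes made to the Distributed Cache during the runtime of the application?
    This is a trick question. There is no such mechanism. The Distributed Cache is by design read-only for the duration of job execution. (Note: the distributed cache is designed only to be read, so there is no way to synchronize changes across tasks.)
    Q25. Have you ever used Counters in Hadoop? Give us an example scenario.
    Anybody who claims to have worked on a Hadoop project is expected to have used counters. A typical scenario is counting malformed or skipped input records so that data quality can be checked once the job has finished. (Note: heh, anyone who has worked on a Hadoop project should know about counters.)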
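    One possible counter scenario, sketched for a Python streaming mapper: count malformed records instead of failing the task. Hadoop Streaming picks up counter updates written to stderr as lines of the form `reporter:counter:<group>,<counter>,<amount>`. The group name `Quality`, the counter name `MALFORMED_RECORDS`, the record layout, and the function names below are assumptions made for this example.

```python
import sys


def counter_line(group, name, amount=1):
    """Format a counter update the way Hadoop Streaming reads it from stderr."""
    return f"reporter:counter:{group},{name},{amount}"


def report_to_stderr(message):
    """Default reporter: write the counter update to stderr."""
    print(message, file=sys.stderr)


def map_records(lines, report=report_to_stderr):
    """Pass through valid 'key<TAB>number' records and count the rest."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2 or not fields[1].isdigit():
            # Hypothetical group and counter names chosen for illustration.
            report(counter_line("Quality", "MALFORMED_RECORDS"))
            continue
        yield f"{fields[0]}\t{fields[1]}"
```

    The framework aggregates such updates across all tasks, so the final total can be read from the job's counters when it completes.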

  • Original article: https://www.cnblogs.com/jarlean/p/3015911.html