Hadoop quiz, 5 questions a day, 35 in total: Day 1


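The annotations in parentheses below are my own. If any of them are wrong, I hope someone more experienced will correct me. Thanks.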

Source: http://www.cnblogs.com/jarlean/archive/2013/04/08/3008308.html

Q1. Name the most common InputFormats defined in Hadoop. Which one is the default? (TextInputFormat is the default format.)
The following 3 are the most common InputFormats defined in Hadoop:
- TextInputFormat
- KeyValueTextInputFormat
- SequenceFileInputFormat
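As a rough illustration of what the two text-based formats hand to the mapper, here is a sketch in plain Java with no Hadoop dependency. It only mimics the record shapes: the class and method names (`RecordSketch`, `textRecord`, `keyValueRecord`) are made up for this example, and treating a line without a tab as "whole line is the key, empty value" is my assumption about the key/value format, not something stated above.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

class RecordSketch {
    // TextInputFormat-style record: key = byte offset of the line, value = the whole line.
    static Map.Entry<Long, String> textRecord(long offset, String line) {
        return new SimpleEntry<>(offset, line);
    }

    // Key/value-style record: key = text before the first tab, value = the rest of the line.
    // Assumption: a line with no tab becomes the key, with an empty value.
    static Map.Entry<String, String> keyValueRecord(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new SimpleEntry<>(line, "");
        }
        return new SimpleEntry<>(line.substring(0, tab), line.substring(tab + 1));
    }

    public static void main(String[] args) {
        String line = "user42\tclicked\thome";
        // Same input line, two different (key, value) views of it.
        System.out.println(textRecord(0L, line));
        System.out.println(keyValueRecord(line));
    }
}
```

Only the first tab is significant in the key/value view, so later tabs stay inside the value.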

Q2. What is the difference between the TextInputFormat and KeyValueTextInputFormat classes?
TextInputFormat: reads text files line by line and passes the byte offset of each line to the mapper as the key and the line itself as the value. (TextInputFormat uses the offset as the key and the actual content as the value.)
KeyValueTextInputFormat: reads text files and parses each line into a key/value pair. Everything up to the first tab character is sent to the mapper as the key, and the remainder of the line is sent as the value. (Data in this format is a key/value pair separated by a tab.)

Q3. What is an InputSplit in Hadoop?
When a Hadoop job runs, it splits the input files into chunks and assigns each chunk to a mapper for processing. Each such chunk is called an input split. (An input split is the chunk information handed to a map task.)

Q4. How is the splitting of files invoked in the Hadoop framework?
The Hadoop framework invokes it by calling the getSplits() method of the InputFormat class (such as FileInputFormat) that the user has specified for the job. (Hadoop performs the splitting by running the getSplits method of the user-specified class.)

Q5. Consider this scenario: in an M/R system,
- the HDFS block size is 64 MB
- the input format is FileInputFormat
- there are 3 files of size 64 KB, 65 MB and 127 MB
How many input splits will the Hadoop framework make?
Hadoop will make 5 splits, counting one split per block-sized chunk. (A file smaller than the block size does not occupy a whole block. For a file larger than the block size, Hadoop fills one block first and then writes the remaining data into the next empty block.)
- 1 split for the 64 KB file
- 2 splits for the 65 MB file
- 2 splits for the 127 MB file
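The arithmetic in Q5 can be checked with a small sketch in plain Java (no Hadoop dependency). The names `SplitSketch`, `splitSizes` and `totalSplits` are made up for illustration, and the split size is assumed to equal the HDFS block size. The `slop` parameter is also an assumption of this sketch: with 1.0 it reproduces the one-split-per-block count used in the answer. As far as I know, Hadoop's own FileInputFormat lets the last split be up to 10% larger than the split size (a factor of 1.1), which would keep the 65 MB file in a single split and give 4 splits rather than 5, so treat the quiz answer as the simplified model.

```java
import java.util.ArrayList;
import java.util.List;

class SplitSketch {
    static final long MB = 1024L * 1024L;

    // Cut a file into split-sized chunks while the remainder is more than
    // slop * splitSize; whatever is left becomes the final split.
    static List<Long> splitSizes(long fileSize, long splitSize, double slop) {
        List<Long> sizes = new ArrayList<>();
        long remaining = fileSize;
        while ((double) remaining / splitSize > slop) {
            sizes.add(splitSize);
            remaining -= splitSize;
        }
        if (remaining > 0) {
            sizes.add(remaining);
        }
        return sizes;
    }

    // Splits never span files, so the total is the sum of per-file counts.
    static int totalSplits(long[] fileSizes, long splitSize, double slop) {
        int total = 0;
        for (long size : fileSizes) {
            total += splitSizes(size, splitSize, slop).size();
        }
        return total;
    }

    public static void main(String[] args) {
        long blockSize = 64 * MB;
        long[] files = {64 * 1024L, 65 * MB, 127 * MB};
        // Simplified quiz model: one split per block-sized chunk.
        System.out.println(totalSplits(files, blockSize, 1.0)); // prints 5
        // Assumed 10% tolerance on the last split.
        System.out.println(totalSplits(files, blockSize, 1.1)); // prints 4
    }
}
```

Each file is handled separately, which is why the 64 KB file still costs a whole split of its own even though it is far smaller than a block.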


Original post: https://www.cnblogs.com/jarlean/p/3008308.html