Source: http://www.cnblogs.com/jarlean/archive/2013/04/11/3013583.html
Q16. Suppose Hadoop spawned 100 tasks for a job and one of the tasks failed. What will Hadoop do? It will restart the task on some other task tracker. Only if the task has failed 4 times (the default limit, which can be changed) will it kill the job. (In short: the task is restarted on another tasktracker, and the job is killed once the task hits its failure limit.)
Q17. Hadoop achieves parallelism by dividing the tasks across many nodes, so it is possible for a few slow nodes to rate-limit the rest of the program and slow it down. What mechanism does Hadoop provide to combat this? Speculative execution. (That is, how to keep a few long-running tasks from delaying the whole job.)
Hadoop handles this by running speculative tasks: it runs the slow task again as a duplicate, uses the result of whichever copy finishes first, and kills the other copies.
Reference: http://www.360doc.com/content/12/0622/17/10248211_219834470.shtml
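The "first copy to finish wins" rule can be sketched as a small simulation. This is only an illustrative model, not Hadoop's scheduler code: the `Attempt` type, the `resolve_speculation` function, and the tracker names are all hypothetical, and finish times are given up front rather than observed at run time.

```python
from typing import List, NamedTuple, Tuple


class Attempt(NamedTuple):
    """One copy of a task (hypothetical model, not a Hadoop class)."""
    tracker: str        # made-up name of the task tracker running this copy
    finish_time: float  # simulated seconds until this copy would complete


def resolve_speculation(attempts: List[Attempt]) -> Tuple[Attempt, List[Attempt]]:
    """Pick the copy that finishes first; every other copy is abandoned."""
    winner = min(attempts, key=lambda a: a.finish_time)
    killed = [a for a in attempts if a is not winner]
    return winner, killed


if __name__ == "__main__":
    copies = [
        Attempt("tracker-slow", 120.0),  # the original, straggling attempt
        Attempt("tracker-fast", 35.0),   # the speculative duplicate
    ]
    winner, killed = resolve_speculation(copies)
    print("definitive copy ran on:", winner.tracker)
    print("abandoned copies:", [a.tracker for a in killed])
```

The sketch covers only how the winner is chosen. Deciding when a task is slow enough to deserve a duplicate is a separate scheduling question that it does not model.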
Q18. How does speculative execution work in Hadoop? The job tracker makes different task trackers process the same input. When tasks complete, they announce this fact to the job tracker. Whichever copy of a task finishes first becomes the definitive copy. If other copies were executing speculatively, Hadoop tells the task trackers to abandon those tasks and discard their outputs. The reducers then receive their inputs from whichever mapper completed successfully first.
Q19. Using the command line in Linux, how will you
- see all jobs running in the Hadoop cluster? (command: hadoop job -list)
- kill a job? (command: hadoop job -kill jobid)
Q20. What is Hadoop Streaming? Streaming is a generic API that allows programs written in virtually any language to be used as Hadoop Mapper and Reducer implementations. In other words, with Streaming the map and reduce steps of a job can be written in any language.
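As an illustration, here is a minimal word-count sketch in Python that could serve as both the mapper and the reducer of a Streaming job. Streaming programs read records from standard input and write tab-separated key/value lines to standard output. The script name `wordcount.py`, the function names, and the `map`/`reduce` command-line argument are assumptions made for this example.

```python
#!/usr/bin/env python3
"""Word count for Hadoop Streaming: run as 'wordcount.py map' or 'wordcount.py reduce'."""
import sys
from itertools import groupby


def map_lines(lines):
    """Emit one 'word<TAB>1' record for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Sum the counts per word; the input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Only act when a role is given explicitly, so importing the file has no side effects.
if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

A job using this script would typically be submitted with the hadoop-streaming jar, passing options such as -input, -output, -mapper "wordcount.py map", -reducer "wordcount.py reduce", and -file wordcount.py. The jar's location depends on the installation. The pipeline can also be tried locally with: cat input.txt | ./wordcount.py map | sort | ./wordcount.py reduce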