zoukankan      html  css  js  c++  java
  • Hadoop JobHistory

     

    hadoop jobhistory记录下已运行完的MapReduce作业信息并存放在指定的HDFS目录下,默认情况下是没有启动的,需要配置完后手工启动服务。

    mapred-site.xml添加如下配置

    复制代码
    <property>
      <name>mapreduce.jobhistory.address</name>
      <value>hadoop000:10020</value>
      <description>MapReduce JobHistory Server IPC host:port</description>
    </property>
    
    <property>
      <name>mapreduce.jobhistory.webapp.address</name>
      <value>hadoop000:19888</value>
      <description>MapReduce JobHistory Server Web UI host:port</description>
    </property>
    
    <property>
        <name>mapreduce.jobhistory.done-dir</name>
        <value>/history/done</value>
    </property>
    
    <property>
        <name>mapreduce.jobhistory.intermediate-done-dir</name>
        <value>/history/done_intermediate</value></property>
    复制代码

    启动history-server:

    $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver

    停止history-server:

    $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh stop historyserver

    history-server启动之后,可以通过浏览器访问WEBUI: hadoop000:19888

    在hdfs上会生成两个目录

    hadoop fs -ls /history
    drwxrwx--- - spark supergroup 0 2014-10-11 15:11 /history/done drwxrwxrwt - spark supergroup 0 2014-10-11 15:16 /history/done_intermediate

    mapreduce.jobhistory.done-dir(/history/done): Directory where history files are managed by the MR JobHistory Server(已完成作业信息)
    mapreduce.jobhistory.intermediate-done-dir(/history/done_intermediate): Directory where history files are written by MapReduce jobs.(正在运行作业信息)

    测试:

    通过hive查询city表观察hdfs文件目录和hadoop000:19888

    hive> select id, name from city;

    观察hdfs文件目录:

    1)历史作业记录是按照年/月/日的形式分别存放在相应的目录(/history/done/2014/10/11/000000);

    2)每个作业有2个不同的后缀名的记录:jhist和xml

    hadoop fs -ls /history/done/2014/10/11/000000
    -rwxrwx--- 1 spark supergroup 22572 2014-10-11 15:23 /history/done/2014/10/11/000000/job_1413011730351_0002-1413012208648-spark-select+id%2C+name+from+city%28Stage%2D1%29-1413012224777-1-0-SUCCEEDED-root.spark-1413012216261.jhist -rwxrwx--- 1 spark supergroup 160149 2014-10-11 15:23 /history/done/2014/10/11/000000/job_1413011730351_0002_conf.xml

    观察WEBUI: hadoop000:19888

    在WEBUI中展现了每个job使用的Map/Reduce的数量、作业提交时间、作业启动时间、作业完成时间、Job ID、提交人User、队列等信息;

    点击【job_1413011730351_0002】弹出页面显示类似信息:Aggregation is not enabled. Try the nodemanager at ......

    解决方法: yarn-site.xml添加如下配置

    <property>  
        <name>yarn.log-aggregation-enable</name>  
        <value>true</value>  
    </property> 

    重启yarn即可。

    参考CDH文档:http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.3.0-cdh5.0.0/hadoop-project-dist/hadoop-common/ClusterSetup.html

  • 相关阅读:
    hdu 5446 Unknown Treasure lucas和CRT
    Hdu 5444 Elven Postman dfs
    hdu 5443 The Water Problem 线段树
    hdu 5442 Favorite Donut 后缀数组
    hdu 5441 Travel 离线带权并查集
    hdu 5438 Ponds 拓扑排序
    hdu 5437 Alisha’s Party 优先队列
    HDU 5433 Xiao Ming climbing dp
    hdu 5432 Pyramid Split 二分
    Codeforces Round #319 (Div. 1) B. Invariance of Tree 构造
  • 原文地址:https://www.cnblogs.com/xc1234/p/9183715.html
Copyright © 2011-2022 走看看