  • Hadoop JobHistory

    Hadoop JobHistory records information about completed MapReduce jobs and stores it under a specified HDFS directory. It is not started by default; the service has to be configured and then started manually.

    Add the following to mapred-site.xml:

    <property>
      <name>mapreduce.jobhistory.address</name>
      <value>hadoop000:10020</value>
      <description>MapReduce JobHistory Server IPC host:port</description>
    </property>
    
    <property>
      <name>mapreduce.jobhistory.webapp.address</name>
      <value>hadoop000:19888</value>
      <description>MapReduce JobHistory Server Web UI host:port</description>
    </property>
    
    <property>
      <name>mapreduce.jobhistory.done-dir</name>
      <value>/history/done</value>
    </property>
    
    <property>
      <name>mapreduce.jobhistory.intermediate-done-dir</name>
      <value>/history/done_intermediate</value>
    </property>

    Start the history server:

    $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver

    Stop the history server:

    $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh stop historyserver

    Once the history server is running, the Web UI can be accessed in a browser at hadoop000:19888.
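
    A quick sanity check that the server is actually up (a minimal sketch, assuming the hadoop000 host and the ports configured above):

    # The JobHistoryServer process should show up in jps
    jps | grep JobHistoryServer
    # and the web UI should respond on the configured port
    curl -s http://hadoop000:19888/jobhistory | head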

    Two directories are created on HDFS:

    hadoop fs -ls /history
    drwxrwx---   - spark supergroup          0 2014-10-11 15:11 /history/done
    drwxrwxrwt   - spark supergroup          0 2014-10-11 15:16 /history/done_intermediate

    mapreduce.jobhistory.done-dir (/history/done): Directory where history files are managed by the MR JobHistory Server (information about completed jobs).
    mapreduce.jobhistory.intermediate-done-dir (/history/done_intermediate): Directory where history files are written by MapReduce jobs (information about jobs that are still running).

    Test:

    Run a Hive query against the city table, then look at the HDFS directories and at hadoop000:19888.

    hive> select id, name from city;

    Looking at the HDFS directories:

    1) Job history records are stored in directories organized by year/month/day (e.g. /history/done/2014/10/11/000000);

    2) Each job leaves two records with different suffixes: .jhist and .xml

    hadoop fs -ls /history/done/2014/10/11/000000
    -rwxrwx--- 1 spark supergroup  22572 2014-10-11 15:23 /history/done/2014/10/11/000000/job_1413011730351_0002-1413012208648-spark-select+id%2C+name+from+city%28Stage%2D1%29-1413012224777-1-0-SUCCEEDED-root.spark-1413012216261.jhist
    -rwxrwx--- 1 spark supergroup 160149 2014-10-11 15:23 /history/done/2014/10/11/000000/job_1413011730351_0002_conf.xml
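
    The history files can also be inspected from the command line. A minimal sketch using the paths from the listing above; whether mapred job -history accepts the .jhist path directly can depend on the Hadoop version, so treat that part as an assumption:

    # The _conf.xml file is a plain Hadoop configuration dump and can be read directly
    hadoop fs -cat /history/done/2014/10/11/000000/job_1413011730351_0002_conf.xml | head
    # Print a human-readable summary of the job from its .jhist file
    mapred job -history /history/done/2014/10/11/000000/job_1413011730351_0002-1413012208648-spark-select+id%2C+name+from+city%28Stage%2D1%29-1413012224777-1-0-SUCCEEDED-root.spark-1413012216261.jhist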

    Looking at the Web UI at hadoop000:19888:

    The Web UI shows, for each job, the number of map and reduce tasks, the submit time, start time, finish time, Job ID, submitting user, queue, and so on;
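
    The same job information is exposed by the history server's REST API, which is convenient for scripting. A minimal sketch, assuming the hadoop000:19888 web address configured above:

    # List finished MapReduce jobs as JSON
    curl -s http://hadoop000:19888/ws/v1/history/mapreduce/jobs
    # Details for a single job
    curl -s http://hadoop000:19888/ws/v1/history/mapreduce/jobs/job_1413011730351_0002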

    Clicking [job_1413011730351_0002] brings up a page with a message similar to: Aggregation is not enabled. Try the nodemanager at ......

    Fix: add the following to yarn-site.xml

    <property>  
        <name>yarn.log-aggregation-enable</name>  
        <value>true</value>  
    </property> 

    Then restart YARN.
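
    With log aggregation enabled, the container logs of finished applications can also be fetched from the command line. A minimal sketch, assuming the application id that corresponds to the job above (job_1413011730351_0002 maps to application_1413011730351_0002):

    # Print the aggregated container logs for a finished application
    yarn logs -applicationId application_1413011730351_0002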

    Reference: the CDH documentation at http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.3.0-cdh5.0.0/hadoop-project-dist/hadoop-common/ClusterSetup.html
