  • Hadoop JobHistory

     

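    Hadoop JobHistory records information about finished MapReduce jobs and stores it in a configured HDFS directory. The service is not started by default: you have to configure it first and then start it manually.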

    Add the following properties inside the <configuration> element of mapred-site.xml:

    <property>
      <name>mapreduce.jobhistory.address</name>
      <value>hadoop000:10020</value>
      <description>MapReduce JobHistory Server IPC host:port</description>
    </property>
    
    <property>
      <name>mapreduce.jobhistory.webapp.address</name>
      <value>hadoop000:19888</value>
      <description>MapReduce JobHistory Server Web UI host:port</description>
    </property>
    
    <property>
      <name>mapreduce.jobhistory.done-dir</name>
      <value>/history/done</value>
    </property>
    
    <property>
      <name>mapreduce.jobhistory.intermediate-done-dir</name>
      <value>/history/done_intermediate</value>
    </property>
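    Before starting the server it can help to confirm that all four keys actually made it into the file. The following Python sketch is a hypothetical helper (`read_site_xml` and `missing_jobhistory_keys` are names made up for illustration, not part of Hadoop); it assumes the usual Hadoop `*-site.xml` layout of `<property>` elements with `<name>` and `<value>` children.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# The four JobHistory keys configured in mapred-site.xml above.
JOBHISTORY_KEYS = (
    "mapreduce.jobhistory.address",
    "mapreduce.jobhistory.webapp.address",
    "mapreduce.jobhistory.done-dir",
    "mapreduce.jobhistory.intermediate-done-dir",
)


def read_site_xml(text: str) -> Dict[str, str]:
    """Return {name: value} for every <property> in a Hadoop *-site.xml document."""
    root = ET.fromstring(text)
    props: Dict[str, str] = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is None:
            continue  # skip malformed entries that have no <name>
        # findtext() gives "" for an empty <value/> and None if it is missing.
        props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_jobhistory_keys(props: Dict[str, str]) -> List[str]:
    """List the JobHistory keys that are absent or have an empty value."""
    return [key for key in JOBHISTORY_KEYS if not props.get(key)]


if __name__ == "__main__":
    sample = """<configuration>
      <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop000:10020</value>
      </property>
      <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop000:19888</value>
      </property>
    </configuration>"""
    props = read_site_xml(sample)
    print(props["mapreduce.jobhistory.webapp.address"])
    print("missing:", missing_jobhistory_keys(props))
```

    In practice you would pass the contents of $HADOOP_HOME/etc/hadoop/mapred-site.xml (the usual Hadoop 2.x location; adjust for your installation).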

    Start the history server:

    $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver

    Stop the history server:

    $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh stop historyserver

    Once the history server is running, its web UI is available in a browser at http://hadoop000:19888.
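    A quick way to confirm that the daemon came up is to probe the two ports from mapred-site.xml. This Python sketch only checks whether a TCP connection can be opened, nothing more; `is_port_open` is a hypothetical helper, and the host `hadoop000` with ports 10020 (IPC) and 19888 (web UI) are the values assumed from the configuration above.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and host names that do not resolve.
        return False
```

    For example, `is_port_open("hadoop000", 10020)` and `is_port_open("hadoop000", 19888)` should both return True after `mr-jobhistory-daemon.sh start historyserver` has succeeded, and False after it has been stopped.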

    Two directories are created on HDFS:

    hadoop fs -ls /history
    drwxrwx--- - spark supergroup 0 2014-10-11 15:11 /history/done
    drwxrwxrwt - spark supergroup 0 2014-10-11 15:16 /history/done_intermediate

    mapreduce.jobhistory.done-dir (/history/done): Directory where history files are managed by the MR JobHistory Server (history of completed jobs).
    mapreduce.jobhistory.intermediate-done-dir (/history/done_intermediate): Directory where history files are written by MapReduce jobs (files from just-finished jobs, waiting for the JobHistory Server to move them into done-dir).

    Testing:

    Query the city table from Hive, then watch both the HDFS directories and hadoop000:19888.

    hive> select id, name from city;

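    Observe the HDFS directories:

    1) Job history records are stored in directories organized by year/month/day (here /history/done/2014/10/11/000000);

    2) Each job has two records with different suffixes: .jhist (the job history events) and _conf.xml (the job configuration).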
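    The date and serial-number directories can be derived from the job itself. The sketch below assumes the Hadoop 2.x layout: the date comes from the job's finish time in the server's local time zone, and the last directory is the first six digits of the job sequence number zero-padded to nine digits. That matches the /history/done/2014/10/11/000000 path above, but it is an assumption to check against your Hadoop version; `done_subdir` is a hypothetical helper.

```python
from datetime import datetime


def done_subdir(done_dir, job_id, finish_time):
    """Return the done-dir subdirectory a finished job's history files land in.

    Assumed Hadoop 2.x layout: <done-dir>/YYYY/MM/DD/<serial bucket>.
    """
    # job_<clusterTimestamp>_<sequence>, e.g. job_1413011730351_0002 -> 2
    seq = int(job_id.rsplit("_", 1)[1])
    # Zero-pad to nine digits and keep the first six, so each bucket
    # holds 1000 jobs: 0..999 -> 000000, 1000..1999 -> 000001.
    bucket = f"{seq:09d}"[:6]
    date = finish_time.strftime("%Y/%m/%d")
    return f"{done_dir.rstrip('/')}/{date}/{bucket}"


if __name__ == "__main__":
    # Finish time of job_1413011730351_0002 in the server's local time (UTC+8).
    finished = datetime(2014, 10, 11, 15, 23, 44)
    print(done_subdir("/history/done", "job_1413011730351_0002", finished))
```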

    hadoop fs -ls /history/done/2014/10/11/000000
    -rwxrwx--- 1 spark supergroup  22572 2014-10-11 15:23 /history/done/2014/10/11/000000/job_1413011730351_0002-1413012208648-spark-select+id%2C+name+from+city%28Stage%2D1%29-1413012224777-1-0-SUCCEEDED-root.spark-1413012216261.jhist
    -rwxrwx--- 1 spark supergroup 160149 2014-10-11 15:23 /history/done/2014/10/11/000000/job_1413011730351_0002_conf.xml
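    The .jhist file name is itself an index of the job. Judging from the listing above, it holds ten dash-separated, URL-encoded fields: job ID, submit time, user, job name, finish time, number of maps, number of reduces, final status, queue and start time (times in epoch milliseconds). This field order is an assumption read off this one example; other Hadoop versions may write fewer fields. `parse_jhist_name` is a hypothetical helper.

```python
from urllib.parse import unquote_plus

# Field order ASSUMED from the .jhist name in the listing above. Fields are
# separated by "-" and each value is URL-encoded, so a "-" inside a value
# (e.g. "Stage-1" in the job name) appears as %2D.
FIELDS = ("job_id", "submit_time", "user", "job_name", "finish_time",
          "num_maps", "num_reduces", "status", "queue", "start_time")
INT_FIELDS = {"submit_time", "finish_time", "num_maps", "num_reduces", "start_time"}


def parse_jhist_name(path):
    """Split a .jhist file name (or full HDFS path) into decoded job attributes."""
    filename = path.rsplit("/", 1)[-1]
    if not filename.endswith(".jhist"):
        raise ValueError(f"not a .jhist file: {filename}")
    parts = filename[: -len(".jhist")].split("-")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    info = {}
    for key, raw in zip(FIELDS, parts):
        value = unquote_plus(raw)  # "+" -> space, %2C -> ",", %2D -> "-"
        info[key] = int(value) if key in INT_FIELDS else value
    return info


if __name__ == "__main__":
    name = ("job_1413011730351_0002-1413012208648-spark-"
            "select+id%2C+name+from+city%28Stage%2D1%29-1413012224777-1-0-"
            "SUCCEEDED-root.spark-1413012216261.jhist")
    info = parse_jhist_name(name)
    print(info["job_name"])
    print(info["finish_time"] - info["start_time"], "ms from start to finish")
```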

    Observe the web UI at hadoop000:19888:

    For each job, the web UI shows the number of map and reduce tasks, submit time, start time, finish time, job ID, submitting user, queue and similar details.
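    The same per-job information is also exposed by the History Server's REST API. The sketch below assumes the `/ws/v1/history/mapreduce/jobs` endpoint and its `jobs.job` field names (`id`, `user`, `queue`, `state`, `mapsTotal`, `reducesTotal`, `submitTime`, `startTime`, `finishTime`) as documented for Hadoop 2.x; verify them against your version. `fetch_jobs` and `summarize` are hypothetical helpers.

```python
import json
from urllib.request import urlopen


def fetch_jobs(webapp_address, timeout=10.0):
    """GET the finished-job list from the (assumed) History Server REST endpoint."""
    url = f"http://{webapp_address}/ws/v1/history/mapreduce/jobs"
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize(payload):
    """Turn a /jobs response into rows resembling the web UI job table."""
    jobs = (payload.get("jobs") or {}).get("job") or []
    if isinstance(jobs, dict):
        # Defensive: some JSON serializers emit a lone object instead of a
        # one-element list.
        jobs = [jobs]
    return [
        {
            "id": j["id"],
            "user": j["user"],
            "queue": j["queue"],
            "state": j["state"],
            "maps": j["mapsTotal"],
            "reduces": j["reducesTotal"],
            "submit": j["submitTime"],
            "start": j["startTime"],
            "finish": j["finishTime"],
        }
        for j in jobs
    ]
```

    On a live cluster, `summarize(fetch_jobs("hadoop000:19888"))` would return one row per finished job, roughly mirroring the table in the web UI.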

    Clicking [job_1413011730351_0002] opens a page with a message like: Aggregation is not enabled. Try the nodemanager at ......

    Fix: add the following property to yarn-site.xml:

    <property>  
        <name>yarn.log-aggregation-enable</name>  
        <value>true</value>  
    </property> 

    Then restart YARN.
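    If the configuration files are managed by scripts, the property can also be set programmatically. A minimal Python sketch (`set_property` is a hypothetical helper) that updates or adds a property in a `*-site.xml` string; note that ElementTree drops comments and the XML declaration when it re-serializes, so keep a backup of the original file.

```python
import xml.etree.ElementTree as ET


def set_property(site_xml, name, value):
    """Return site_xml with property `name` set to `value`, adding it if absent."""
    root = ET.fromstring(site_xml)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            val = prop.find("value")
            if val is None:
                val = ET.SubElement(prop, "value")
            val.text = value
            break
    else:
        # No existing <property> with this name: append a new one.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(set_property("<configuration></configuration>",
                       "yarn.log-aggregation-enable", "true"))
```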

    Reference (CDH documentation): http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.3.0-cdh5.0.0/hadoop-project-dist/hadoop-common/ClusterSetup.html

  • Original article: https://www.cnblogs.com/xc1234/p/9183715.html