  • [Hadoop Offline Basics Summary] Scheduling a MapReduce Job with Oozie


    • 1. Prepare the data for the MR job

      The MR program can be one you wrote yourself or one that ships with the Hadoop distribution. Here we use Hadoop's bundled example jar to run the wordcount demo.
      Prepare the following data and upload it to the /oozie/input path on HDFS.

      hdfs dfs -mkdir -p /oozie/input
      vim wordcount.txt
      
      hello   world   hadoop
      spark   hive    hadoop
      

      hdfs dfs -put wordcount.txt /oozie/input    # upload the data to the corresponding HDFS directory

    • 2. Run the official example

      yarn jar /export/servers/hadoop-2.6.0-cdh5.14.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.14.0.jar wordcount /oozie/input/ /oozie/output
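
      To see what output to expect in /oozie/output, the example's map/reduce logic can be sketched locally. This is a hypothetical Python stand-in for the sample input above, not the actual Hadoop code, which runs on YARN:

```python
from collections import Counter

# The two lines of wordcount.txt prepared in step 1
lines = [
    "hello   world   hadoop",
    "spark   hive    hadoop",
]

# Map phase (what TokenizerMapper does): emit (token, 1) per
# whitespace-separated token
pairs = [(token, 1) for line in lines for token in line.split()]

# Reduce phase (what IntSumReducer does): sum the counts per key
counts = Counter()
for token, one in pairs:
    counts[token] += one

for token in sorted(counts):
    print(token, counts[token])
# hadoop 2, then hello/hive/spark/world each 1
```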

    • 3. Prepare the resources to be scheduled

      Gather everything the workflow needs into one folder: the jar, job.properties, and workflow.xml.
      Copy the MR task template

      cd /export/servers/oozie-4.1.0-cdh5.14.0
      cp -ra examples/apps/map-reduce/ oozie_works/
      

      Delete the jar that ships in the template's lib directory

      cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce/lib
      rm -rf oozie-examples-4.1.0-cdh5.14.0.jar
      

      Copy the jar to the corresponding directory
      From the deletion in the previous step, you can see that the jars to be scheduled live under /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce/lib, so simply place the jar you want to schedule in that same path
      cp /export/servers/hadoop-2.6.0-cdh5.14.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.14.0.jar /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce/lib/

    • 4. Modify the configuration files

      Modify job.properties

      cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce
      vim job.properties
      
      nameNode=hdfs://node01:8020
      jobTracker=node01:8032
      queueName=default
      examplesRoot=oozie_works
      
      oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/map-reduce/workflow.xml
      outputDir=/oozie/output
      inputdir=/oozie/input
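
      Note how oozie.wf.application.path is assembled from the other properties. A rough sketch of the ${...} substitution Oozie performs (assuming the job is submitted as user root, which is why step 5 uploads to /user/root/oozie_works):

```python
import re

# Properties as Oozie sees them; "user.name" is an assumption --
# it comes from whichever user submits the job.
props = {
    "nameNode": "hdfs://node01:8020",
    "examplesRoot": "oozie_works",
    "user.name": "root",
}

def resolve(template, props):
    # Replace each ${key} with its value from props
    return re.sub(r"\$\{([^}]+)\}", lambda m: props[m.group(1)], template)

app_path = resolve(
    "${nameNode}/user/${user.name}/${examplesRoot}/map-reduce/workflow.xml",
    props,
)
print(app_path)
# hdfs://node01:8020/user/root/oozie_works/map-reduce/workflow.xml
```

      The resolved path must match where the workflow is uploaded in step 5, otherwise submission fails with a "does not exist" error.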
      

      Modify workflow.xml

      cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce
      vim workflow.xml
      
      <?xml version="1.0" encoding="UTF-8"?>
      <!--
        Licensed to the Apache Software Foundation (ASF) under one
        or more contributor license agreements.  See the NOTICE file
        distributed with this work for additional information
        regarding copyright ownership.  The ASF licenses this file
        to you under the Apache License, Version 2.0 (the
        "License"); you may not use this file except in compliance
        with the License.  You may obtain a copy of the License at
        
             http://www.apache.org/licenses/LICENSE-2.0
        
        Unless required by applicable law or agreed to in writing, software
        distributed under the License is distributed on an "AS IS" BASIS,
        WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
        See the License for the specific language governing permissions and
        limitations under the License.
      -->
      <workflow-app xmlns="uri:oozie:workflow:0.5" name="map-reduce-wf">
          <start to="mr-node"/>
          <action name="mr-node">
              <map-reduce>
                  <job-tracker>${jobTracker}</job-tracker>
                  <name-node>${nameNode}</name-node>
                  <prepare>
                      <delete path="${nameNode}/${outputDir}"/>
                  </prepare>
                  <configuration>
                      <property>
                          <name>mapred.job.queue.name</name>
                          <value>${queueName}</value>
                      </property>
                    <!-- comment out the original sample configuration -->
      				<!--  
                      <property>
                          <name>mapred.mapper.class</name>
                          <value>org.apache.oozie.example.SampleMapper</value>
                      </property>
                      <property>
                          <name>mapred.reducer.class</name>
                          <value>org.apache.oozie.example.SampleReducer</value>
                      </property>
                      <property>
                          <name>mapred.map.tasks</name>
                          <value>1</value>
                      </property>
                      <property>
                          <name>mapred.input.dir</name>
                          <value>/user/${wf:user()}/${examplesRoot}/input-data/text</value>
                      </property>
                      <property>
                          <name>mapred.output.dir</name>
                          <value>/user/${wf:user()}/${examplesRoot}/output-data/${outputDir}</value>
                      </property>
      				-->
      				
      				   <!-- enable the new MapReduce API -->
                      <property>
                          <name>mapred.mapper.new-api</name>
                          <value>true</value>
                      </property>
      
                      <property>
                          <name>mapred.reducer.new-api</name>
                          <value>true</value>
                      </property>
      
                    <!-- output key class of the MR job -->
                      <property>
                          <name>mapreduce.job.output.key.class</name>
                          <value>org.apache.hadoop.io.Text</value>
                      </property>
      
                    <!-- output value class of the MR job -->
                      <property>
                          <name>mapreduce.job.output.value.class</name>
                          <value>org.apache.hadoop.io.IntWritable</value>
                      </property>
      
                    <!-- input path -->
                      <property>
                          <name>mapred.input.dir</name>
                          <value>${nameNode}/${inputdir}</value>
                      </property>
      
                    <!-- output path -->
                      <property>
                          <name>mapred.output.dir</name>
                          <value>${nameNode}/${outputDir}</value>
                      </property>
      
                    <!-- mapper class to run -->
                      <property>
                          <name>mapreduce.job.map.class</name>
                          <value>org.apache.hadoop.examples.WordCount$TokenizerMapper</value>
                      </property>
      
                    <!-- reducer class to run -->
                      <property>
                          <name>mapreduce.job.reduce.class</name>
                          <value>org.apache.hadoop.examples.WordCount$IntSumReducer</value>
                      </property>
      				<!-- number of map tasks -->
                      <property>
                          <name>mapred.map.tasks</name>
                          <value>1</value>
                      </property>
      
                  </configuration>
              </map-reduce>
              <ok to="end"/>
              <error to="fail"/>
          </action>
          <kill name="fail">
              <message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
          </kill>
          <end name="end"/>
      </workflow-app>
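
      After hand-editing workflow.xml it is easy to leave a tag unbalanced, which otherwise only surfaces as an error at submission time. A quick well-formedness check (a local Python sketch; in practice you would pass in the contents of your actual workflow.xml):

```python
import xml.etree.ElementTree as ET

# Namespace declared on <workflow-app> above
NS = "{uri:oozie:workflow:0.5}"

def check_workflow(xml_text):
    """Parse a workflow definition; raises ParseError if malformed.

    Returns the app name and the names of its actions."""
    root = ET.fromstring(xml_text)
    actions = [a.get("name") for a in root.iter(NS + "action")]
    return root.get("name"), actions

# Minimal stand-in for the workflow.xml above
sample = """\
<workflow-app xmlns="uri:oozie:workflow:0.5" name="map-reduce-wf">
    <start to="mr-node"/>
    <action name="mr-node">
        <map-reduce/>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail"><message>failed</message></kill>
    <end name="end"/>
</workflow-app>"""

print(check_workflow(sample))  # ('map-reduce-wf', ['mr-node'])
```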
      
    • 5. Upload the workflow to the corresponding HDFS directory
      cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works
      hdfs dfs -put map-reduce/ /user/root/oozie_works/
      
    • 6. Run the workflow

      Run the workflow, then check the result through Oozie's web UI on port 11000

      cd /export/servers/oozie-4.1.0-cdh5.14.0
      bin/oozie job -oozie http://node03:11000/oozie -config oozie_works/map-reduce/job.properties -run
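
      Besides the web UI on port 11000, the job status can also be polled over HTTP via Oozie's Web Services API. A sketch (the job id is a hypothetical example; `oozie job ... -run` prints the real one):

```python
import json
from urllib.request import urlopen

# Same endpoint as the -oozie flag above
OOZIE_URL = "http://node03:11000/oozie"

def job_status_url(job_id):
    # Oozie Web Services API: show=info returns the job document as JSON
    return "%s/v1/job/%s?show=info" % (OOZIE_URL, job_id)

def job_status(job_id):
    # Returns e.g. "RUNNING", "SUCCEEDED", "KILLED"
    with urlopen(job_status_url(job_id)) as resp:
        return json.load(resp)["status"]
```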
      
  • Original post: https://www.cnblogs.com/zzzsw0412/p/12772457.html