  • Oozie WordCount in Practice

    I. Definitions

    Basic Concepts

    Action: An execution/computation task (a Map-Reduce job, a Pig job, a shell command). It can also be referred to as a task or 'action node'.
    》》》》An Action, also called an action node, is a task that performs execution or computation (e.g. a MapReduce job or a shell command).

    Workflow: A collection of actions arranged in a control dependency DAG (Directed Acyclic Graph). "Control dependency" from one action to another means that the second action can't run until the first action has completed.
    》》》》A Workflow controls its actions through a DAG of dependencies: an action cannot run until the action it depends on has completed.

    Workflow Definition: A programmatic description of a workflow that can be executed.
    》》》》Used to define a Workflow.

    Workflow Definition Language: The language used to define a Workflow Definition.

    Workflow Job: An executable instance of a workflow definition.

    Workflow Engine: A system that executes workflow jobs. It can also be referred to as a DAG engine.

    Workflow Definition

    A workflow definition is a DAG with control flow nodes (start, end, decision, fork, join, kill) and action nodes (map-reduce, pig, etc.); nodes are connected by transition arrows.
    》》》》A Workflow is made up of control flow nodes (start, end, decision, fork, join, kill) and action nodes (map-reduce, pig, etc.).
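    The two examples in this post use only start, end, and kill, so here is a hypothetical fragment sketching the remaining control flow nodes; the node names and the checked path are illustrative, while fs:exists() is a built-in Oozie EL function:

    <decision name="check-input">
    	<switch>
    		<case to="parallel-work">${fs:exists('/user/cen/oozie-apps/mr-wordcount-wf/input')}</case>
    		<default to="fail"/>
    	</switch>
    </decision>

    <fork name="parallel-work">
    	<path start="action-a"/>
    	<path start="action-b"/>
    </fork>

    <!-- both forked actions transition to the join, which continues
         to "end" only after every path has completed -->
    <join name="joining" to="end"/>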

    The workflow definition language is XML-based and is called hPDL (Hadoop Process Definition Language).

    II. Writing a workflow.xml for a Map-Reduce Action

    1.The map-reduce action starts a Hadoop map/reduce job from a workflow. Hadoop jobs can be Java Map/Reduce jobs or streaming jobs. 》》》It can be a Java MapReduce program or a streaming job.

    2.A map-reduce action can be configured to perform file system cleanup and directory creation before starting the map reduce job. This capability enables Oozie to retry a Hadoop job in the situation of a transient failure (Hadoop checks the non-existence of the job output directory and then creates it when the Hadoop job is starting, thus a retry without cleanup of the job output directory would fail).》》》A MapReduce job requires that its output directory not exist, so Oozie deletes it in the prepare step, which makes retries possible.

    3.The workflow job will wait until the Hadoop map/reduce job completes before continuing to the next action in the workflow execution path.》》》The workflow does not move on to the next action until this one has finished.

    4.The counters of the Hadoop job and the job exit status (FAILED, KILLED or SUCCEEDED) must be available to the workflow job after the Hadoop job ends. This information can be used from within decision nodes and other actions' configurations.》》》The job must expose its exit status so that decision nodes and later actions can act on it.

    5.The map-reduce action has to be configured with all the necessary Hadoop JobConf properties to run the Hadoop map/reduce job.》》》This means that when writing the MapReduce program we only need to write the Mapper and Reducer; everything else is configured in the workflow XML (a sketch follows below).
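    For reference, here is a minimal sketch of what the referenced WordCount class might look like. The package and inner-class names match the standard workflow.xml below; the implementation itself is an assumption, and there is no driver/main method because Oozie supplies the job configuration from the XML:

    package com.cenzhongman.hdfs;

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // The outer class only holds the inner classes; Oozie references them
    // as com.cenzhongman.hdfs.WordCount$WordcountMapper etc.
    public class WordCount {

    	// New-API mapper: split each line into words, emit (word, 1)
    	public static class WordcountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    		private static final IntWritable ONE = new IntWritable(1);
    		private final Text word = new Text();

    		@Override
    		protected void map(LongWritable key, Text value, Context context)
    				throws IOException, InterruptedException {
    			for (String s : value.toString().split("\\s+")) {
    				if (!s.isEmpty()) {
    					word.set(s);
    					context.write(word, ONE);
    				}
    			}
    		}
    	}

    	// New-API reducer: sum the counts for each word
    	public static class WordcountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    		private final IntWritable result = new IntWritable();

    		@Override
    		protected void reduce(Text key, Iterable<IntWritable> values, Context context)
    				throws IOException, InterruptedException {
    			int sum = 0;
    			for (IntWritable v : values) {
    				sum += v.get();
    			}
    			result.set(sum);
    			context.write(key, result);
    		}
    	}
    }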

    workflow.xml (old-API version, and missing many required configuration properties; it is only a demo)

    <workflow-app xmlns="uri:oozie:workflow:0.2" name="map-reduce-wf">
    	<start to="mr-node"/>
    	<action name="mr-node">
    		<map-reduce>
    			<job-tracker>${jobTracker}</job-tracker>
    			<name-node>${nameNode}</name-node>
    			<prepare>
    				<delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/${outputDir}"/>
    			</prepare>
    			<configuration>
    				<property>
    					<name>mapred.job.queue.name</name>
    					<value>${queueName}</value>
    				</property>
    				<property>
    					<name>mapred.mapper.class</name>
    					<value>org.apache.oozie.example.SampleMapper</value>
    				</property>
    				<property>
    					<name>mapred.reducer.class</name>
    					<value>org.apache.oozie.example.SampleReducer</value>
    				</property>
    				<property>
    					<name>mapred.map.tasks</name>
    					<value>1</value>
    				</property>
    				<property>
    					<name>mapred.input.dir</name>
    					<value>/user/${wf:user()}/${examplesRoot}/input-data/text</value>
    				</property>
    				<property>
    					<name>mapred.output.dir</name>
    					<value>/user/${wf:user()}/${examplesRoot}/output-data/${outputDir}</value>
    				</property>
    			</configuration>
    		</map-reduce>
    		<ok to="end"/>
    		<error to="fail"/>
    	</action>
    	<kill name="fail">
    		<message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    	</kill>
    	<end name="end"/>
    </workflow-app>
    

    Edit the job.properties file

    nameNode=hdfs://cen-ubuntu.cenzhongman.com:8020
    jobTracker=0.0.0.0:8032
    queueName=default
    oozieAppsRoot=oozie-apps
    oozieDataRoot=oozie/datas
    
    oozie.wf.application.path=${nameNode}/user/${user.name}/${oozieAppsRoot}/mr-wordcount-wf/workflow.xml
    inputDir=mr-wordcount-wf/input
    outputDir=mr-wordcount-wf/output
    

    The standard workflow.xml file
    (its settings mirror what the driver of a MapReduce program would configure)

    <workflow-app xmlns="uri:oozie:workflow:0.5" name="mr-wordcount-wf">
      <start to="mr-node-wordcount"/>
      <action name="mr-node-wordcount">
        <map-reduce>
          <job-tracker>${jobTracker}</job-tracker>
          <name-node>${nameNode}</name-node>
          <prepare>
            <delete path="${nameNode}/user/cen/${oozieAppsRoot}/${outputDir}"/>
          </prepare>
          <configuration>
            <property>
              <name>mapred.mapper.new-api</name>
              <value>true</value>
            </property>
            <property>
              <name>mapred.reducer.new-api</name>
              <value>true</value>
            </property>
            <property>
              <name>mapreduce.job.queuename</name>
              <value>${queueName}</value>
            </property>
            <property>
              <name>mapreduce.job.map.class</name>
              <value>com.cenzhongman.hdfs.WordCount$WordcountMapper</value>
            </property>
            <property>
              <name>mapreduce.job.reduce.class</name>
              <value>com.cenzhongman.hdfs.WordCount$WordcountReducer</value>
            </property>
            <property>
              <name>mapreduce.map.output.key.class</name>
              <value>org.apache.hadoop.io.Text</value>
            </property>
            <property>
              <name>mapreduce.map.output.value.class</name>
              <value>org.apache.hadoop.io.IntWritable</value>
            </property>
            <property>
              <name>mapreduce.job.output.key.class</name>
              <value>org.apache.hadoop.io.Text</value>
            </property>
            <property>
              <name>mapreduce.job.output.value.class</name>
              <value>org.apache.hadoop.io.IntWritable</value>
            </property>
            <property>
              <name>mapreduce.input.fileinputformat.inputdir</name>
              <value>/user/cen/${oozieAppsRoot}/${inputDir}</value>
            </property>
            <property>
              <name>mapreduce.output.fileoutputformat.outputdir</name>
              <value>/user/cen/${oozieAppsRoot}/${outputDir}</value>
            </property>
          </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
      </action>
      <kill name="fail">
        <message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
      </kill>
      <end name="end"/>
    </workflow-app>
    

    Notes on the changes relative to the demo:

    • Change the schema version to 0.5
    • Change the workflow name
    • Change the action name (in both places)
    • Change the delete path in <prepare>
    • Enable the new MapReduce API (mapred.mapper.new-api / mapred.reducer.new-api)
    • Change the mapper class (note the Outer$Inner syntax for inner classes)
    • Change the reducer class (note the Outer$Inner syntax for inner classes)
    • Change the map output key/value classes
    • Change the job output key/value classes
    • Change the input dir
    • Change the output dir
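    After making these changes, the workflow file can be checked against the schema with the Oozie CLI before deploying it; the local path below is illustrative:

    bin/oozie validate oozie-apps/mr-wordcount-wf/workflow.xml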

    Remaining steps

    1. Copy the jar into the application's lib directory

    2. Upload the application folder to the target HDFS directory

    3. Upload the input data files (a sketch of steps 1-3 follows)
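    Steps 1-3 might look like the following, assuming the paths from job.properties above; the jar and data file names are illustrative:

    # 1. copy the MapReduce jar into the application's lib directory
    cp wordcount.jar oozie-apps/mr-wordcount-wf/lib/

    # 2. upload the application folder to HDFS
    bin/hdfs dfs -put oozie-apps/mr-wordcount-wf /user/cen/oozie-apps/

    # 3. create the input directory and upload the data file
    bin/hdfs dfs -mkdir -p /user/cen/oozie-apps/mr-wordcount-wf/input
    bin/hdfs dfs -put wordcount.txt /user/cen/oozie-apps/mr-wordcount-wf/input/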

    4. Run the job

    export OOZIE_URL=http://cen-ubuntu:11000/oozie/
    bin/oozie job -config /opt/cdh5.3.6/oozie-4.0.0-cdh5.3.6/oozie-apps/mr-wordcount-wf/job.properties -run
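    The -run command prints a job ID, which can then be used to watch the job's progress (the ID below is a placeholder):

    bin/oozie job -info <job-id>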