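  • Installing Oozie 4.0.0

    Once Hadoop is installed, you can install Oozie and run your own workflows:

    1. Download the Oozie tarball, oozie-4.0.0-cdh5.0.0.tar.gz, from http://archive.cloudera.com/cdh5/cdh/5/

    2. Download ext-2.2.zip: http://extjs.com/deploy/ext-2.2.zip

    3. Download and extract Tomcat.

    4. Download Maven. (The Oozie package downloaded above is prebuilt; an unbuilt source release must first be compiled with Maven before it can be installed.)

    5. Extract Oozie into the installation directory and set the environment variables as follows: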

    export MAVEN_HOME=/export/servers/apache-maven-3.0.5

    export TOMCAT_HOME=/export/servers/apache-tomcat-6.0.26

    export OOZIE_HOME=/export/servers/oozie-4.0.0-cdh5.0.0
    export OOZIE_CONFIG=/export/servers/oozie-4.0.0-cdh5.0.0/conf
    export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$HADOOP_HOME/bin:$MAVEN_HOME/bin:$OOZIE_HOME/bin:$TOMCAT_HOME/bin:$PATH

    Apply the environment variables: source /etc/profile
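A mistyped path in these exports usually only shows up later, when a script fails to start. As a quick sanity check, here is a minimal sketch of a helper that reports variables that are unset or that do not point to an existing directory. The function name check_dirs is made up for this example and is not part of Oozie.

```shell
#!/bin/sh
# check_dirs (hypothetical helper): for each variable NAME given as an
# argument, report if it is unset/empty or if its value is not a directory.
# Returns 0 when every variable passes, 1 otherwise.
check_dirs() {
    rc=0
    for name in "$@"; do
        # Indirect lookup: read the value of the variable called $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
            rc=1
        elif [ ! -d "$value" ]; then
            echo "$name points to a missing directory: $value"
            rc=1
        fi
    done
    return $rc
}

# Example (run after "source /etc/profile"):
#   check_dirs MAVEN_HOME TOMCAT_HOME OOZIE_HOME OOZIE_CONFIG
```

When every directory exists, the call prints nothing and returns 0.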

    6. Edit the Oozie configuration files. The conf directory is laid out as follows:

    The action-conf directory contains only one file, hive.xml. Edit it as follows:

    <configuration>
    <!-- An example of setting default properties for Hive action.
    This could be useful with Hadoop versions that have deprecated
    HADOOP_HOME that Hive still relies on.

    <property>
    <name>hadoop.bin.path</name>
    <value>/export/servers/hadoop-2.2.0/bin/hadoop</value>
    </property>

    <property>
    <name>hadoop.config.dir</name>
    <value>/export/servers/hadoop-2.2.0/etc/hadoop</value>
    </property>
    -->
    </configuration>

    hadoop-conf/core-site.xml:

    <configuration>

    <property>
    <name>mapreduce.jobtracker.kerberos.principal</name>
    <value>mapred/_HOST@LOCALREALM</value>
    </property>

    <property>
    <name>yarn.resourcemanager.principal</name>
    <value>yarn/_HOST@LOCALREALM</value>
    </property>

    <property>
    <name>dfs.namenode.kerberos.principal</name>
    <value>hdfs/_HOST@LOCALREALM</value>
    </property>

    <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    </property>

    </configuration>

    hadoop-config.xml has the same content as hadoop-conf/core-site.xml and needs no changes.

    oozie-default.xml needs two main changes:

    1. The oozie.services node, which originally reads:

    <property>
    <name>oozie.services</name>
    <value>
    org.apache.oozie.service.SchedulerService,
    org.apache.oozie.service.InstrumentationService,
    org.apache.oozie.service.CallableQueueService,
    org.apache.oozie.service.UUIDService,
    org.apache.oozie.service.ELService,
    org.apache.oozie.service.AuthorizationService,
    org.apache.oozie.service.UserGroupInformationService,
    org.apache.oozie.service.HadoopAccessorService,
    org.apache.oozie.service.URIHandlerService,
    org.apache.oozie.service.MemoryLocksService,
    org.apache.oozie.service.DagXLogInfoService,
    org.apache.oozie.service.SchemaService,
    org.apache.oozie.service.LiteWorkflowAppService,
    org.apache.oozie.service.JPAService,
    org.apache.oozie.service.StoreService,
    org.apache.oozie.service.CoordinatorStoreService,
    org.apache.oozie.service.SLAStoreService,
    org.apache.oozie.service.DBLiteWorkflowStoreService,
    org.apache.oozie.service.CallbackService,
    org.apache.oozie.service.ActionService,
    org.apache.oozie.service.ShareLibService,
    org.apache.oozie.service.ActionCheckerService,
    org.apache.oozie.service.RecoveryService,
    org.apache.oozie.service.PurgeService,
    org.apache.oozie.service.CoordinatorEngineService,
    org.apache.oozie.service.BundleEngineService,
    org.apache.oozie.service.DagEngineService,
    org.apache.oozie.service.CoordMaterializeTriggerService,
    org.apache.oozie.service.StatusTransitService,
    org.apache.oozie.service.PauseTransitService,
    org.apache.oozie.service.GroupsService,
    org.apache.oozie.service.ProxyUserService,
    org.apache.oozie.service.XLogStreamingService,
    org.apache.oozie.service.JobsConcurrencyService
    </value>
    <description>
    All services to be created and managed by Oozie Services singleton.
    Class names must be separated by commas.
    </description>
    </property>

    Move the org.apache.oozie.service.JobsConcurrencyService class to the first line of this node, as follows:

    <property>
    <name>oozie.services</name>
    <value>
    org.apache.oozie.service.JobsConcurrencyService,
    org.apache.oozie.service.SchedulerService,

    ...
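To confirm that the reordering took effect, the first class listed under oozie.services can be printed with a small awk-based helper. This is a sketch only: the function name first_oozie_service is hypothetical, and it assumes the file is laid out as in this post, with <value> on its own line and one class per line after it.

```shell
#!/bin/sh
# first_oozie_service (hypothetical helper): print the first class listed in
# the oozie.services property of the given XML file.
# Assumes <value> stands alone on its line, followed by one class per line.
first_oozie_service() {
    awk '
        /<name>oozie\.services<\/name>/ { in_prop = 1; next }
        in_prop && /<value>/            { in_value = 1; next }
        # First non-blank line inside <value>: strip blanks and commas.
        in_value && NF                  { gsub(/[ \t,]/, ""); print; exit }
    ' "$1"
}

# Example:
#   first_oozie_service "$OOZIE_CONFIG/oozie-default.xml"
```

After the edit described above, the helper should print org.apache.oozie.service.JobsConcurrencyService.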

    2. Remove the following node. (This is optional; whether to remove it depends on your own use case.)

    <property>
    <name>oozie.service.coord.check.maximum.frequency</name>
    <value>true</value>
    <description>
    When true, Oozie will reject any coordinators with a frequency faster than 5 minutes. It is not recommended to disable
    this check or submit coordinators with frequencies faster than 5 minutes: doing so can cause unintended behavior and
    additional system stress.
    </description>
    </property>

    oozie-site.xml needs the following main changes:

    1. The original node:

    <property>
    <name>oozie.service.ActionService.executor.ext.classes</name>
    <value>
    org.apache.oozie.action.email.EmailActionExecutor,
    org.apache.oozie.action.hadoop.HiveActionExecutor,
    org.apache.oozie.action.hadoop.ShellActionExecutor,
    org.apache.oozie.action.hadoop.SqoopActionExecutor,
    org.apache.oozie.action.hadoop.DistcpActionExecutor
    </value>
    </property>

    Change this section as follows, adding a few entries:

    <property>
    <name>oozie.subworkflow.classpath.inheritance</name>
    <value>true</value>
    </property>
    <property>
    <name>oozie.servlet.CallbackServlet.max.data.len</name>
    <value>1048576</value>
    </property>

    <property>
    <name>oozie.service.ActionService.executor.ext.classes</name>
    <value>
    org.apache.oozie.action.email.EmailActionExecutor,
    org.apache.oozie.action.hadoop.HiveActionExecutor,
    org.apache.oozie.action.hadoop.ShellActionExecutor,
    org.apache.oozie.action.hadoop.SqoopActionExecutor,
    org.apache.oozie.action.hadoop.DistcpActionExecutor
    </value>
    </property>

    2.

    <property>
    <name>oozie.service.JPAService.jdbc.driver</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>
    JDBC driver class.
    </description>
    </property>

    <property>
    <name>oozie.service.JPAService.jdbc.url</name>
    <value>jdbc:mysql://192.168.157.92:3358/oozie4</value>
    <description>
    JDBC URL.
    </description>
    </property>

    <property>
    <name>oozie.service.JPAService.jdbc.username</name>
    <value>root</value>
    <description>
    DB user name.
    </description>
    </property>

    <property>
    <name>oozie.service.JPAService.jdbc.password</name>
    <value>123456</value>
    <description>
    DB user password.

    IMPORTANT: if password is emtpy leave a 1 space string, the service trims the value,
    if empty Configuration assumes it is NULL.
    </description>
    </property>

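    Purpose of these nodes: Oozie ships with a default Derby database that stores information about Oozie's jobs. To use your own MySQL database instead, configure it as in the example above.

    3.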

    <property>
    <name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
    <value>*=/export/servers/hadoop-2.2.0/etc/hadoop</value>
    <description>
    Comma separated AUTHORITY=HADOOP_CONF_DIR, where AUTHORITY is the HOST:PORT of
    the Hadoop service (JobTracker, HDFS). The wildcard '*' configuration is
    used when there is no exact match for an authority. The HADOOP_CONF_DIR contains
    the relevant Hadoop *-site.xml files. If the path is relative is looked within
    the Oozie configuration directory; though the path can be absolute (i.e. to point
    to Hadoop client conf/ directories in the local filesystem.
    </description>
    </property>

    This node sets the Hadoop configuration directory.

    4.

    <!-- Proxyuser Configuration -->

    <property>
    <name>oozie.service.ProxyUserService.proxyuser.#USER#.hosts</name>
    <value>*</value>
    <description>
    List of hosts the '#USER#' user is allowed to perform 'doAs'
    operations.

    The '#USER#' must be replaced with the username o the user who is
    allowed to perform 'doAs' operations.

    The value can be the '*' wildcard or a list of hostnames.

    For multiple users copy this property and replace the user name
    in the property name.
    </description>
    </property>

    <property>
    <name>oozie.service.ProxyUserService.proxyuser.#USER#.groups</name>
    <value>*</value>
    <description>
    List of groups the '#USER#' user is allowed to impersonate users
    from to perform 'doAs' operations.

    The '#USER#' must be replaced with the username o the user who is
    allowed to perform 'doAs' operations.

    The value can be the '*' wildcard or a list of groups.

    For multiple users copy this property and replace the user name
    in the property name.
    </description>
    </property>

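    Uncomment these two nodes and, as their descriptions say, replace #USER# with the name of the user allowed to perform 'doAs' operations.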

    Because MySQL is used, copy the MySQL JDBC driver jar, mysql-connector-java-5.1.20.jar, into both the lib and libtools directories of Oozie.
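The copy can be scripted so that both directories are always updated together. Below is a minimal sketch, assuming the jar has already been downloaded; the function name install_jdbc_driver is invented for illustration.

```shell
#!/bin/sh
# install_jdbc_driver (hypothetical helper): copy a JDBC driver jar into the
# lib and libtools directories under the given Oozie home.
# Usage: install_jdbc_driver <path-to-jar> <oozie-home>
install_jdbc_driver() {
    jar_path=$1
    oozie_home=$2
    if [ ! -f "$jar_path" ]; then
        echo "driver jar not found: $jar_path" >&2
        return 1
    fi
    for dir in lib libtools; do
        # Fail rather than create the directory: a missing directory
        # suggests that oozie_home points at the wrong place.
        if [ ! -d "$oozie_home/$dir" ]; then
            echo "missing directory: $oozie_home/$dir" >&2
            return 1
        fi
        cp "$jar_path" "$oozie_home/$dir/" || return 1
    done
}

# Example:
#   install_jdbc_driver /path/to/mysql-connector-java-5.1.20.jar "$OOZIE_HOME"
```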

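    That completes the Oozie configuration changes. Next comes some preparation before starting Oozie:

    1. Log in to MySQL and create the database for Oozie. The name must match the database in oozie.service.JPAService.jdbc.url; the URL shown above points at oozie4 while the statements below create oozie, so make the two agree.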

        create database oozie;    (creates a database named oozie)
        grant all privileges on oozie.* to 'root'@'localhost' identified by '123456';    (grants user root, connecting from localhost with password 123456, full access to the oozie database)
        grant all privileges on oozie.* to 'root'@'%' identified by '123456';    (grants the same access to root connecting from any host)
        FLUSH PRIVILEGES;

    2. In $OOZIE_HOME/bin, run the following command to generate the SQL script that creates the database tables:

     sh ooziedb.sh create -sqlfile oozie.sql

    3. Run the database script to create the tables:

     sh oozie-setup.sh db create -run  -sqlfile oozie.sql

    The database setup is now complete.

    4. Build the oozie.war package.

    Run the following command, again from the bin directory, to build oozie.war:

    sh addtowar.sh -inputwar $OOZIE_HOME/oozie.war -outputwar $OOZIE_HOME/oozie-server/webapps/oozie.war -hadoop 2.2.0 $HADOOP_HOME -extjs ext-2.2.zip

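    5. The generated war may not contain mysql-connector-java-5.1.20.jar, so add that jar to the war as well; otherwise Oozie will report an error at startup.

    6. In $OOZIE_HOME/bin, run the following command, which uploads the Oozie share lib (the jars that actions such as Hive and Sqoop need at run time) to HDFS: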

    sh oozie-setup.sh sharelib create -fs hdfs://hadoop-master:8020 -locallib $OOZIE_HOME/oozie-sharelib-4.0.0-cdh5.2.0-yarn.tar.gz

    (For Hadoop 2 with multiple HDFS clusters, where hdfs://cluster1 is the fs.defaultFS name in core-site.xml:)

    sh oozie-setup.sh sharelib create -fs hdfs://cluster1 -locallib $OOZIE_HOME/oozie-sharelib-4.0.0-cdh5.0.0-beta-2-yarn.tar.gz
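The value passed to -fs can be read from core-site.xml rather than typed by hand. The helper below is a sketch under assumptions: get_default_fs is a made-up name, and the parsing is a simple awk match that expects the <value> element to sit on a single line at or after the fs.defaultFS <name> line. It is not a general XML parser.

```shell
#!/bin/sh
# get_default_fs (hypothetical helper): print the fs.defaultFS value from a
# Hadoop core-site.xml file.
# Assumes <value>...</value> is on one line, at or after the <name> line.
get_default_fs() {
    awk '
        /<name>fs\.defaultFS<\/name>/ { found = 1 }
        found && match($0, /<value>[^<]*<\/value>/) {
            # Skip the 7 characters of "<value>" and drop the 8 characters
            # of "</value>" (15 in total) to keep only the text between them.
            print substr($0, RSTART + 7, RLENGTH - 15)
            exit
        }
    ' "$1"
}

# Example, using the Hadoop configuration directory from this post:
#   get_default_fs /export/servers/hadoop-2.2.0/etc/hadoop/core-site.xml
```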

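    7. Start Oozie.

    To run Oozie in the foreground:

    sh oozied.sh run

    To run Oozie in the background:

    sh oozied.sh start

    Once it is running, open http://hadoop-master:11000/oozie; the page looks like this:

    8. Set up a workflow to run in Oozie.

    The basic directory structure for an Oozie workflow is as follows:

    9. Upload the directory above to HDFS, for example to /user/root/oozie/workflow/oozieTest.

    Run the oozie command as follows. You can put the command in a shell script so that next time you only need to run the script with sh: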

    run_oozie.sh:

    oozie job -oozie http://hadoop-master:11000/oozie -config $1 -D nameNode=hdfs://hadoop-master:8020 -D jobTracker=hadoop-master:8032 -D queueName=root -D frequency=60 -D nolockTime=0 -D start=2013-11-22T10:00Z -D end=2014-08-30T00:00Z -run

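    (Note: with Hadoop 2, a single HDFS cluster works much like the above. With multiple HDFS clusters it differs: hdfs://cluster1 is the fs.defaultFS name in core-site.xml and takes no port number, and the jobTracker port is 8032. Write the command in the following form:)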

    oozie job -oozie http://hadoop-kf105.jd.com:11000/oozie -config $1 -D nameNode=hdfs://cluster1 -D jobTracker=hadoop-kf100.jd.com:8032 -D frequency=60 -D nolockTime=0 -D start=2013-11-22T10:00Z -D end=2014-08-30T00:00Z -run

    To run a workflow: sh run_oozie.sh oozieTest/job.properties
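Because run_oozie.sh passes $1 straight to oozie, a missing or mistyped properties file only fails once the client runs. The following is a sketch of a slightly more defensive variant, not the script from this post: the function name run_workflow and the OOZIE_URL and OOZIE_BIN variables are assumptions introduced here, and any -D options are passed through unchanged.

```shell
#!/bin/sh
# Hypothetical defaults; override them in the environment as needed.
OOZIE_URL=${OOZIE_URL:-http://hadoop-master:11000/oozie}
OOZIE_BIN=${OOZIE_BIN:-oozie}    # set OOZIE_BIN=echo for a dry run

# run_workflow (hypothetical helper): check the properties file, then submit
# and start the job. Extra arguments (for example -D name=value) are placed
# before -run, as in the commands above.
run_workflow() {
    props=${1:-}
    if [ -z "$props" ]; then
        echo "usage: run_workflow <job.properties> [extra oozie options]" >&2
        return 2
    fi
    if [ ! -f "$props" ]; then
        echo "properties file not found: $props" >&2
        return 1
    fi
    shift
    "$OOZIE_BIN" job -oozie "$OOZIE_URL" -config "$props" "$@" -run
}

# Example:
#   run_workflow oozieTest/job.properties -D nameNode=hdfs://hadoop-master:8020 -D queueName=root
```

With OOZIE_BIN=echo the function only prints the arguments it would pass to oozie, which is a cheap way to review the command line before submitting.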

    kill_oozie.sh:

    oozie job -oozie http://hadoop-master:11000/oozie -kill $1 

    To kill a workflow: sh kill_oozie.sh jobId

    That's it for Oozie for now; I'll add more when there is new material.

  • Original article: https://www.cnblogs.com/zhli/p/4823354.html