  • [Original] Uncle's Experience Sharing (5): how to add dependencies when submitting a Spark job through Oozie

    Ways to add dependencies to a Spark job:

    1 When running in local mode, dependencies can be added with --jars;

    2 When running on YARN, dependencies can be added through spark.yarn.jars;

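    Neither approach works under Oozie. First, Oozie cannot (and should not) run jobs in local mode. Second, if you configure spark.yarn.jars you will find that it simply has no effect. Let's see why.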

    Look at the LauncherMapper log:

    Spark Version 2.1.1

    Spark Action Main class        : org.apache.spark.deploy.SparkSubmit

    Oozie Spark action configuration

    =================================================================

    ...

                        --conf

                        spark.yarn.jars=hdfs://hdfs_name/jarpath/*.jar

                        --conf

                        spark.yarn.jars=hdfs://hdfs_name/oozie/share/lib_20180801121138/spark/spark-yarn_2.11-2.1.1.jar

    As you can see, Oozie appends its own spark.yarn.jars setting. So what does Spark do when the same key is supplied twice?

    org.apache.spark.deploy.SparkSubmit

        val appArgs = new SparkSubmitArguments(args)

    org.apache.spark.launcher.SparkSubmitOptionParser

            if (!handle(name, value)) {

    org.apache.spark.deploy.SparkSubmitArguments

      override protected def handle(opt: String, value: String): Boolean = {

      ...

          case CONF =>

            value.split("=", 2).toSeq match {

              case Seq(k, v) => sparkProperties(k) = v

              case _ => SparkSubmit.printErrorAndExit(s"Spark config without '=': $value")

            }
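    The following minimal Scala sketch makes the last-wins behaviour concrete. It re-creates only the CONF branch above, without depending on Spark. The object ConfOverrideDemo and its effectiveConf method are hypothetical names. The recursive argument walk is a simplified stand-in for SparkSubmitOptionParser's real parsing. Only the split("=", 2) and the plain map assignment mirror the excerpt.

```scala
import scala.annotation.tailrec
import scala.collection.mutable

// Hypothetical, Spark-free re-creation of the CONF branch of SparkSubmitArguments.handle.
// The argument walk is a simplification; only the split + map assignment mirrors Spark.
object ConfOverrideDemo {

  def effectiveConf(args: List[String]): Map[String, String] = {
    val sparkProperties = mutable.LinkedHashMap.empty[String, String]

    @tailrec
    def loop(rest: List[String]): Unit = rest match {
      case "--conf" :: value :: tail =>
        value.split("=", 2).toSeq match {
          // Plain map assignment: a repeated key silently replaces the earlier value.
          case Seq(k, v) => sparkProperties(k) = v
          case _ => throw new IllegalArgumentException(s"Spark config without '=': $value")
        }
        loop(tail)
      case _ :: tail => loop(tail) // any other argument is ignored in this sketch
      case Nil => ()
    }

    loop(args)
    sparkProperties.toMap
  }

  def main(args: Array[String]): Unit = {
    // The two --conf entries from the LauncherMapper log: the application's first, Oozie's last.
    val launcherArgs = List(
      "--conf", "spark.yarn.jars=hdfs://hdfs_name/jarpath/*.jar",
      "--conf", "spark.yarn.jars=hdfs://hdfs_name/oozie/share/lib_20180801121138/spark/spark-yarn_2.11-2.1.1.jar"
    )
    println(effectiveConf(launcherArgs)("spark.yarn.jars"))
    // prints hdfs://hdfs_name/oozie/share/lib_20180801121138/spark/spark-yarn_2.11-2.1.1.jar
  }
}
```

    Because Oozie's entry comes after the application's, the application's value is discarded without any warning.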

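    So the value is simply overwritten and the last setting wins. That is Oozie's setting, not the one supplied by the application. The application therefore has to package its special dependencies into its own jar. With Maven this can be done with the maven-assembly-plugin by listing them under <dependencySets><dependencySet><includes><include>. The full descriptor is: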

    <assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0"

              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

              xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0 http://maven.apache.org/xsd/assembly-1.1.0.xsd">

        <!-- TODO: a jarjar format would be better -->

        <id>jar-with-dependencies</id>

        <formats>

            <format>jar</format>

        </formats>

        <includeBaseDirectory>false</includeBaseDirectory>

        <dependencySets>

            <dependencySet>

                <outputDirectory>/</outputDirectory>

                <useProjectArtifact>true</useProjectArtifact>

                <unpack>true</unpack>

                <scope>runtime</scope>

                <includes>

                    <include>redis.clients:jedis</include>

                    <include>org.apache.commons:commons-pool2</include>

                </includes>

            </dependencySet>

        </dependencySets>

    </assembly>

    This is just the default jar-with-dependencies.xml descriptor copied out, with an includes section added.
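    The descriptor sets <unpack>true</unpack>, so the classes of the included artifacts should end up directly inside the assembled jar. You can check this before deploying to Oozie by scanning the jar's entries for the package paths of those libraries. The sketch below assumes that jedis classes live under redis/clients/jedis/ and commons-pool2 classes under org/apache/commons/pool2/. The FatJarCheck object is a hypothetical helper, and the default jar path is a placeholder for whatever your own build produces.

```scala
import java.io.File
import java.util.jar.JarFile
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper: reports which required package prefixes are absent from a jar.
object FatJarCheck {

  def missingPrefixes(jarPath: String, required: Seq[String]): Seq[String] = {
    val jar = new JarFile(new File(jarPath))
    try {
      // Collect every entry name (e.g. "redis/clients/jedis/Jedis.class").
      val names = ArrayBuffer.empty[String]
      val entries = jar.entries()
      while (entries.hasMoreElements) names += entries.nextElement().getName
      // Keep only the prefixes that no entry starts with.
      required.filterNot(prefix => names.exists(_.startsWith(prefix)))
    } finally jar.close()
  }

  def main(args: Array[String]): Unit = {
    // Placeholder path (assumption): pass the jar your build actually produces.
    val jarPath = args.headOption.getOrElse("target/myapp-1.0-jar-with-dependencies.jar")
    val missing = missingPrefixes(jarPath, Seq("redis/clients/jedis/", "org/apache/commons/pool2/"))
    if (missing.isEmpty) println("all required dependencies are bundled")
    else println(s"missing from jar: ${missing.mkString(", ")}")
  }
}
```

    An empty result only shows that the classes were unpacked into the jar. It does not verify their versions.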

  • Original article: https://www.cnblogs.com/barneywill/p/10109352.html