[Original] Big Data Fundamentals: Spark (9) Deploying Spark on YARN/Mesos

    1 Download and extract (https://spark.apache.org/downloads.html)

    $ wget http://mirrors.shu.edu.cn/apache/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz

    $ tar xvf spark-2.4.0-bin-hadoop2.7.tgz
    $ cd spark-2.4.0-bin-hadoop2.7

    2 Set the SPARK_HOME environment variable

    $ export SPARK_HOME=/path/to/spark-2.4.0-bin-hadoop2.7
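    To persist this across shells, it can go in your profile; a minimal sketch (the profile path and install location are assumptions):

    $ echo 'export SPARK_HOME=/path/to/spark-2.4.0-bin-hadoop2.7' >> ~/.bashrc
    $ echo 'export PATH=$PATH:$SPARK_HOME/bin' >> ~/.bashrc
    $ source ~/.bashrc
    $ spark-submit --version    # sanity check; should report version 2.4.0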

    3 Launch

    The examples below use spark-sql.

    3.1 Spark on YARN

    3.1.1 Environment

    Only the HADOOP_CONF_DIR environment variable needs to be set.
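    For example (the path is an assumption; point it at the directory holding your cluster's core-site.xml and yarn-site.xml):

    $ export HADOOP_CONF_DIR=/etc/hadoop/conf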

    3.1.2 Launch

    $ bin/spark-sql --master yarn

    Additional parameters (a combined invocation sketch follows the list):

    --deploy-mode cluster
    --driver-memory 4g
    --driver-cores 1
    --executor-memory 2g
    --executor-cores 1
    --num-executors 1
    --queue thequeue
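    A sketch combining the flags above (resource sizes and queue name are illustrative; note that --deploy-mode cluster applies to spark-submit jobs, since the spark-sql shell itself only runs in client mode):

    $ bin/spark-sql --master yarn \
        --driver-memory 4g --driver-cores 1 \
        --executor-memory 2g --executor-cores 1 \
        --num-executors 1 \
        --queue thequeue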

    Note: Spark on YARN may fail at startup with an error such as:

    19/02/25 17:54:20 ERROR cluster.YarnClientSchedulerBackend: Yarn application has already exited with state FINISHED!

    The NodeManager log reveals the cause:

    2019-02-25 17:54:19,481 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=48342,containerID=container_1551078668160_0012_02_000001] is running beyond virtual memory limits. Current usage: 380.9 MB of 1 GB physical memory used; 2.5 GB of 2.1 GB virtual memory used. Killing container.

    Adjust yarn-site.xml. The default yarn.nodemanager.vmem-pmem-ratio is 2.1, so the 1 GB container above was only allowed 2.1 GB of virtual memory; either disable the virtual memory check:

        <property>
            <name>yarn.nodemanager.vmem-check-enabled</name>
            <value>false</value>
        </property>

    or raise the ratio:

        <property>
            <name>yarn.nodemanager.vmem-pmem-ratio</name>
            <value>4</value>
        </property>
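    The NodeManagers must be restarted for the change to take effect; a sketch for a Hadoop 2.x tarball install (the HADOOP_HOME path is an assumption):

    $ $HADOOP_HOME/sbin/yarn-daemon.sh stop nodemanager
    $ $HADOOP_HOME/sbin/yarn-daemon.sh start nodemanager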

    3.1.3 Logs

    View the logs:

    # yarn logs -applicationId $application_id

    Local log directory: /var/log/hadoop-yarn/userlogs/$application_id
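    For example, pulling the logs of the failed application from the error above (the application id is derived from the container id in the NodeManager log):

    # yarn logs -applicationId application_1551078668160_0012 | less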

    3.1.4 Stop

    # yarn application -kill $application_id

    This stops an application running on YARN.
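    If the application id is unknown, list the running applications first (standard YARN CLI):

    # yarn application -list -appStates RUNNING
    # yarn application -kill application_1551078668160_0012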

    3.2 Spark on Mesos

    3.2.1 Environment

    Set the environment variable:

    export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so
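    To have the driver pick this up on every launch, it can also be set in conf/spark-env.sh (a sketch; the library path matches the export above):

    $ echo 'export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so' >> $SPARK_HOME/conf/spark-env.sh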

    Then configure the Spark package or home directory, choosing one of the two options:

    1) Option 1

    export SPARK_EXECUTOR_URI=<URL of spark-2.4.3.tar.gz uploaded above>

    or

    --conf spark.executor.uri=<URL of spark-2.4.3.tar.gz uploaded above>

    2) Option 2

    --conf spark.mesos.executor.home=/path/to/spark/home

    Explanation (from the Spark documentation):

    To use Mesos from Spark, you need a Spark binary package available in a place accessible by Mesos, and a Spark driver program configured to connect to Mesos.

    Alternatively, you can also install Spark in the same location in all the Mesos slaves, and configure spark.mesos.executor.home (defaults to SPARK_HOME) to point to that location.
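    A sketch of option 1 using HDFS (the HDFS path is an assumption; any URI reachable by the Mesos agents, e.g. HTTP or S3, works):

    $ hadoop fs -mkdir -p /spark
    $ hadoop fs -put spark-2.4.0-bin-hadoop2.7.tgz /spark/
    $ bin/spark-sql --master mesos://192.168.0.1:5050 \
        --conf spark.executor.uri=hdfs:///spark/spark-2.4.0-bin-hadoop2.7.tgz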

    3.2.2 deploy-mode

    3.2.2.1 Client mode

    Launch:

    $ bin/spark-sql --master mesos://zk://192.168.0.1:2181,192.168.0.2:2181/mesos

    The --master parameter takes either of these forms:

    --master mesos://zk://192.168.0.1:2181/mesos
    --master mesos://192.168.0.1:5050

    Additional parameters:

    --supervise
    --executor-memory 20G
    --conf spark.executor.cores=1
    --conf spark.cores.max=100

    Note that there is no --num-executors parameter here (unlike YARN), and --executor-cores cannot be used either; the equivalents are configured indirectly, as shown in the sketch after this list:

    Executor memory: spark.executor.memory
    Executor cores: spark.executor.cores
    Number of executors: spark.cores.max/spark.executor.cores
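    For example, to get up to 4 single-core executors (sizes are illustrative):

    $ bin/spark-sql --master mesos://192.168.0.1:5050 \
        --executor-memory 2g \
        --conf spark.executor.cores=1 \
        --conf spark.cores.max=4    # 4 / 1 = at most 4 executors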

    3.2.2.2 Cluster mode

    To use cluster mode, you must start the MesosClusterDispatcher in your cluster via the sbin/start-mesos-dispatcher.sh script, passing in the Mesos master URL (e.g: mesos://host:5050). This starts the MesosClusterDispatcher as a daemon running on the host. Note that the MesosClusterDispatcher does not support authentication. You should ensure that all network access to it is protected (port 7077 by default).

    Start the Mesos dispatcher:

    $SPARK_HOME/sbin/start-mesos-dispatcher.sh --master mesos://zk://192.168.0.1:2181/mesos

    Change the master accordingly:

    --master mesos://192.168.0.1:7077

    And add the conf parameter:

    --conf spark.master.rest.enabled=true

    Without this conf, submitting in cluster mode fails with:

    Exception in thread "main" java.lang.AssertionError: assertion failed: Mesos cluster mode is only supported through the REST submission API
            at scala.Predef$.assert(Predef.scala:170)
            at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:673)
            at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143)
            at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
            at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
            at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
            at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

    The relevant code:

    org.apache.spark.deploy.SparkSubmit

        // In standalone cluster mode, there are two submission gateways:
        //   (1) The traditional RPC gateway using o.a.s.deploy.Client as a wrapper
        //   (2) The new REST-based gateway introduced in Spark 1.3
        // The latter is the default behavior as of Spark 1.3, but Spark submit will fail over
        // to use the legacy gateway if the master endpoint turns out to be not a REST server.
        if (args.isStandaloneCluster && args.useRest) {
          try {
            logInfo("Running Spark using the REST application submission protocol.")
            doRunMain()
          } catch {
            // Fail over to use the legacy submission gateway
            case e: SubmitRestConnectionException =>
              logWarning(s"Master endpoint ${args.master} was not a REST server. " +
                "Falling back to legacy submission gateway instead.")
              args.useRest = false
              submit(args, false)
          }
        // In all other modes, just run the main class as prepared
        } else {
          doRunMain()
        }
      }

    where useRest comes from:

    org.apache.spark.deploy.SparkSubmitArguments

      useRest = sparkProperties.getOrElse("spark.master.rest.enabled", "false").toBoolean

    Finally, make sure the application jar is reachable over HTTP or HDFS:

    Note that jars or python files that are passed to spark-submit should be URIs reachable by Mesos slaves, as the Spark driver doesn’t automatically upload local jars.
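    Putting the pieces together, a cluster-mode submission sketch (the application class and jar path are hypothetical; the jar must already sit at a URI the agents can reach):

    $ bin/spark-submit \
        --master mesos://192.168.0.1:7077 \
        --deploy-mode cluster \
        --conf spark.master.rest.enabled=true \
        --conf spark.executor.uri=hdfs:///spark/spark-2.4.0-bin-hadoop2.7.tgz \
        --class com.example.MyApp \
        hdfs:///jars/my-app.jar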

    Note that the Mesos dispatcher is a single instance with no HA support, though it can be run under Marathon.

    3.2.3 Logs

    Local log directory: /var/lib/mesos/slaves

    Example:

    # ls -l /var/lib/mesos/slaves/cbe75da3-d16c-43b0-8949-f77cd2be2591-S0/frameworks/2a0fb98b-f8df-44e8-965c-54ad7203fa45-0010/executors/driver-20190614142804-0001/runs/3e59a486-a219-4f63-a41e-12fb064a597d/
    total 2460364
    drwxr-xr-x 66 root root       4096 Jun 14 14:30 blockmgr-831b7a79-3224-479d-a927-b9024540749d
    drwx------  3 root root       4096 Jun 14 14:28 spark-e5468d46-51be-478b-8968-fb0700953ea8
    -rw-r--r--  1 root root 2512477348 Jun 18 14:03 stderr
    -rw-r--r--  1 root root    6720547 Jun 18 14:03 stdout

    3.2.4 Stop

    # curl http://localhost:5050/master/teardown -H 'Content-Type: application/json' -d "frameworkId=2a0fb98b-f8df-44e8-965c-54ad7203fa45-0010" -v

    This tears down the framework (and all of its tasks) running on Mesos.
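    To look up the frameworkId, query the master's state endpoint (the jq filter is just for readability and assumes jq is installed):

    # curl -s http://localhost:5050/master/state | jq '.frameworks[] | {id, name}'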

    Reference: http://spark.apache.org/docs/latest/running-on-mesos.html
