[Original] Big Data Fundamentals: Spark (9) Deploying Spark on YARN/Mesos

    1 Download and extract (https://spark.apache.org/downloads.html)

    $ wget http://mirrors.shu.edu.cn/apache/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz

    $ tar xvf spark-2.4.0-bin-hadoop2.7.tgz
    $ cd spark-2.4.0-bin-hadoop2.7

    2 Set the SPARK_HOME environment variable

    $ export SPARK_HOME=/path/to/spark-2.4.0-bin-hadoop2.7
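    To persist this across shells, it can go in your profile; a minimal sketch (the profile path and install location are assumptions):

    $ echo 'export SPARK_HOME=/path/to/spark-2.4.0-bin-hadoop2.7' >> ~/.bashrc
    $ echo 'export PATH=$PATH:$SPARK_HOME/bin' >> ~/.bashrc
    $ source ~/.bashrc
    $ spark-submit --version    # sanity check; should report version 2.4.0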

    3 Launch

    The examples below use spark-sql.

    3.1 Spark on YARN

    3.1.1 Environment

    Only the HADOOP_CONF_DIR environment variable needs to be set.
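    For example (the path is an assumption; point it at the directory holding your cluster's core-site.xml and yarn-site.xml):

    $ export HADOOP_CONF_DIR=/etc/hadoop/conf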

    3.1.2 Launch

    $ bin/spark-sql --master yarn

    Additional parameters (a combined invocation sketch follows the list):

    --deploy-mode cluster
    --driver-memory 4g
    --driver-cores 1
    --executor-memory 2g
    --executor-cores 1
    --num-executors 1
    --queue thequeue
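    A sketch combining the flags above (resource sizes and queue name are illustrative; note that --deploy-mode cluster applies to spark-submit jobs, since the spark-sql shell itself only runs in client mode):

    $ bin/spark-sql --master yarn \
        --driver-memory 4g --driver-cores 1 \
        --executor-memory 2g --executor-cores 1 \
        --num-executors 1 \
        --queue thequeue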

    Note: Spark on YARN may fail at startup with an error such as:

    19/02/25 17:54:20 ERROR cluster.YarnClientSchedulerBackend: Yarn application has already exited with state FINISHED!

    The NodeManager log reveals the cause:

    2019-02-25 17:54:19,481 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=48342,containerID=container_1551078668160_0012_02_000001] is running beyond virtual memory limits. Current usage: 380.9 MB of 1 GB physical memory used; 2.5 GB of 2.1 GB virtual memory used. Killing container.

    Adjust yarn-site.xml. The default yarn.nodemanager.vmem-pmem-ratio is 2.1, so the 1 GB container above was only allowed 2.1 GB of virtual memory; either disable the virtual memory check:

        <property>
            <name>yarn.nodemanager.vmem-check-enabled</name>
            <value>false</value>
        </property>

    or raise the ratio:

        <property>
            <name>yarn.nodemanager.vmem-pmem-ratio</name>
            <value>4</value>
        </property>
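    The NodeManagers must be restarted for the change to take effect; a sketch for a Hadoop 2.x tarball install (the HADOOP_HOME path is an assumption):

    $ $HADOOP_HOME/sbin/yarn-daemon.sh stop nodemanager
    $ $HADOOP_HOME/sbin/yarn-daemon.sh start nodemanager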

    3.1.3 Logs

    View the logs:

    # yarn logs -applicationId $application_id

    Local log directory: /var/log/hadoop-yarn/userlogs/$application_id
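    For example, pulling the logs of the failed application from the error above (the application id is derived from the container id in the NodeManager log):

    # yarn logs -applicationId application_1551078668160_0012 | less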

    3.1.4 Stop

    # yarn application -kill $application_id

    This stops an application running on YARN.
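    If the application id is unknown, list the running applications first (standard YARN CLI):

    # yarn application -list -appStates RUNNING
    # yarn application -kill application_1551078668160_0012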

    3.2 Spark on Mesos

    3.2.1 Environment

    Set the environment variable:

    export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so
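    To have the driver pick this up on every launch, it can also be set in conf/spark-env.sh (a sketch; the library path matches the export above):

    $ echo 'export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so' >> $SPARK_HOME/conf/spark-env.sh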

    Then configure the Spark package or home directory, choosing one of the two options:

    1) Option 1

    export SPARK_EXECUTOR_URI=<URL of spark-2.4.3.tar.gz uploaded above>

    or

    --conf spark.executor.uri=<URL of spark-2.4.3.tar.gz uploaded above>

    2) Option 2

    --conf spark.mesos.executor.home=/path/to/spark/home

    Explanation (from the Spark documentation):

    To use Mesos from Spark, you need a Spark binary package available in a place accessible by Mesos, and a Spark driver program configured to connect to Mesos.

    Alternatively, you can also install Spark in the same location in all the Mesos slaves, and configure spark.mesos.executor.home (defaults to SPARK_HOME) to point to that location.
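    A sketch of option 1 using HDFS (the HDFS path is an assumption; any URI reachable by the Mesos agents, e.g. HTTP or S3, works):

    $ hadoop fs -mkdir -p /spark
    $ hadoop fs -put spark-2.4.0-bin-hadoop2.7.tgz /spark/
    $ bin/spark-sql --master mesos://192.168.0.1:5050 \
        --conf spark.executor.uri=hdfs:///spark/spark-2.4.0-bin-hadoop2.7.tgz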

    3.2.2 deploy-mode

    3.2.2.1 Client mode

    Launch:

    $ bin/spark-sql --master mesos://zk://192.168.0.1:2181,192.168.0.2:2181/mesos

    The --master parameter takes either of these forms:

    --master mesos://zk://192.168.0.1:2181/mesos
    --master mesos://192.168.0.1:5050

    Additional parameters:

    --supervise
    --executor-memory 20G
    --conf spark.executor.cores=1
    --conf spark.cores.max=100

    Note that there is no --num-executors parameter here (unlike YARN), and --executor-cores cannot be used either; the equivalents are configured indirectly, as shown in the sketch after this list:

    Executor memory: spark.executor.memory
    Executor cores: spark.executor.cores
    Number of executors: spark.cores.max/spark.executor.cores
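    For example, to get up to 4 single-core executors (sizes are illustrative):

    $ bin/spark-sql --master mesos://192.168.0.1:5050 \
        --executor-memory 2g \
        --conf spark.executor.cores=1 \
        --conf spark.cores.max=4    # 4 / 1 = at most 4 executors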

    3.2.2.2 Cluster mode

    To use cluster mode, you must start the MesosClusterDispatcher in your cluster via the sbin/start-mesos-dispatcher.sh script, passing in the Mesos master URL (e.g: mesos://host:5050). This starts the MesosClusterDispatcher as a daemon running on the host. Note that the MesosClusterDispatcher does not support authentication. You should ensure that all network access to it is protected (port 7077 by default).

    Start the Mesos dispatcher:

    $SPARK_HOME/sbin/start-mesos-dispatcher.sh --master mesos://zk://192.168.0.1:2181/mesos

    Change the master accordingly:

    --master mesos://192.168.0.1:7077

    And add the conf parameter:

    --conf spark.master.rest.enabled=true

    Without this conf, submitting in cluster mode fails with:

    Exception in thread "main" java.lang.AssertionError: assertion failed: Mesos cluster mode is only supported through the REST submission API
            at scala.Predef$.assert(Predef.scala:170)
            at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:673)
            at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143)
            at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
            at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
            at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
            at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

    The relevant code:

    org.apache.spark.deploy.SparkSubmit

        // In standalone cluster mode, there are two submission gateways:
        //   (1) The traditional RPC gateway using o.a.s.deploy.Client as a wrapper
        //   (2) The new REST-based gateway introduced in Spark 1.3
        // The latter is the default behavior as of Spark 1.3, but Spark submit will fail over
        // to use the legacy gateway if the master endpoint turns out to be not a REST server.
        if (args.isStandaloneCluster && args.useRest) {
          try {
            logInfo("Running Spark using the REST application submission protocol.")
            doRunMain()
          } catch {
            // Fail over to use the legacy submission gateway
            case e: SubmitRestConnectionException =>
              logWarning(s"Master endpoint ${args.master} was not a REST server. " +
                "Falling back to legacy submission gateway instead.")
              args.useRest = false
              submit(args, false)
          }
        // In all other modes, just run the main class as prepared
        } else {
          doRunMain()
        }
      }

    where useRest comes from:

    org.apache.spark.deploy.SparkSubmitArguments

      useRest = sparkProperties.getOrElse("spark.master.rest.enabled", "false").toBoolean

    Finally, make sure the application jar is reachable over HTTP or HDFS:

    Note that jars or python files that are passed to spark-submit should be URIs reachable by Mesos slaves, as the Spark driver doesn’t automatically upload local jars.
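    Putting the pieces together, a cluster-mode submission sketch (the application class and jar path are hypothetical; the jar must already sit at a URI the agents can reach):

    $ bin/spark-submit \
        --master mesos://192.168.0.1:7077 \
        --deploy-mode cluster \
        --conf spark.master.rest.enabled=true \
        --conf spark.executor.uri=hdfs:///spark/spark-2.4.0-bin-hadoop2.7.tgz \
        --class com.example.MyApp \
        hdfs:///jars/my-app.jar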

    Note that the Mesos dispatcher is a single instance with no HA support, though it can be run under Marathon.

    3.2.3 Logs

    Local log directory: /var/lib/mesos/slaves

    Example:

    # ls -l /var/lib/mesos/slaves/cbe75da3-d16c-43b0-8949-f77cd2be2591-S0/frameworks/2a0fb98b-f8df-44e8-965c-54ad7203fa45-0010/executors/driver-20190614142804-0001/runs/3e59a486-a219-4f63-a41e-12fb064a597d/
    total 2460364
    drwxr-xr-x 66 root root       4096 Jun 14 14:30 blockmgr-831b7a79-3224-479d-a927-b9024540749d
    drwx------  3 root root       4096 Jun 14 14:28 spark-e5468d46-51be-478b-8968-fb0700953ea8
    -rw-r--r--  1 root root 2512477348 Jun 18 14:03 stderr
    -rw-r--r--  1 root root    6720547 Jun 18 14:03 stdout

    3.2.4 Stop

    # curl http://localhost:5050/master/teardown -H 'Content-Type: application/json' -d "frameworkId=2a0fb98b-f8df-44e8-965c-54ad7203fa45-0010" -v

    This tears down the framework (and all of its tasks) running on Mesos.
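    To look up the frameworkId, query the master's state endpoint (the jq filter is just for readability and assumes jq is installed):

    # curl -s http://localhost:5050/master/state | jq '.frameworks[] | {id, name}'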

    Reference: http://spark.apache.org/docs/latest/running-on-mesos.html
