  • Spark Setup and Deployment

    Prepare the base environment

    Spark local mode

    Upload the compiled Spark package to the server and extract it to the target directory

    [root@hadoop01 soft]# tar zxvf spark-2.2.0-bin-2.6.0-cdh5.14.0.tgz -C /usr/local/
    [root@hadoop01 soft]# cd /usr/local/
    [root@hadoop01 local]# mv spark-2.2.0-bin-2.6.0-cdh5.14.0/ spark
    [root@hadoop01 local]# cd spark/conf/

    Rename the configuration file spark-env.sh.template to spark-env.sh

    [root@hadoop01 conf]# mv spark-env.sh.template spark-env.sh
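
    For local mode the template defaults are usually sufficient. If the JDK is not on the PATH, point Spark at it explicitly; the path below is an assumption, reused from the standalone configuration later in this guide:

    [root@hadoop01 conf]# vim spark-env.sh
    export JAVA_HOME=/usr/local/java/jdk1.8.0_201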

    Start Spark to verify the installation

    [root@hadoop01 conf]# cd ../
    [root@hadoop01 spark]# ./bin/spark-shell
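
    Once the shell is up, a quick computation confirms the Spark context works; summing 1 to 1000, for example, should return 500500.0:

    scala> sc.parallelize(1 to 1000).sum()
    res0: Double = 500500.0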

    Exit the Spark shell

    scala> :quit

    Run the example jar to compute Pi

    bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master local[2] \
    --executor-memory 1G \
    --total-executor-cores 2 \
    /usr/local/spark/examples/jars/spark-examples_2.11-2.2.0.jar \
    100
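
    If the job succeeds, the driver prints the estimate on stdout while Spark's own logging goes to stderr, so you can filter for the result line; this quick check (not part of the original steps) should print a single line like "Pi is roughly 3.14...":

    [root@hadoop01 spark]# bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master local[2] /usr/local/spark/examples/jars/spark-examples_2.11-2.2.0.jar 100 \
    2>/dev/null | grep "Pi is roughly"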

    Spark standalone mode

    Edit the configuration file spark-env.sh and add the following:

    [root@hadoop01 conf]# vim spark-env.sh
    export JAVA_HOME=/usr/local/java/jdk1.8.0_201
    export SPARK_MASTER_HOST=node01
    export SPARK_MASTER_PORT=7077
    export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=4000 -Dspark.history.retainedApplications=3 -Dspark.history.fs.logDirectory=hdfs://node01:9000/spark_log"

    Rename and edit the slaves file, adding the worker hostnames:

    [root@hadoop01 conf]# mv slaves.template slaves
    [root@hadoop01 conf]# vim slaves
    node01
    node02
    node03

    Rename and edit the configuration file spark-defaults.conf, adding the following:

    [root@hadoop01 conf]# mv spark-defaults.conf.template spark-defaults.conf
    [root@hadoop01 conf]# vim spark-defaults.conf
    spark.eventLog.enabled true
    spark.eventLog.dir hdfs://node01:9000/spark_log
    spark.eventLog.compress true

    Create the Spark log directory spark_log in HDFS; both spark.eventLog.dir and the history server's log directory point at it, so it must exist before applications run

    [root@hadoop01 conf]# hdfs dfs -mkdir -p /spark_log

    Distribute the configured Spark directory to the other two servers

    [root@hadoop01 conf]# cd ../../
    [root@hadoop01 local]# scp -r spark root@node02:$PWD
    [root@hadoop01 local]# scp -r spark root@node03:$PWD

    Start the Spark cluster

    [root@hadoop01 local]# cd spark
    [root@hadoop01 spark]# sbin/start-all.sh
    [root@hadoop01 spark]# sbin/start-history-server.sh
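
    A quick way to confirm the daemons are up is jps: node01 should show a Master, a Worker, and a HistoryServer process, and node02/node03 a Worker each. The master web UI listens on port 8080 by default, and the history server was configured above on port 4000:

    [root@hadoop01 spark]# jps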

    Verify standalone mode

    [root@hadoop01 spark]# bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master spark://node01:7077 \
    --executor-memory 1G \
    --total-executor-cores 1 \
    /usr/local/spark/examples/jars/spark-examples_2.11-2.2.0.jar \
    100
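
    Because event logging is enabled, each completed application should leave a log file under /spark_log, which is also what the history server on port 4000 reads; to confirm:

    [root@hadoop01 spark]# hdfs dfs -ls /spark_log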

    Spark HA (high-availability) mode

    Stop the running Spark services

    [root@hadoop01 spark]# sbin/stop-all.sh
    [root@hadoop01 spark]# sbin/stop-history-server.sh

    Extract a fresh copy of the Spark package and rename the directory to spark-HA

    [root@hadoop01 soft]# tar zxvf spark-2.2.0-bin-2.6.0-cdh5.14.0.tgz -C /usr/local/
    [root@hadoop01 soft]# cd /usr/local/
    [root@hadoop01 local]# mv spark-2.2.0-bin-2.6.0-cdh5.14.0/ spark-HA

    Edit the Spark configuration file spark-env.sh. Note that SPARK_MASTER_HOST is deliberately omitted here: with ZooKeeper-based recovery, the masters coordinate through the ensemble configured in SPARK_DAEMON_JAVA_OPTS instead of a single fixed host

    [root@hadoop01 local]# cd spark-HA/conf/
    [root@hadoop01 conf]# mv spark-env.sh.template spark-env.sh
    [root@hadoop01 conf]# vim spark-env.sh
    export JAVA_HOME=/usr/local/java/jdk1.8.0_201
    export SPARK_MASTER_PORT=7077
    export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=4000 -Dspark.history.retainedApplications=3 -Dspark.history.fs.logDirectory=hdfs://node01:9000/spark_log"
    export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=node01:2181,node02:2181,node03:2181 -Dspark.deploy.zookeeper.dir=/spark"
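
    The ZooKeeper ensemble on node01:2181,node02:2181,node03:2181 must already be running before the masters start, since they use it for leader election and state recovery. Assuming a standard ZooKeeper installation with zkServer.sh on the PATH, a quick check on each node should report Mode: leader on one node and Mode: follower on the others:

    [root@hadoop01 conf]# zkServer.sh status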

    Rename and edit the slaves file, adding the worker hostnames:

    [root@hadoop01 conf]# mv slaves.template slaves
    [root@hadoop01 conf]# vim slaves
    node01
    node02
    node03

    Rename and edit the configuration file spark-defaults.conf, adding the following:

    [root@hadoop01 conf]# mv spark-defaults.conf.template spark-defaults.conf
    [root@hadoop01 conf]# vim spark-defaults.conf
    spark.eventLog.enabled true
    spark.eventLog.dir hdfs://node01:9000/spark_log
    spark.eventLog.compress true

    Create the Spark log directory in HDFS

    [root@hadoop01 conf]# hdfs dfs -mkdir -p /spark_log

    Distribute the spark-HA directory to the other two servers

    [root@hadoop01 conf]# cd /usr/local/
    [root@hadoop01 local]# scp -r spark-HA root@node02:$PWD
    [root@hadoop01 local]# scp -r spark-HA root@node03:$PWD

    Start the Spark HA cluster

    [root@hadoop01 conf]# cd /usr/local/spark-HA/
    [root@hadoop01 spark-HA]# sbin/start-all.sh
    [root@hadoop01 spark-HA]# sbin/start-history-server.sh

    Start a standby master on node02

    [root@node02 ~]# cd /usr/local/spark-HA/
    [root@node02 spark-HA]# sbin/start-master.sh
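
    Each master's web UI reports its recovery state: node01 should show ALIVE and node02 STANDBY. A rough check from the shell, assuming the default UI port of 8080 on both hosts:

    [root@node02 spark-HA]# curl -s http://node01:8080 | grep -oE "ALIVE|STANDBY"
    [root@node02 spark-HA]# curl -s http://node02:8080 | grep -oE "ALIVE|STANDBY"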

    Verify the Spark HA cluster

    Start the Spark shell against both masters in HA mode

    [root@hadoop01 spark-HA]# bin/spark-shell --master spark://node01:7077,node02:7077

    Run the example jar as a verification test

    [root@hadoop01 spark-HA]# bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master spark://node01:7077,node02:7077 \
    --executor-memory 1G \
    --total-executor-cores 1 \
    /usr/local/spark-HA/examples/jars/spark-examples_2.11-2.2.0.jar \
    100
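
    To exercise failover itself, stop the active master on node01 while the shell or a job is attached; after a short recovery delay, node02's master should transition from STANDBY to ALIVE (watch http://node02:8080) and running applications reconnect to it:

    [root@hadoop01 spark-HA]# sbin/stop-master.sh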

    Spark on YARN mode

    Tip: if the YARN cluster is short on resources, you can add the following two properties to yarn-site.xml and then restart YARN; they disable YARN's physical- and virtual-memory checks, so containers are not killed for exceeding the memory limits

    <property>
        <name>yarn.nodemanager.pmem-check-enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>

    Edit the configuration file spark-env.sh, pointing Spark at the Hadoop and YARN configuration directories

    HADOOP_CONF_DIR=/usr/local/hadoop-HA/etc/hadoop
    YARN_CONF_DIR=/usr/local/hadoop-HA/etc/hadoop
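
    These variables tell spark-submit where to find the cluster's core-site.xml, hdfs-site.xml, and yarn-site.xml. The /usr/local/hadoop-HA path is this cluster's Hadoop install; it is worth confirming the files are actually there:

    [root@hadoop01 spark-HA]# ls /usr/local/hadoop-HA/etc/hadoop/ | grep -E "core-site|yarn-site"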

    Submit a job to the YARN cluster to verify

    [root@hadoop01 spark-HA]# bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode client \
    /usr/local/spark-HA/examples/jars/spark-examples_2.11-2.2.0.jar \
    100
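
    In client mode the Pi estimate prints on the submitting console, but the application is also recorded by YARN; as a cross-check:

    [root@hadoop01 spark-HA]# yarn application -list -appStates FINISHED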