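Introduction to Spark standalone and running wordcount (master, slave1 and slave2)
(1) spark-env.sh is the environment variable configuration file
(3) slaves is the configuration file listing the worker (slave) machines
(4) metrics.properties configures monitoring (metrics)
(5) log4j.properties configures logging
(6) docker.properties holds the Docker settings
(7) The Spark standalone installation here uses three machines: master, slave1 and slave2.
(8) Spark standalone mode can in fact be installed without Hadoop. (I have not installed Hadoop here; some bloggers install it for this setup and some do not.)
First, here is how the Spark standalone installation in this post is set up.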
DEVICE=eth0
HWADDR=00:0C:29:A9:45:18
TYPE=Ethernet
UUID=50fc177a-f282-4c83-bfbc-cb0f00b92507
ONBOOT=yes
NM_CONTROLLED=yes
BOOTPROTO=static
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=yes
IPV6INIT=no
NAME="System eth0"
IPADDR=
BCAST=
GATEWAY=
NETMASK=
DNS1=
DNS2=

DEVICE=eth0
HWADDR=00:0C:29:18:ED:4A
TYPE=Ethernet
UUID=b5d059e4-3b92-41ef-889b-68f2f5684fac
ONBOOT=yes
NM_CONTROLLED=yes
BOOTPROTO=static
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=yes
IPV6INIT=no
NAME="System eth0"
IPADDR=
BCAST=
GATEWAY=
NETMASK=
DNS1=
DNS2=

DEVICE=eth0
HWADDR=00:0C:29:8B:DE:B0
TYPE=Ethernet
UUID=1ba7be29-2c80-4875-8c11-1ed2a47c0a67
ONBOOT=yes
NM_CONTROLLED=yes
BOOTPROTO=static
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=yes
IPV6INIT=no
NAME="System eth0"
IPADDR=
BCAST=
GATEWAY=
NETMASK=
DNS1=
DNS2=
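My JDK is installed under /usr/local/jdk. Remember to give the spark user and group ownership of it: chown -R spark:spark jdk
#java
export JAVA_HOME=/usr/local/jdk/jdk1.8.0_60
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin
My Scala is installed under /usr/local/scala. Remember to give the spark user and group ownership of it: chown -R spark:spark scala
#scala
export SCALA_HOME=/usr/local/scala/scala-2.10.5
export PATH=$PATH:$SCALA_HOME/bin
My Spark installation directory is /usr/local/spark/. Remember to give the spark user and group ownership of it: chown -R spark:spark spark
For the installation steps themselves, just see the blog post below. For how to configure Spark, see the walkthrough of the Spark standalone configuration files later in this post.
#spark
export SPARK_HOME=/usr/local/spark/spark-1.6.1-bin-hadoop2.6
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin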
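A quick way to confirm that the three sets of exports took effect is to check that each variable is set and points at a real directory. The sketch below is only an illustration: the helper name `check_home` is made up for this example and is not part of Spark, and it assumes the exports sit in a profile file that has already been sourced in the current shell.

```shell
# check_home VAR_NAME: succeed if the named variable is set and is a directory.
# (check_home is a hypothetical helper for this post, not a Spark tool.)
check_home() {
  name="$1"
  eval "value=\${$name:-}"        # look up the variable whose name is in $name
  if [ -z "$value" ]; then
    echo "$name is not set"
    return 1
  fi
  if [ ! -d "$value" ]; then
    echo "$name=$value is not a directory"
    return 1
  fi
  echo "$name=$value OK"
}

# Report on all three variables, continuing even if one of them fails.
for v in JAVA_HOME SCALA_HOME SPARK_HOME; do
  check_home "$v" || true
done
```

Running it on each of master, slave1 and slave2 catches a missing export or a mistyped path before the cluster is started.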
After that comes how to configure ZooKeeper for Spark.
● slaves -- specifies which nodes run a worker.
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# A Spark Worker will be started on each of the machines listed below.
slave1
slave2
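To see which hosts a slaves file actually yields, the comments and blank lines can be stripped out, leaving one worker host per line. The following is a rough sketch of that filtering, not Spark's own script. The function name `list_workers` and the throwaway demo file are assumptions made for this example.

```shell
# list_workers FILE: print the host names in a slaves-style file,
# skipping '#' comments and blank lines.
# (list_workers is a hypothetical helper, not part of Spark.)
list_workers() {
  sed -e 's/#.*$//' -e '/^[[:space:]]*$/d' "$1"
}

# Demo against a temporary file with the same body as the slaves file above.
demo_slaves="$(mktemp)"
cat > "$demo_slaves" <<'EOF'
# A Spark Worker will be started on each of the machines listed below.
slave1
slave2
EOF

list_workers "$demo_slaves"
```

On the real master the same function could be pointed at `$SPARK_HOME/conf/slaves` to double-check the worker list before starting the cluster.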
● spark-defaults.conf -- the default configuration used when Spark submits a job
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Default system properties included when running spark-submit.
# This is useful for setting default environmental settings.
# Example:
# spark.master spark://master:7077
# spark.eventLog.enabled true
# spark.eventLog.dir hdfs://namenode:8021/directory
# spark.serializer org.apache.spark.serializer.KryoSerializer
# spark.driver.memory 5g
# spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
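spark-defaults.conf is optional, because the same settings can also be passed to spark-submit. It is usually left unset, since otherwise the values are fixed for every job. I normally do not configure anything here and leave the file untouched. If you do fill it in, the entries look like the following (note that the hdfs:// paths need a running HDFS at master:9000):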
spark.master spark://master:7077
spark.eventLog.enabled true
spark.eventLog.dir hdfs://master:9000/sparkHistoryLogs
spark.eventLog.compress true
spark.history.fs.update.interval 5
spark.history.ui.port 7777
spark.history.fs.logDirectory hdfs://master:9000/sparkHistoryLogs
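Since the same settings can be given at submit time instead, here is a sketch of turning spark-defaults.conf style lines into `--conf key=value` arguments for spark-submit. `--conf` is a standard spark-submit option. The helper name `conf_to_args` and the sample file are assumptions made for this illustration, values containing spaces would need extra quoting, and the command is only printed, never executed.

```shell
# conf_to_args FILE: print "--conf key=value " for each setting in FILE,
# skipping '#' comments and blank lines.
# (conf_to_args is a hypothetical helper, not part of Spark.)
conf_to_args() {
  awk '
    /^[ \t]*#/ || /^[ \t]*$/ { next }
    {
      key = $1
      $1 = ""                      # drop the key, leaving only the value
      sub(/^[ \t]+/, "")           # trim the gap the key left behind
      printf "--conf %s=%s ", key, $0
    }
  ' "$1"
}

# Demo with two of the settings listed above.
demo_conf="$(mktemp)"
cat > "$demo_conf" <<'EOF'
# sample settings taken from the list above
spark.master spark://master:7077
spark.eventLog.enabled true
EOF

# Print the command line only; append your own application jar and arguments.
echo "spark-submit $(conf_to_args "$demo_conf")"
```

Keeping the settings on the command line this way means each job can choose its own values, which is the reason given above for leaving spark-defaults.conf untouched.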
● spark-env.sh -- Spark's environment variables
#!/usr/bin/env bash
export JAVA_HOME=/usr/local/jdk/jdk1.8.0_60
export SCALA_HOME=/usr/local/scala/scala-2.10.5
export SPARK_WORKER_MEMORY=1G   # the official site writes this value as 1g
# SPARK_MASTER_WEBUI_PORT=8888   (change this yourself if you like; not demonstrated here)
● On the node that is to act as master, start the cluster -- sbin/start-all.sh
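With the cluster configured, the remaining steps are starting it and submitting a word count. The sketch below only prints the commands unless `DRY_RUN=0` is set, so it can be read through safely first. It rests on several assumptions: `SPARK_HOME` is exported as shown earlier, the master URL is `spark://master:7077`, the bundled `JavaWordCount` example class stands in for a hand-written job, the examples jar sits under `$SPARK_HOME/lib` (check your own install), and the input is Spark's own `README.md`, which has to be readable at the same path on every node.

```shell
DRY_RUN="${DRY_RUN:-1}"            # 1 = print the commands only, 0 = really run them
SPARK_HOME="${SPARK_HOME:-/usr/local/spark/spark-1.6.1-bin-hadoop2.6}"
MASTER_URL="spark://master:7077"   # assumed master URL for this cluster

# run CMD...: print the command in dry-run mode, otherwise execute it.
# (run is a hypothetical helper for this post.)
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# 1. On master: start the master process and the workers listed in conf/slaves.
run "$SPARK_HOME/sbin/start-all.sh"

# 2. Submit the bundled word count example to the standalone master.
run "$SPARK_HOME/bin/spark-submit" \
  --master "$MASTER_URL" \
  --class org.apache.spark.examples.JavaWordCount \
  "$SPARK_HOME"/lib/spark-examples-*.jar \
  "$SPARK_HOME/README.md"
```

After step 1, the master web UI (port 8080 unless SPARK_MASTER_WEBUI_PORT is changed) should list both workers. Step 2 prints each word with its count to the driver's console.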