1 Cluster Planning
Standalone mode is used: 18 machines in total, one master and 17 slaves.
2 Versions
scala-2.11.7.tgz
spark-1.4.1-bin-hadoop2.6.tgz
3 Installation
It is assumed that Hadoop is already installed; if not, see the Hadoop installation article.
3.1 Install Scala
$ cd /opt/soft
$ tar -zxf /home/hadoop/scala-2.11.7.tgz
$ mv scala-2.11.7/ scala
3.2 Install Spark
$ tar -zxf /home/hadoop/spark-1.4.1-bin-hadoop2.6.tgz
$ mv spark-1.4.1-bin-hadoop2.6/ spark
3.3 Add environment variables
Append the following to /etc/profile:
export SCALA_HOME=/opt/soft/scala
export SPARK_HOME=/opt/soft/spark
export PATH=$SCALA_HOME/bin:$SPARK_HOME/bin:$PATH
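After editing /etc/profile, reload it and confirm both tools are on the PATH. A minimal check (the versions are the ones from the packages above):
$ source /etc/profile
$ scala -version          # expect Scala 2.11.7
$ spark-submit --version  # expect Spark 1.4.1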
4 Configure Spark
4.1 Configure slaves
$ cd /opt/soft/spark/conf
$ cp slaves.template slaves
$ cat slaves
# A Spark Worker will be started on each of the machines listed below.
a02
a03
a04
a05
a06
a07
a08
a09
a10
a11
a12
a13
a14
a15
a16
a17
a18
4.2 Configure spark-env.sh
$ cp spark-env.sh.template spark-env.sh
$ vim spark-env.sh
# common settings
export SCALA_HOME=/opt/soft/scala/
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/
export SPARK_LOCAL_DIRS=/opt/soft/spark/
export SPARK_CONF_DIR=/opt/soft/spark/conf/
export SPARK_PID_DIR=/opt/spark/pid_file/
# standalone
export SPARK_MASTER_IP=a01
export SPARK_MASTER_PORT=7077
# number of CPU cores used by each Worker process
export SPARK_WORKER_CORES=4
# amount of memory used by each Worker process
export SPARK_WORKER_MEMORY=9g
# number of Worker processes to run on each worker node
export SPARK_WORKER_INSTANCES=6
# local disk location used by workers when executing tasks
export SPARK_WORKER_DIR=/opt/spark/local
# web UI port
export SPARK_MASTER_WEBUI_PORT=8099
# Spark History Server settings
export SPARK_HISTORY_OPTS="-Dspark.history.retainedApplications=20 -Dspark.history.fs.logDirectory=hdfs://a01:9000/user/spark/applicationHistory"
This is the standalone-mode configuration. Tune each value to the actual hardware of your machines, but always make sure that:
SPARK_WORKER_CORES * SPARK_WORKER_INSTANCES <= total CPU cores of a single machine
SPARK_WORKER_MEMORY * SPARK_WORKER_INSTANCES <= total memory of a single machine
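As a worked example (illustrative numbers only; the actual slave hardware is not stated here), on a slave with 24 cores and 64 GB of RAM the values above fit:
SPARK_WORKER_CORES  * SPARK_WORKER_INSTANCES = 4  * 6 = 24 cores <= 24 cores
SPARK_WORKER_MEMORY * SPARK_WORKER_INSTANCES = 9g * 6 = 54 GB    <= 64 GB, leaving some memory for the OS and other daemons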
More options are listed in spark-env.sh.template.
SPARK_HISTORY_OPTS configures the Spark History Server; for details see http://www.cnblogs.com/luogankun/p/3981645.html
4.3 Configure spark-defaults.conf
$ cp spark-defaults.conf.template spark-defaults.conf
$ vim spark-defaults.conf
# use standalone mode by default
spark.master            spark://a01:7077
# Spark History Server settings
spark.eventLog.enabled  true
spark.eventLog.dir      hdfs://a01:9000/user/spark/applicationHistory
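The event-log directory configured above must exist in HDFS before applications start writing to it, and the History Server is started separately. A minimal sketch using the paths from this configuration:
$ hdfs dfs -mkdir -p /user/spark/applicationHistory
$ /opt/soft/spark/sbin/start-history-server.sh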
With the configuration finished, re-pack the spark directory and copy it to the slave nodes.
On each slave, first do step 3 (Scala and the environment variables), then simply unpack the spark archive that was copied over.
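A minimal distribution sketch, assuming the slave host names a02..a18 from the slaves file and passwordless SSH for the hadoop user:
$ cd /opt/soft
$ tar -zcf /tmp/spark-dist.tgz spark
$ for i in $(seq -w 2 18); do
    scp /tmp/spark-dist.tgz a$i:/tmp/
    ssh a$i "tar -zxf /tmp/spark-dist.tgz -C /opt/soft"
  done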
5 Startup
$ /opt/soft/spark/sbin/start-all.sh
Check that the expected processes are present on every machine.
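One quick way to check, assuming passwordless SSH and that jps is available on every node:
$ jps | grep Master                  # on the master: one Master process
$ for i in $(seq -w 2 18); do
    echo "== a$i =="
    ssh a$i "jps | grep -c Worker"   # expect SPARK_WORKER_INSTANCES (6) per slave
  done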
5.1 Starting a worker manually
When starting with start-all.sh, an individual worker occasionally fails to start, or a worker may go down in production.
In that case you usually do not want to restart the whole cluster, just restart that one worker.
a. Find the failed worker
In the web UI, find the machine whose worker count does not match the expected number, then check the worker logs on that machine to see which worker failed, and kill that process.
b. Restart the worker
Use the command:
$SPARK_HOME/sbin/spark-daemon.sh [--config <conf-dir>] (start|stop|status) <spark-command> <spark-instance-number> <args...>
First argument: --config $SPARK_HOME/conf
Second argument: start
Third argument: org.apache.spark.deploy.worker.Worker (the fully qualified Worker class)
Fourth argument: the instance number of this worker, chosen according to the workers already running on the machine
Remaining arguments: startup options. The snippet below is taken from the argument parser WorkerArguments.scala; the options are self-explanatory, so pass whichever ones you need.
case ("--ip" | "-i") :: value :: tail => Utils.checkHost(value, "ip no longer supported, please use hostname " + value) host = value parse(tail) case ("--host" | "-h") :: value :: tail => Utils.checkHost(value, "Please use hostname " + value) host = value parse(tail) case ("--port" | "-p") :: IntParam(value) :: tail => port = value parse(tail) case ("--cores" | "-c") :: IntParam(value) :: tail => cores = value parse(tail) case ("--memory" | "-m") :: MemoryParam(value) :: tail => memory = value parse(tail) case ("--work-dir" | "-d") :: value :: tail => workDir = value parse(tail) case "--webui-port" :: IntParam(value) :: tail => webUiPort = value parse(tail) case ("--properties-file") :: value :: tail => propertiesFile = value parse(tail)
An example:
sbin/spark-daemon.sh --config conf/ start org.apache.spark.deploy.worker.Worker 2 --webui-port 8082 -c 4 -m 9G spark://a01:7077
Note: the master URL at the end is mandatory.
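The same script can also stop a worker instance or query its status, which is often cleaner than killing the process by hand. A sketch for instance number 2 (the number used in the example above):
$ $SPARK_HOME/sbin/spark-daemon.sh status org.apache.spark.deploy.worker.Worker 2
$ $SPARK_HOME/sbin/spark-daemon.sh stop org.apache.spark.deploy.worker.Worker 2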
6 Installing jobserver
jobserver depends on sbt, so sbt must be installed first:
rpm -ivh https://dl.bintray.com/sbt/rpm/sbt-0.13.7.rpm
Install git, pull the code from GitHub, and start it:
$ yum install git
# clone the project
$ git clone https://github.com/ooyala/spark-jobserver.git
# from the project root, enter sbt
$ sbt
......
[info] Loading project definition from /home/pingjie/wordspace/spark-jobserver/project
> # start jobserver locally (developer mode)
> re-start --- -Xmx4g
......
# this downloads spark-core, jetty, liftweb and other related modules
job-server-extras Starting spark.jobserver.JobServer.main()
[success] Total time: 111 s, completed 2015-9-22 9:59:21
Then open http://localhost:8090 to see the web UI.
Installation is complete.
6.2 API
JARS
GET /jars - list all uploaded jars and their last upload time
POST /jars/<appName> - upload a new jar under the name <appName>
Contexts
GET /contexts - list all current contexts
POST /contexts/<name> - create a new context
DELETE /contexts/<name> - delete a context and stop all jobs running in it
Jobs
GET /jobs - list all jobs
POST /jobs - submit a new job
GET /jobs/<jobId> - query the result and status of a job
GET /jobs/<jobId>/config - query the configuration of a job
DELETE /jobs/<jobId> - delete the specified job
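Section 6.3 below exercises most of these endpoints; the remaining ones follow the same curl pattern, roughly like this sketch (<jobId> is a placeholder):
$ curl 'localhost:8090/jobs/<jobId>'                       # result and status of one job
$ curl 'localhost:8090/jobs/<jobId>/config'                # configuration the job was submitted with
$ curl -X DELETE 'localhost:8090/jobs/<jobId>'             # delete that job
$ curl -X DELETE 'localhost:8090/contexts/test-contexts'   # delete a context and stop its jobs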
6.3 Getting familiar with the jobserver commands
Use job-server-tests as the test case. First compile and package it; the commands feel a lot like Maven:
pingjie@pingjie-youku:~/wordspace/spark-jobserver$ sbt job-server-tests/package
[info] Loading project definition from /home/pingjie/wordspace/spark-jobserver/project
Missing bintray credentials /home/pingjie/.bintray/.credentials. Some bintray features depend on this.
Missing bintray credentials /home/pingjie/.bintray/.credentials. Some bintray features depend on this.
Missing bintray credentials /home/pingjie/.bintray/.credentials. Some bintray features depend on this.
Missing bintray credentials /home/pingjie/.bintray/.credentials. Some bintray features depend on this.
[info] Set current project to root (in build file:/home/pingjie/wordspace/spark-jobserver/)
[info] scalastyle using config /home/pingjie/wordspace/spark-jobserver/scalastyle-config.xml
[info] Processed 5 file(s)
[info] Found 0 errors
[info] Found 0 warnings
[info] Found 0 infos
[info] Finished in 4 ms
[success] created output: /home/pingjie/wordspace/spark-jobserver/job-server-tests/target
[warn] Credentials file /home/pingjie/.bintray/.credentials does not exist
[info] Updating {file:/home/pingjie/wordspace/spark-jobserver/}job-server-tests...
[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[info] Done updating.
[info] scalastyle using config /home/pingjie/wordspace/spark-jobserver/scalastyle-config.xml
[info] Processed 3 file(s)
[info] Found 0 errors
[info] Found 0 warnings
[info] Found 0 infos
[info] Finished in 0 ms
[success] created output: /home/pingjie/wordspace/spark-jobserver/job-server-api/target
[info] Compiling 5 Scala sources to /home/pingjie/wordspace/spark-jobserver/job-server-tests/target/scala-2.10/classes...
[warn] Multiple main classes detected. Run 'show discoveredMainClasses' to see the list
[info] Packaging /home/pingjie/wordspace/spark-jobserver/job-server-tests/target/scala-2.10/job-server-tests_2.10-0.5.3-SNAPSHOT.jar ...
[info] Done packaging.
[success] Total time: 41 s, completed 2015-9-22 10:06:19
The build succeeds and the jar has been generated under the target directory.
# upload a new jar
pingjie@pingjie-youku:~/wordspace/spark-jobserver$ curl --data-binary @job-server-tests/target/scala-2.10/job-server-tests_2.10-0.5.3-SNAPSHOT.jar localhost:8090/jars/test
OK
# list all currently uploaded jars
pingjie@pingjie-youku:~/wordspace/spark-jobserver$ curl localhost:8090/jars
{
  "test": "2015-09-22T10:10:29.815+08:00"
}
# submit a new job without specifying a context; a context is created automatically
pingjie@pingjie-youku:~/wordspace/spark-jobserver$ curl -d "input.string= hello job server " 'localhost:8090/jobs?appName=test&classPath=spark.jobserver.WordCountExample'
{
  "status": "STARTED",
  "result": {
    "jobId": "64196fca-80da-4c74-9b6f-27c5954ee25c",
    "context": "bf196647-spark.jobserver.WordCountExample"
  }
}
# submit a job without specifying a context; again a context is created automatically
pingjie@pingjie-youku:~/wordspace/spark-jobserver$ curl -X POST -d "input.string= hello job server " 'localhost:8090/jobs?appName=test&classPath=spark.jobserver.WordCountExample'
{
"status": "STARTED",
"result": {
"jobId": "d09ec0c4-91db-456d-baef-633b5c0ff504",
"context": "7500533c-spark.jobserver.WordCountExample"
}
}
# list all jobs; the one created above is already there
pingjie@pingjie-youku:~/wordspace/spark-jobserver$ curl 'localhost:8090/jobs'
[{
  "duration": "0.715 secs",
  "classPath": "spark.jobserver.WordCountExample",
  "startTime": "2015-09-22T10:19:34.591+08:00",
  "context": "bf196647-spark.jobserver.WordCountExample",
  "status": "FINISHED",
  "jobId": "64196fca-80da-4c74-9b6f-27c5954ee25c"
}]
# list all contexts; it is empty right now
pingjie@pingjie-youku:~/wordspace/spark-jobserver$ curl 'localhost:8090/contexts'
[]
# create a new context, specifying the number of CPU cores and the memory per worker
pingjie@pingjie-youku:~/wordspace/spark-jobserver$ curl -d "" 'localhost:8090/contexts/test-contexts?num-cpu-cores=1&mem-per-node=512m'
OK
# list the contexts again; the one just created is there
pingjie@pingjie-youku:~/wordspace/spark-jobserver$ curl 'localhost:8090/contexts'
["test-contexts"]
# submit a job, specifying the context
pingjie@pingjie-youku:~/wordspace/spark-jobserver$ curl -X POST -d "input.string= hello job server " 'localhost:8090/jobs?appName=test&classPath=spark.jobserver.WordCountExample&context=test-contexts&sync=true'
{
"status": "OK",
"result": {
"job": 1,
"hello": 1,
"server": 1
}
}
The order for submitting a job on jobserver should be:
1. Upload the jar
2. Create a context
3. Submit the job
You can also skip creating a context and submit the job directly as shown above; in that case a default context is created for the job and it takes all of the resources the jobserver has left. The three steps are consolidated in the sketch below.
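A minimal end-to-end sketch using the names from this section (the test jar, the test-contexts context and WordCountExample):
# 1. upload the jar
$ curl --data-binary @job-server-tests/target/scala-2.10/job-server-tests_2.10-0.5.3-SNAPSHOT.jar localhost:8090/jars/test
# 2. create a context with explicit resources
$ curl -d "" 'localhost:8090/contexts/test-contexts?num-cpu-cores=1&mem-per-node=512m'
# 3. submit the job to that context and wait synchronously for the result
$ curl -X POST -d "input.string= hello job server " 'localhost:8090/jobs?appName=test&classPath=spark.jobserver.WordCountExample&context=test-contexts&sync=true'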
6.4 Configuration file
Open the configuration file: master is set to local[4], which can be changed to our cluster address.
$ vim spark-jobserver/config/local.conf.template
master = "local[4]"
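For example, to point it at the standalone master configured earlier (a sketch; copy the template to a file such as local.conf rather than editing the template in place):
master = "spark://a01:7077"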
In addition, the storage method and path for data objects:
jobdao = spark.jobserver.io.JobFileDAO

filedao {
  rootdir = /tmp/spark-job-server/filedao/data
}
Default context settings. These can be overridden by the parameters passed when creating a context through the REST interface started from sbt above.
# universal context configuration. These settings can be overridden, see README.md
context-settings {
  num-cpu-cores = 2          # total number of CPU cores to use. Required.
  memory-per-node = 512m     # memory per executor node, -Xmx style, e.g. 512m, 1G, etc.

  # in case spark distribution should be accessed from HDFS (as opposed to being installed on every mesos slave)
  # spark.executor.uri = "hdfs://namenode:8020/apps/spark/spark.tgz"

  # uris of jars to be loaded into the classpath for this context
  # dependent-jar-uris = ["file:///some/path/present/in/each/mesos/slave/somepackage.jar"]
}
That covers the basic usage; the deployment of jobserver and using it from a project are described next.
6.5 Deployment
Copy config/local.sh.template to local.sh and set the relevant parameters. jobserver can be deployed to multiple hosts, and the installation directory, Spark home, Spark conf directory and other parameters are specified here.
# Environment and deploy file
# For use with bin/server_deploy, bin/server_package etc.
DEPLOY_HOSTS="a01"
APP_USER=hadoop
APP_GROUP=hadoop
# optional SSH Key to login to deploy server
#SSH_KEY=/path/to/keyfile.pem
INSTALL_DIR=/opt/soft/job-server
LOG_DIR=/opt/soft/job-server/logs
PIDFILE=spark-jobserver.pid
SPARK_HOME=/opt/soft/spark
SPARK_CONF_DIR=$SPARK_HOME/conf
# Only needed for Mesos deploys
#SPARK_EXECUTOR_URI=/usr/spark/spark-1.4.0-bin-hadoop2.4.tgz
# Only needed for YARN running outside of the cluster
# You will need to COPY these files from your cluster to the remote machine
# Normally these are kept on the cluster in /etc/hadoop/conf
# YARN_CONF_DIR=/pathToRemoteConf/conf
SCALA_VERSION=2.11.7
Deploying jobserver involves a long wait. To make the process smoother, set up passwordless SSH (mutual trust) to the deploy hosts first.
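A sketch of the deploy step, assuming bin/server_deploy.sh takes the environment name matching config/local.sh (check the script in your checkout for the exact arguments):
# passwordless SSH to the deploy host, as the APP_USER above
$ ssh-keygen -t rsa
$ ssh-copy-id hadoop@a01
# deploy the environment described by config/local.sh
$ bin/server_deploy.sh local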
6.6 Startup
./server_start.sh
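server_start.sh lives in the installation directory (/opt/soft/job-server in the config above). A quick sanity check once it is running, assuming the default port 8090:
$ curl localhost:8090/contexts   # should return a JSON list, [] on a fresh install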