  • Spark Standalone Mode

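    Spark is tied to Hadoop, so before installing Spark, choose the Spark build that matches the Hadoop version you already have installed. Otherwise you will hit an error like "Server IPC version X cannot communicate with client version Y".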

    I have Hadoop 2.4.0 installed, so I chose spark-1.2.0-bin-hadoop2.4.tgz. Note that Spark and Scala also have version compatibility issues; see the problems recorded in another post on my blog.
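    The matching rule can be sketched as a small shell function. `spark_package_name` is a hypothetical helper of mine, not part of Spark or Hadoop; it only builds a file name following the pattern of the package above, and it does not check that such a download actually exists for other version combinations.

```shell
#!/bin/sh
# Hypothetical helper: build the name of the prebuilt Spark package for a
# given Spark version and installed Hadoop version, following the naming
# pattern of spark-1.2.0-bin-hadoop2.4.tgz.
spark_package_name() {
    spark_version="$1"      # e.g. 1.2.0
    hadoop_version="$2"     # e.g. 2.4.0
    # Keep only major.minor of the Hadoop version (2.4.0 -> 2.4).
    hadoop_major_minor=$(printf '%s\n' "$hadoop_version" | cut -d. -f1,2)
    printf 'spark-%s-bin-hadoop%s.tgz\n' "$spark_version" "$hadoop_major_minor"
}

spark_package_name 1.2.0 2.4.0
```

    The installed Hadoop version itself can be read from the first line printed by `hadoop version`.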

    Official documentation: http://spark.apache.org/docs/latest/spark-standalone.html#cluster-launch-scripts

    Spark depends on Scala, so Scala has to be installed first. I downloaded scala-2.11.5.tgz. Configure the Scala environment variables:

    export SCALA_HOME=/opt/scala/scala-2.11.5
    export PATH=$PATH:$SCALA_HOME/bin

    After the change, reload the environment so the variables take effect, then check the Scala version.
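    A minimal sketch for confirming that the new PATH entry is visible in the current shell. `path_contains` is a hypothetical helper, and the default value of SCALA_HOME is just the path used in this post; it assumes the exports were added to a profile file that has already been reloaded (for example with `source`).

```shell
#!/bin/sh
# Hypothetical helper: succeed if directory $1 is an entry of the
# colon-separated list $2 (typically "$PATH").
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Fall back to the path used in this post if SCALA_HOME is not set.
SCALA_HOME="${SCALA_HOME:-/opt/scala/scala-2.11.5}"
if path_contains "$SCALA_HOME/bin" "$PATH"; then
    echo "PATH contains $SCALA_HOME/bin"
else
    echo "PATH is missing $SCALA_HOME/bin; reload the shell profile"
fi
```

    Once the entry is on PATH, `scala -version` prints the installed Scala version.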

    Then configure the Spark environment variables:

    export SPARK_HOME=/opt/spark
    export PATH=$PATH:$SPARK_HOME/bin

    After configuring, reload the environment again so the changes take effect.

    In the ${SPARK_HOME}/conf directory, do the following:

    cp spark-env.sh.template spark-env.sh

    Edit spark-env.sh and append the following at the end of the file (adjust the paths to your own setup):

    export SCALA_HOME=/opt/scala/scala-2.11.5
    export SPARK_MASTER_IP=127.0.0.1
    export SPARK_WORKER_MEMORY=2G
    export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_75
    export HADOOP_HOME=/usr/local/hadoop
    export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop

    Configuration is complete.
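    Since the paths in spark-env.sh depend on the local installation, it can help to confirm that they exist before starting anything. The following is a sketch only: `missing_dirs` is a hypothetical helper, and the directories passed to it are simply the ones from my spark-env.sh above.

```shell
#!/bin/sh
# Hypothetical helper: print every argument that is not an existing
# directory and return non-zero if at least one is missing.
missing_dirs() {
    rc=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            printf 'missing: %s\n' "$dir"
            rc=1
        fi
    done
    return "$rc"
}

# Paths taken from the spark-env.sh in this post; adjust to your installation.
missing_dirs /opt/scala/scala-2.11.5 /usr/lib/jvm/jdk1.7.0_75 \
    /usr/local/hadoop /usr/local/hadoop/etc/hadoop \
    || echo "fix the paths listed above before starting Spark"
```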

    Before starting Spark, start Hadoop first:

    hadoop@tinylcy:/usr/local/hadoop$ sbin/start-all.sh

    Then start Spark:

    hadoop@tinylcy:/opt/spark$ sbin/start-all.sh 
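
    To check that the standalone daemons came up, the JDK's `jps` tool lists running Java processes as "pid name" pairs; in standalone mode the Spark processes show up as Master and Worker. A sketch, where `has_process` is a hypothetical helper that reads a jps-style listing from standard input:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the jps-style listing on stdin
# ("<pid> <name>" per line) contains a process named $1.
has_process() {
    awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Only run the check when jps is available on this machine.
if command -v jps >/dev/null 2>&1; then
    listing=$(jps || true)
    for name in Master Worker; do
        if printf '%s\n' "$listing" | has_process "$name"; then
            echo "$name is running"
        else
            echo "$name is NOT running"
        fi
    done
fi
```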

    From the Spark installation directory, start the interactive shell:

    hadoop@tinylcy:/opt/spark$ bin/spark-shell

    Test with a word count over a file in HDFS:

    scala> val textFile=sc.textFile("hdfs://localhost:9000/user/hadoop/input/words.txt")

    scala> val count=textFile.flatMap(line=>line.split(" ")).map(word=>(word,1)).reduceByKey(_+_)

    scala> count.collect()
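
    For a small input file, the result of the word count can be cross-checked outside Spark with standard shell tools. This is only a rough equivalent that I am adding as a sanity check: `word_count` is a hypothetical helper, and it assumes words are separated by single spaces, as in the `split(" ")` call above.

```shell
#!/bin/sh
# Hypothetical helper: count space-separated words read from stdin and
# print "word<TAB>count" lines, sorted by word.
word_count() {
    tr ' ' '\n' | sort | uniq -c | awk '{ print $2 "\t" $1 }'
}

# Tiny inline sample instead of the real HDFS file.
printf 'a b a\nc b a\n' | word_count
```

    Against the real file, the input could be piped in with `hdfs dfs -cat /user/hadoop/input/words.txt | word_count`.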

    Another example:

    scala> val data=Array(1,2,3,4,5)  // create the data
    data: Array[Int] = Array(1, 2, 3, 4, 5)
    
    scala> val distData=sc.parallelize(data)  // turn data into an RDD
    distData: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:14
    
    scala> distData.reduce(_+_)  // run a computation on the RDD: sum the elements of data
    15/07/19 14:37:56 INFO spark.SparkContext: Starting job: reduce at <console>:17
    15/07/19 14:37:56 INFO scheduler.DAGScheduler: Got job 0 (reduce at <console>:17) with 4 output partitions (allowLocal=false)
    15/07/19 14:37:56 INFO scheduler.DAGScheduler: Final stage: Stage 0(reduce at <console>:17)
    15/07/19 14:37:56 INFO scheduler.DAGScheduler: Parents of final stage: List()
    15/07/19 14:37:56 INFO scheduler.DAGScheduler: Missing parents: List()
    15/07/19 14:37:56 INFO scheduler.DAGScheduler: Submitting Stage 0 (ParallelCollectionRDD[0] at parallelize at <console>:14), which has no missing parents
    15/07/19 14:37:56 WARN util.SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes
    15/07/19 14:37:56 INFO storage.MemoryStore: ensureFreeSpace(1184) called with curMem=0, maxMem=278019440
    15/07/19 14:37:56 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1184.0 B, free 265.1 MB)
    15/07/19 14:37:56 INFO storage.MemoryStore: ensureFreeSpace(912) called with curMem=1184, maxMem=278019440
    15/07/19 14:37:56 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 912.0 B, free 265.1 MB)
    15/07/19 14:37:56 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:47649 (size: 912.0 B, free: 265.1 MB)
    15/07/19 14:37:56 INFO storage.BlockManagerMaster: Updated info of block broadcast_0_piece0
    15/07/19 14:37:56 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:838
    15/07/19 14:37:56 INFO scheduler.DAGScheduler: Submitting 4 missing tasks from Stage 0 (ParallelCollectionRDD[0] at parallelize at <console>:14)
    15/07/19 14:37:56 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 4 tasks
    15/07/19 14:37:56 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1204 bytes)
    15/07/19 14:37:56 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, PROCESS_LOCAL, 1204 bytes)
    15/07/19 14:37:56 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, localhost, PROCESS_LOCAL, 1204 bytes)
    15/07/19 14:37:56 INFO scheduler.TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, localhost, PROCESS_LOCAL, 1208 bytes)
    15/07/19 14:37:56 INFO executor.Executor: Running task 3.0 in stage 0.0 (TID 3)
    15/07/19 14:37:56 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1)
    15/07/19 14:37:56 INFO executor.Executor: Running task 2.0 in stage 0.0 (TID 2)
    15/07/19 14:37:56 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
    15/07/19 14:37:56 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 727 bytes result sent to driver
    15/07/19 14:37:56 INFO executor.Executor: Finished task 1.0 in stage 0.0 (TID 1). 727 bytes result sent to driver
    15/07/19 14:37:56 INFO executor.Executor: Finished task 3.0 in stage 0.0 (TID 3). 727 bytes result sent to driver
    15/07/19 14:37:56 INFO executor.Executor: Finished task 2.0 in stage 0.0 (TID 2). 727 bytes result sent to driver
    15/07/19 14:37:56 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 60 ms on localhost (1/4)
    15/07/19 14:37:56 INFO scheduler.TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 56 ms on localhost (2/4)
    15/07/19 14:37:56 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 57 ms on localhost (3/4)
    15/07/19 14:37:56 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 58 ms on localhost (4/4)
    15/07/19 14:37:56 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
    15/07/19 14:37:56 INFO scheduler.DAGScheduler: Stage 0 (reduce at <console>:17) finished in 0.077 s
    15/07/19 14:37:56 INFO scheduler.DAGScheduler: Job 0 finished: reduce at <console>:17, took 0.342516 s
    res0: Int = 15   // the result of the computation
    
    scala> 
  • Original article: https://www.cnblogs.com/Murcielago/p/4657501.html