  • Installing and configuring a Spark 1.1.0 cluster (the latest release)

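    Compared with distributed file systems and NoSQL databases, installing and configuring a Spark cluster is fairly simple.

    Many tutorials say that Java and Scala must be installed first, but I found that the latest Spark release already bundles Scala, and the JRE that ships with the Linux distribution works fine.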

    1. Install Spark 1.1.0 on the master node (bluejoe0), working in /usr/local (the path used in the later steps):
      wget http://mirror.bit.edu.cn/apache/spark/spark-1.1.0/spark-1.1.0-bin-hadoop2.3.tgz
      tar -zxvf spark-1.1.0-bin-hadoop2.3.tgz
      ln -s spark-1.1.0-bin-hadoop2.3 spark
    2. Start spark-shell:
      cd /usr/local/spark/bin
      ./spark-shell
      The startup output shows that Spark already ships with Scala 2.10.
    3. Enter a test program:
      scala> val data = Array(1, 2, 3, 4, 5)
      data: Array[Int] = Array(1, 2, 3, 4, 5)


      scala> val distData = sc.parallelize(data)
      distData: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:14


      scala> distData.reduce(_+_)
    4. While the shell is running, the application's web UI can be viewed on port 4040.

    5. You can also test the Pi calculation:
      ./bin/run-example SparkPi
      14/11/23 16:08:25 INFO SparkContext: Job finished: reduce at SparkPi.scala:35, took 1.008332384 s
      Pi is roughly 3.1403
    6. You can also submit the job with spark-submit:
      ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master local[6] /usr/local/spark/lib/spark-examples-1.1.0-hadoop2.3.0.jar 1000
      14/11/23 16:07:30 INFO SparkContext: Job finished: reduce at SparkPi.scala:35, took 46.220537186 s
      Pi is roughly 3.14172056
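    7. Now install a few worker nodes: scp the Spark .tgz file to the other nodes (for example bluejoe4, bluejoe5 and bluejoe9) and unpack it there as in step 1.
    8. Make sure passwordless SSH login from the master to each worker is set up.
    9. Edit conf/slaves: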
      # A Spark Worker will be started on each of the machines listed below.
      bluejoe4
      bluejoe5
      bluejoe9
    10. Start the Spark cluster on bluejoe0:
      ./sbin/start-all.sh
      The three worker nodes should now be visible in the master's web UI in a browser.
    11. Run the Pi program again, this time on the cluster:
      ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://bluejoe0:7077 /usr/local/spark/lib/spark-examples-1.1.0-hadoop2.3.0.jar 1000
      14/11/23 16:05:00 INFO SparkContext: Job finished: reduce at SparkPi.scala:35, took 26.322514766 s
      Pi is roughly 3.14159516

      The browser view now reflects this run as well.
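
Steps 7 to 9 are only described in words, so here is a rough shell sketch of what they might look like when scripted from bluejoe0. Several things are assumptions rather than part of the original walkthrough: the /usr/local install prefix on the workers, an SSH key pair already generated on the master (for example with ssh-keygen), the same user account on every node, and the helper names run, slaves_file and prepare_workers, which are made up for illustration. By default the script only prints the commands (DRY_RUN=1) so they can be reviewed first.

```shell
#!/bin/sh
# Sketch of steps 7-9: prepare the workers and generate the conf/slaves content.
# Assumptions: tarball in the current directory, /usr/local as the install
# prefix on every node, an existing SSH key pair on the master.

WORKERS="bluejoe4 bluejoe5 bluejoe9"
TARBALL="spark-1.1.0-bin-hadoop2.3.tgz"
PREFIX="/usr/local"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print a command, and run it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Emit the contents of conf/slaves: one worker hostname per line.
slaves_file() {
    for host in $WORKERS; do
        echo "$host"
    done
}

# Step 8: install the master's public key on each worker (passwordless ssh).
# Step 7: copy the tarball over, unpack it, and create the same symlink.
prepare_workers() {
    for host in $WORKERS; do
        run ssh-copy-id "$host"
        run scp "$TARBALL" "$host:$PREFIX/"
        run ssh "$host" "cd $PREFIX && tar -zxf $TARBALL && ln -sfn spark-1.1.0-bin-hadoop2.3 spark"
    done
}

prepare_workers
slaves_file
```

Once the printed commands look right, rerunning with DRY_RUN=0 executes them, and the output of slaves_file can be redirected into conf/slaves on the master.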

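To compare the local run in step 6 with the cluster run in step 11, it can help to pull the job duration and the Pi estimate out of the console output. The following is a minimal sketch; the function names job_seconds and pi_estimate are hypothetical, and the patterns are written against the Spark 1.1.0 log lines quoted above, so other versions may need different patterns.

```shell
#!/bin/sh
# Extract the job duration and the Pi estimate from SparkPi console output.
# The patterns match the log lines quoted in this post (Spark 1.1.0).

# Print the seconds from a "Job finished: ... took N s" line.
job_seconds() {
    sed -n 's/.*Job finished: .* took \([0-9.][0-9.]*\) s.*/\1/p'
}

# Print the value from a "Pi is roughly N" line.
pi_estimate() {
    sed -n 's/.*Pi is roughly \([0-9.][0-9.]*\).*/\1/p'
}

# Sample input: the cluster run from step 11.
sample='14/11/23 16:05:00 INFO SparkContext: Job finished: reduce at SparkPi.scala:35, took 26.322514766 s
Pi is roughly 3.14159516'

echo "$sample" | job_seconds
echo "$sample" | pi_estimate
```

In practice the input would come from a saved log of a spark-submit run rather than a hard-coded sample. When capturing that log, redirect both stdout and stderr (2>&1), since the INFO lines and the result line may arrive on different streams.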

  • Original article: https://www.cnblogs.com/bluejoe/p/5115916.html