  • Spark installation

    1. Download from http://spark.apache.org/downloads.html

    2. Extract the archive

    tar -zxvf spark-2.4.4-bin-hadoop2.7.tgz -C /opt/module/

    3. Run the first example program in local mode

    bin/spark-submit --class org.apache.spark.examples.SparkPi --executor-memory 1G --total-executor-cores 2 ./examples/jars/spark-examples_2.11-2.4.4.jar 200
    ... ...
    19/09/05 11:13:27 INFO Executor: Running task 198.0 in stage 0.0 (TID 198)
    19/09/05 11:13:27 INFO Executor: Finished task 198.0 in stage 0.0 (TID 198). 824 bytes result sent to driver
    19/09/05 11:13:27 INFO TaskSetManager: Starting task 199.0 in stage 0.0 (TID 199, localhost, executor driver, partition 199, PROCESS_LOCAL, 7866 bytes)
    19/09/05 11:13:27 INFO TaskSetManager: Finished task 198.0 in stage 0.0 (TID 198) in 6 ms on localhost (executor driver) (199/200)
    19/09/05 11:13:27 INFO Executor: Running task 199.0 in stage 0.0 (TID 199)
    19/09/05 11:13:27 INFO Executor: Finished task 199.0 in stage 0.0 (TID 199). 781 bytes result sent to driver
    19/09/05 11:13:27 INFO TaskSetManager: Finished task 199.0 in stage 0.0 (TID 199) in 9 ms on localhost (executor driver) (200/200)
    19/09/05 11:13:27 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
    19/09/05 11:13:27 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 3.129 s
    19/09/05 11:13:27 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 3.262553 s
    Pi is roughly 3.1416157570807877
    19/09/05 11:13:27 INFO SparkUI: Stopped Spark web UI at http://vmhome10.com:4040
    19/09/05 11:13:27 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
    19/09/05 11:13:27 INFO MemoryStore: MemoryStore cleared
    19/09/05 11:13:27 INFO BlockManager: BlockManager stopped
    19/09/05 11:13:27 INFO BlockManagerMaster: BlockManagerMaster stopped
    19/09/05 11:13:27 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
    19/09/05 11:13:27 INFO SparkContext: Successfully stopped SparkContext
    19/09/05 11:13:27 INFO ShutdownHookManager: Shutdown hook called
    19/09/05 11:13:27 INFO ShutdownHookManager: Deleting directory /tmp/spark-7a49f112-3630-4ef6-b4dc-1c46af32c133
    19/09/05 11:13:27 INFO ShutdownHookManager: Deleting directory /tmp/spark-6ee58588-7298-4623-b10b-6310e628060d
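
    The SparkPi example estimates π by Monte Carlo sampling: it scatters random points in the unit square and counts how many land inside the unit circle; the 200 passed on the command line is the number of slices (partitions). A minimal sketch of that computation, runnable in spark-shell (variable names here are illustrative, not the example's actual source):

    val slices = 200
    val n = 100000 * slices                          // total number of random sample points
    val count = sc.parallelize(1 to n, slices).map { _ =>
      val x = math.random * 2 - 1                    // random point in [-1, 1) x [-1, 1)
      val y = math.random * 2 - 1
      if (x * x + y * y <= 1) 1 else 0               // inside the unit circle?
    }.reduce(_ + _)
    println(s"Pi is roughly ${4.0 * count / n}")     // circle/square area ratio = π/4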

    General spark-submit syntax:

    ./bin/spark-submit \
      --class <main-class> \
      --master <master-url> \
      --deploy-mode <deploy-mode> \
      --conf <key>=<value> \
      ... # other options
      <application-jar> \
      [application-arguments]
    Parameter descriptions (a concrete example follows this list):
    --master spark://vmhome10.com:7077: the master URL to connect to
    --class: the entry point (main class) of your application (e.g. org.apache.spark.examples.SparkPi)
    --deploy-mode: whether to deploy the driver on a worker node (cluster) or locally as an external client (client) (default: client)
    --conf: an arbitrary Spark configuration property in key=value format; quote "key=value" if the value contains spaces
    application-jar: path to the bundled jar containing your application and all dependencies; the URL must be globally visible inside the cluster, e.g. an hdfs:// path on shared storage, or a file:// path that exists at the same location on every node
    application-arguments: arguments passed to the main() method of your main class
    --executor-memory 1G: sets the available memory of each executor to 1G
    --total-executor-cores 2: sets the total number of CPU cores used across all executors to 2 (standalone mode)
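
    For example, submitting the same SparkPi job to the standalone master above might look like this (a sketch; it assumes a master is actually running at spark://vmhome10.com:7077):

    bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master spark://vmhome10.com:7077 \
    --deploy-mode client \
    --conf spark.executor.memory=1g \
    --total-executor-cores 2 \
    ./examples/jars/spark-examples_2.11-2.4.4.jar 200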

    4. Enter interactive shell mode

    bin/spark-shell
    19/09/05 11:42:00 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    Spark context Web UI available at http://vmhome10.com:4040
    Spark context available as 'sc' (master = local[*], app id = local-1567654930914).
    Spark session available as 'spark'.
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 2.4.4
          /_/
             
    Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_201)
    Type in expressions to have them evaluated.
    Type :help for more information.

     If spark-shell is started without a master address, it still starts and runs programs normally; it is simply running Spark in local mode, which launches a single process on the local machine and never contacts a cluster. You can also pass a local master URL explicitly, as shown below.
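
    bin/spark-shell --master local[4]   # run locally with 4 worker threads
    bin/spark-shell --master local[*]   # one thread per logical core (local[*] is what the log above shows)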

    Starting the shell with parameters:

    bin/spark-shell \
    --master spark://vmhome10.com:7077 \
    --executor-memory 1g \
    --total-executor-cores 2

    The Spark shell initializes a SparkContext as the object sc by default; user code that needs it can use sc directly. A SparkSession, available as spark, is the entry point for Spark SQL.
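
    A small illustration of the two entry points (the names rdd, df, and t are illustrative):

    scala> val rdd = sc.parallelize(Seq(("spark", 1), ("hadoop", 2)))  // RDD API via SparkContext
    scala> val df = spark.createDataFrame(rdd).toDF("word", "cnt")     // DataFrame API via SparkSession
    scala> df.createOrReplaceTempView("t")
    scala> spark.sql("SELECT word, cnt FROM t WHERE cnt > 1").show()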

    Run word count in the shell:

    scala> sc.textFile("/home/hadoop/1.txt").flatMap(_.split(",")).map((_,1)).reduceByKey(_+_).collect
    res2: Array[(String, Int)] = Array((192.168.1.1,2), (mytest,1), (wow,5), (1990,1), (xu.dm,4), (192.168.1.3,1), (dnf,4), (sword,2), (192.168.1.2,2), (hdfs,2), (blade,2), (2000,3))
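
    The same pipeline can be extended before collecting, for example to return only the most frequent words (a sketch over the same input file):

    scala> sc.textFile("/home/hadoop/1.txt").flatMap(_.split(",")).map((_, 1)).reduceByKey(_ + _).sortBy(_._2, ascending = false).take(3)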
     