  • Spark installation

    1. Download Spark from http://spark.apache.org/downloads.html

    2. Extract the archive

    tar -zxvf spark-2.4.4-bin-hadoop2.7.tgz -C /opt/module/

    3. Run a first program in local mode

    bin/spark-submit --class org.apache.spark.examples.SparkPi --executor-memory 1G --total-executor-cores 2 ./examples/jars/spark-examples_2.11-2.4.4.jar 200
    ... ...
    19/09/05 11:13:27 INFO Executor: Running task 198.0 in stage 0.0 (TID 198)
    19/09/05 11:13:27 INFO Executor: Finished task 198.0 in stage 0.0 (TID 198). 824 bytes result sent to driver
    19/09/05 11:13:27 INFO TaskSetManager: Starting task 199.0 in stage 0.0 (TID 199, localhost, executor driver, partition 199, PROCESS_LOCAL, 7866 bytes)
    19/09/05 11:13:27 INFO TaskSetManager: Finished task 198.0 in stage 0.0 (TID 198) in 6 ms on localhost (executor driver) (199/200)
    19/09/05 11:13:27 INFO Executor: Running task 199.0 in stage 0.0 (TID 199)
    19/09/05 11:13:27 INFO Executor: Finished task 199.0 in stage 0.0 (TID 199). 781 bytes result sent to driver
    19/09/05 11:13:27 INFO TaskSetManager: Finished task 199.0 in stage 0.0 (TID 199) in 9 ms on localhost (executor driver) (200/200)
    19/09/05 11:13:27 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
    19/09/05 11:13:27 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 3.129 s
    19/09/05 11:13:27 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 3.262553 s
    Pi is roughly 3.1416157570807877
    19/09/05 11:13:27 INFO SparkUI: Stopped Spark web UI at http://vmhome10.com:4040
    19/09/05 11:13:27 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
    19/09/05 11:13:27 INFO MemoryStore: MemoryStore cleared
    19/09/05 11:13:27 INFO BlockManager: BlockManager stopped
    19/09/05 11:13:27 INFO BlockManagerMaster: BlockManagerMaster stopped
    19/09/05 11:13:27 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
    19/09/05 11:13:27 INFO SparkContext: Successfully stopped SparkContext
    19/09/05 11:13:27 INFO ShutdownHookManager: Shutdown hook called
    19/09/05 11:13:27 INFO ShutdownHookManager: Deleting directory /tmp/spark-7a49f112-3630-4ef6-b4dc-1c46af32c133
    19/09/05 11:13:27 INFO ShutdownHookManager: Deleting directory /tmp/spark-6ee58588-7298-4623-b10b-6310e628060d

    The general form of spark-submit:

    ./bin/spark-submit \
      --class <main-class> \
      --master <master-url> \
      --deploy-mode <deploy-mode> \
      --conf <key>=<value> \
      ... # other options
      <application-jar> \
      [application-arguments]
    Option descriptions:
    --master spark://vmhome10.com:7077: the master URL to connect to
    --class: the entry point of your application (e.g. org.apache.spark.examples.SparkPi)
    --deploy-mode: whether to deploy your driver on the worker nodes (cluster) or locally as an external client (client) (default: client)
    --conf: an arbitrary Spark configuration property in key=value format; if the value contains spaces, wrap it in quotes ("key=value")
    application-jar: path to a bundled jar containing your application and all its dependencies; the URL must be globally visible inside the cluster, e.g. an hdfs:// path on shared storage, or a file:// path that exists at the same location on every node
    application-arguments: arguments passed to the main() method of your main class
    --executor-memory 1G: sets the available memory per executor to 1 GB
    --total-executor-cores 2: sets the total number of CPU cores used across all executors to 2
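
    For reference, --class names the application's main entry point. The sketch below shows what such an entry point could look like in Scala; the object name WordCount and the comma-separated input format are hypothetical, and it assumes the spark-core 2.4.4 dependency for Scala 2.11:

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical entry point; submit with something like:
    //   bin/spark-submit --class WordCount --master <master-url> <app-jar> <input-path>
    object WordCount {
      def main(args: Array[String]): Unit = {
        // The master is normally supplied via --master on spark-submit,
        // so the application does not hard-code it.
        val conf = new SparkConf().setAppName("WordCount")
        val sc = new SparkContext(conf)

        val counts = sc.textFile(args(0))   // read the input file
          .flatMap(_.split(","))            // split each line on commas
          .map((_, 1))                      // pair each word with a count of 1
          .reduceByKey(_ + _)               // sum the counts per word

        counts.collect().foreach(println)
        sc.stop()
      }
    }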

    4. Enter the interactive shell

    bin/spark-shell
    19/09/05 11:42:00 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    Spark context Web UI available at http://vmhome10.com:4040
    Spark context available as 'sc' (master = local[*], app id = local-1567654930914).
    Spark session available as 'spark'.
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 2.4.4
          /_/
             
    Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_201)
    Type in expressions to have them evaluated.
    Type :help for more information.

     If you start the spark shell without specifying a master address, it still starts and runs programs normally: it is actually running in Spark's local mode, which launches a single process on the local machine and does not connect to any cluster.

    Starting the shell with parameters:

    bin/spark-shell \
    --master spark://vmhome10.com:7077 \
    --executor-memory 1g \
    --total-executor-cores 2

    In the Spark shell, a SparkContext has already been initialized as the object sc, so user code that needs it can use sc directly. A SparkSession is likewise available as the object spark; it is the entry point for Spark SQL.
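
    For example (a quick sketch, assuming a fresh shell session in local mode; the res numbers will vary):

    scala> sc.master
    res0: String = local[*]

    scala> spark.range(5).count
    res1: Long = 5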

    Run a word count in the shell:

    scala> sc.textFile("/home/hadoop/1.txt").flatMap(_.split(",")).map((_,1)).reduceByKey(_+_).collect
    res2: Array[(String, Int)] = Array((192.168.1.1,2), (mytest,1), (wow,5), (1990,1), (xu.dm,4), (192.168.1.3,1), (dnf,4), (sword,2), (192.168.1.2,2), (hdfs,2), (blade,2), (2000,3))
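
    The same count can also be expressed through the spark session with the Dataset API (a sketch; it assumes the same /home/hadoop/1.txt input and relies on the implicits that spark-shell imports automatically). Here "value" is the default column name of a Dataset[String]:

    scala> spark.read.textFile("/home/hadoop/1.txt").flatMap(_.split(",")).groupBy("value").count().show()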
     