  • Setting up a local Spark environment and running your first Spark program

    Setting up a local Spark environment

    Setting up the Java environment

    (1) Download the JDK from the official site

    Official link: https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

    (2) Extract it to the target directory

    sudo mkdir -p /usr/lib/jdk
    sudo tar -zxvf jdk-8u91-linux-x64.tar.gz -C /usr/lib/jdk   # adjust for the version you downloaded

    (3) Set the path and environment variables

    sudo vim /etc/profile

    Append the following at the end of the file:

    export JAVA_HOME=/usr/lib/jdk/jdk1.8.0_91   
    export JRE_HOME=${JAVA_HOME}/jre  
    export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib  
    export PATH=${JAVA_HOME}/bin:$PATH

    (4) Apply the configuration

    source /etc/profile

    (5) Verify the installation

    ~$ java -version
    java version "1.8.0_181"
    Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
    Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)

    Installing Scala

    (1) Download the package from the official site

    Official link: https://www.scala-lang.org/download/

    (2) Extract it to the target directory

    sudo mkdir -p /usr/lib/scala
    sudo tar -zxvf scala-2.11.8.tgz -C /usr/lib/scala   # adjust for the version you downloaded

    (3) Set the path and environment variables

    sudo vim /etc/profile

    Append the following at the end of the file:

    export SCALA_HOME=/usr/lib/scala/scala-2.11.8   # adjust for the version you installed
    export PATH=${SCALA_HOME}/bin:$PATH

    (4) Apply the configuration

    source /etc/profile

    (5) Verify the installation

    :~$ scala
    Welcome to Scala 2.12.6 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181).
    Type in expressions for evaluation. Or try :help.
    
    scala> 
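
    A throwaway expression at the prompt confirms that the REPL evaluates code (the expected result is noted in the comment):

    scala> List(1, 2, 3).map(_ * 2)   // res0: List[Int] = List(2, 4, 6)
    scala> :quit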

    Installing Spark

    (1) Download the package from the official site

    Official link: http://spark.apache.org/downloads.html

    (2) Extract it to the target directory

    sudo mkdir -p /usr/lib/spark
    sudo tar -zxvf spark-1.6.1-bin-hadoop2.6.tgz -C /usr/lib/spark   # adjust for the version you downloaded

    (3) Set the path and environment variables

    sudo vim /etc/profile

    Append the following at the end of the file:

    export SPARK_HOME=/usr/lib/spark/spark-1.6.1-bin-hadoop2.6
    export PATH=${SPARK_HOME}/bin:$PATH

    (4) Apply the configuration

    source /etc/profile

    (5) Verify the installation

    :~$ cd spark-1.6.1-bin-hadoop2.6
    :~/spark-1.6.1-bin-hadoop2.6$ ./bin/spark-shell
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    18/09/30 20:59:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    18/09/30 20:59:32 WARN Utils: Your hostname, pxh resolves to a loopback address: 127.0.1.1; using 10.22.48.4 instead (on interface wlan0)
    18/09/30 20:59:32 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
    18/09/30 20:59:45 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
    Spark context Web UI available at http://10.22.48.4:4040
    Spark context available as 'sc' (master = local[*], app id = local-1538312374870).
    Spark session available as 'spark'.
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
          /_/
    Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181)
    Type in expressions to have them evaluated.
    Type :help for more information.
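
    At the scala> prompt the preconfigured SparkContext, available as sc, can run a job right away, so a one-liner is the quickest end-to-end check:

    scala> sc.parallelize(1 to 100).reduce(_ + _)   // distributes 1..100 across local cores and sums them: 5050
    scala> sc.textFile("file:///etc/hosts").count() // line count of any local file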

    Installing sbt

    (1) Download the package from the official site

    Official link: https://www.scala-sbt.org/download.html

    (2) Extract it to the target directory

    sudo mkdir -p /usr/local/sbt
    sudo tar -zxvf sbt-0.13.9.tgz -C /usr/local/sbt

    (3) Create an sbt script in /usr/local/sbt with the following content

    $ cd /usr/local/sbt
    $ vim sbt
    # put the following into the sbt script:
    #!/bin/bash
    SBT_OPTS="-Xms512M -Xmx1536M -Xss1M -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=256M"
    java $SBT_OPTS -jar /usr/local/sbt/bin/sbt-launch.jar "$@"

    (4) After saving, make the sbt script executable

    $ chmod u+x sbt

    (5) Set the path and environment variables

    sudo vim /etc/profile

    Append the following at the end of the file:

    export PATH=/usr/local/sbt/:$PATH

    (6) Apply the configuration

    source /etc/profile

    (7) Verify the installation

    $ sbt sbt-version
    # if that fails, you are on sbt 1.x, where keys are camelCased; run this instead:
    $ sbt sbtVersion
    Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256M; support was removed in 8.0
    [info] Loading project definition from /home/pxh/project
    [info] Set current project to pxh (in build file:/home/pxh/)
    [info] 1.2.1
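
    Another quick check is sbt console, which launches a Scala REPL through sbt (run it from a project directory; on first use it downloads a default Scala version, so it exercises sbt's launcher and dependency resolution in one step):

    $ sbt console
    scala> (1 to 5).map(n => n * n)   // Vector(1, 4, 9, 16, 25)
    scala> :quit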

    Writing a Scala application

    (1) In a terminal, create a folder named sparkapp as the application's root directory

    cd ~
    mkdir ./sparkapp
    mkdir -p ./sparkapp/src/main/scala   # create the required directory structure

    (2) Create a file named SimpleApp.scala under ./sparkapp/src/main/scala and add the following code

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._
    import org.apache.spark.SparkConf
    
    object SimpleApp {
        def main(args: Array[String]) {
            val logFile = "file:///home/pxh/hello.ts"   // any local text file works here
            val conf = new SparkConf().setAppName("Simple Application")
            val sc = new SparkContext(conf)
            val logData = sc.textFile(logFile, 2).cache()   // read as 2 partitions and cache in memory
            val numAs = logData.filter(line => line.contains("a")).count()
            println("Lines with a: %s".format(numAs))
            sc.stop()   // shut the context down cleanly
        }
    }
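
    The code above deliberately omits the master URL, because spark-submit supplies it at launch time (see step (6) below). To run the object directly, for example from an IDE, a minimal sketch is to hard-code a local master instead; this is the only change to the configuration above and is meant for local experiments only:

    val conf = new SparkConf()
        .setAppName("Simple Application")
        .setMaster("local[*]")   // "local[*]" = run in-process with one worker thread per CPU core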

    (3) Add the application's metadata and its dependency on Spark

    vim ./sparkapp/simple.sbt

    Add the following to the file:

    name := "Simple Project"
    version := "1.0"
    scalaVersion := "2.11.8"
    libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.0"
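
    The %% operator makes sbt append the project's Scala binary version to the artifact name, so the dependency above resolves to spark-core_2.11, matching scalaVersion. Written with plain %, the equivalent line spells the suffix out:

    libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.2.0"   // same artifact as the %% form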

    (4) Check the file structure of the whole application

    cd ~/sparkapp
    find .

    The file structure is as follows:

    .
    ./simple.sbt
    ./src
    ./src/main
    ./src/main/scala
    ./src/main/scala/SimpleApp.scala

    (5) Package the whole application into a JAR (the first run downloads dependencies and can take quite a while, so be patient)

    sparkapp$ /usr/local/sbt/sbt package
    Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256M; support was removed in 8.0
    [info] Loading project definition from /home/pxh/sparkapp/project
    [info] Loading settings for project sparkapp from simple.sbt ...
    [info] Set current project to Simple Project (in build file:/home/pxh/sparkapp/)
    [success] Total time: 2 s, completed 2018-10-1 0:04:59

    (6) Submit the generated JAR to Spark with spark-submit and run it

    :~$ /home/pxh/spark-2.2.0-bin-hadoop2.7/bin/spark-submit --class "SimpleApp" /home/pxh/sparkapp/target/scala-2.11/simple-project_2.11-1.0.jar 2>&1 | grep "Lines with a:"
    Lines with a: 3
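
    As a sanity check on the output, the same count can be reproduced in plain Scala without Spark, reading the hello.ts file used by the program above; run this in the scala REPL:

    import scala.io.Source
    // count lines containing "a" the non-distributed way; should match spark-submit's "Lines with a:" output
    val source = Source.fromFile("/home/pxh/hello.ts")
    val numAs = source.getLines().count(_.contains("a"))
    source.close()
    println(s"Lines with a: $numAs")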

    END........
