  • Spark Application Development Workflow

    Configuration file:

    pom.xml

      <properties>
        <scala.version>2.11.8</scala.version>
        <spark.version>2.2.0</spark.version>
        <hadoop.version>2.6.0-cdh5.7.0</hadoop.version>
      </properties>
    
      <repositories>
        <!-- Add the Cloudera repository; the CDH artifacts are hosted there -->
        <repository>
          <id>cloudera</id>
          <name>cloudera</name>
          <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
        </repository>
      </repositories>
      
      <dependencies>
    
        <!-- Scala library dependency -->
        <dependency>
          <groupId>org.scala-lang</groupId>
          <artifactId>scala-library</artifactId>
          <version>${scala.version}</version>
        </dependency>
    
        <!-- spark-core dependency (built against Scala 2.11) -->
        <dependency>
          <groupId>org.apache.spark</groupId>
          <artifactId>spark-core_2.11</artifactId>
          <version>${spark.version}</version>
        </dependency>
    
        <!-- hadoop-client dependency -->
        <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-client</artifactId>
          <version>${hadoop.version}</version>
        </dependency>
    
      </dependencies>
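
    The post does not show the pom's build section; without one, Maven will not compile the Scala sources. A minimal sketch, assuming the commonly used scala-maven-plugin:

      <build>
        <plugins>
          <!-- Compiles src/main/scala as part of the standard Maven lifecycle -->
          <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>3.2.2</version>
            <executions>
              <execution>
                <goals>
                  <goal>compile</goal>
                  <goal>testCompile</goal>
                </goals>
              </execution>
            </executions>
          </plugin>
        </plugins>
      </build>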
    



    Test code:

    Program arguments:
    [Screenshot of the IDE run configuration, which passes the input and output paths as program arguments]

    WordCountApp.scala

    package com.ruozedata
    
    import org.apache.spark.{SparkConf, SparkContext}
    
    object WordCountApp extends App {
    
      val conf = new SparkConf()
      val sc = new SparkContext(conf)
    
      // Input (the path is passed in via args(), not hard-coded)
      val dataFile = sc.textFile(args(0))
    
      // Business logic: split each line on commas, pair each word with 1, sum counts per word
      val outputFile = dataFile.flatMap(_.split(",")).map((_,1)).reduceByKey(_+_)
    
      // Write the output files
      outputFile.saveAsTextFile(args(1))
    
      // Stop the SparkContext
      sc.stop()
    }
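
    Note: the Spark quick start guide recommends defining a main method instead of extending scala.App, because App's delayed initialization may not work correctly with Spark. An equivalent sketch with an explicit main:

    package com.ruozedata

    import org.apache.spark.{SparkConf, SparkContext}

    object WordCountApp {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf())
        sc.textFile(args(0))           // input path from the first argument
          .flatMap(_.split(","))       // split comma-separated words
          .map((_, 1))                 // pair each word with a count of 1
          .reduceByKey(_ + _)          // sum the counts per word
          .saveAsTextFile(args(1))     // output path from the second argument
        sc.stop()
      }
    }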
    



    Testing in the CLI:

    [Screenshot of the CLI test omitted]
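
    The same logic can be verified interactively in spark-shell before packaging. A minimal sketch, assuming a comma-separated sample file (the path is hypothetical):

    $ spark-shell --master local[2]
    scala> val counts = sc.textFile("file:///home/hadoop/data/sample.txt").
         |   flatMap(_.split(",")).map((_, 1)).reduceByKey(_ + _)
    scala> counts.collect().foreach(println)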



    Package the jar, upload it to the server, and run it:

    [Screenshots of the packaging, upload, and execution steps omitted]
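
    In command form, the packaging and upload steps look roughly like this (a sketch; the server host name is an assumption):

    # Build the jar from the project root
    $ mvn clean package -DskipTests

    # Copy it to the path used by the submit script below
    $ scp target/SparkCodeApp-1.0.jar hadoop@server:/home/hadoop/lib/spark/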



    Submitting in local mode on the Linux server (configured in a script):

    $ /home/hadoop/app/spark/bin/spark-submit \
      --class com.ruozedata.WordCountApp \
      --master local[2] \
      --name WordCountApp \
      /home/hadoop/lib/spark/SparkCodeApp-1.0.jar \
      /wc_input/ /wc_output
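
    Note that saveAsTextFile fails if the output directory already exists, so /wc_output must be removed before rerunning. To run on a cluster instead of locally, only the master (and optionally the deploy mode) changes; a sketch, assuming a YARN cluster is available:

    $ /home/hadoop/app/spark/bin/spark-submit \
      --class com.ruozedata.WordCountApp \
      --master yarn \
      --deploy-mode client \
      --name WordCountApp \
      /home/hadoop/lib/spark/SparkCodeApp-1.0.jar \
      /wc_input/ /wc_output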
    

    For detailed configuration, see the official Spark docs:

    http://spark.apache.org/docs/2.2.0/rdd-programming-guide.html
    http://spark.apache.org/docs/2.2.0/configuration.html
    http://spark.apache.org/docs/2.2.0/submitting-applications.html

  • Original post: https://www.cnblogs.com/suixingc/p/spark-ying-yong-cheng-xu-kai-fa-liu-cheng.html