  • FOUR spark-shell interactive programming

     Writing a standalone application that removes duplicate lines from data
     
     
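    The project directory is /usr/local/spark/mycode/remdup. In that directory, create the source tree with
    mkdir -p src/main/scala
    then create a file named remdup.scala under /usr/local/spark/mycode/remdup/src/main/scala with the following content: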
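    import org.apache.spark.SparkContext
    import org.apache.spark.SparkConf
    import org.apache.spark.HashPartitioner

    object RemDup {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("RemDup")
        val sc = new SparkContext(conf)
        // Input file or directory; each line is one record
        val dataFile = "file:///home/charles/data"
        val data = sc.textFile(dataFile, 2)
        val res = data
          .filter(_.trim().length > 0)          // drop blank lines
          .map(line => (line.trim, ""))         // the trimmed line becomes the key
          .partitionBy(new HashPartitioner(1))  // gather all keys into one partition
          .groupByKey()                         // identical lines collapse into one key
          .sortByKey()                          // sort the distinct lines
          .keys                                 // keep only the lines themselves
        // "result" is resolved against the default file system
        // (the current working directory when that file system is local)
        res.saveAsTextFile("result")
        sc.stop()
      }
    }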
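    To see what the chain filter, map, groupByKey, sortByKey, keys actually computes, here is a plain-Scala sketch that runs the same steps on an in-memory collection, without Spark. The object name RemDupLocal and the sample lines are hypothetical; groupBy on a local collection stands in for groupByKey, and sorting the keys stands in for sortByKey.

```scala
// Plain-Scala sketch (no Spark) of the logic in the RemDup pipeline.
// RemDupLocal is a hypothetical name used only for illustration.
object RemDupLocal {
  def dedup(lines: Seq[String]): List[String] =
    lines
      .filter(_.trim.length > 0)     // drop blank or whitespace-only lines
      .map(line => (line.trim, ""))  // key = trimmed line, value unused
      .groupBy(_._1)                 // like groupByKey: one entry per distinct key
      .keys                          // keep only the keys
      .toList
      .sorted                        // like sortByKey: ascending order

  def main(args: Array[String]): Unit = {
    val sample = Seq("20170101 x", "20170102 y", "20170101 x", "  ", "20170103 x ")
    dedup(sample).foreach(println)
  }
}
```

    Because the key is the trimmed line, "20170103 x " and "20170103 x" count as the same record, and whitespace-only lines never reach the output.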
    

      

    In /usr/local/spark/mycode/remdup, create a file named simple.sbt with the following content:
    name := "Simple Project"
    version := "1.0"
    scalaVersion := "2.11.8"
    libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0"
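    The %% operator in simple.sbt appends the Scala binary version to the artifact name, which is why scalaVersion has to match the Scala version Spark was built with (2.11 for the prebuilt Spark 2.1.0). A minimal sketch of that naming rule, assuming a regular 2.x release version string; SbtCrossName and its functions are hypothetical helpers, not part of sbt's API:

```scala
// Hypothetical helper approximating how sbt's %% chooses the artifact name.
// Assumes a plain "2.x.y" release version (no milestones, not Scala 3).
object SbtCrossName {
  // 2.11.8 -> 2.11
  def scalaBinaryVersion(scalaVersion: String): String =
    scalaVersion.split('.').take(2).mkString(".")

  // "org.apache.spark" %% "spark-core" resolves spark-core_<binary version>
  def crossArtifact(artifact: String, scalaVersion: String): String =
    s"${artifact}_${scalaBinaryVersion(scalaVersion)}"

  def main(args: Array[String]): Unit =
    println(crossArtifact("spark-core", "2.11.8")) // spark-core_2.11
}
```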
    

      

    In /usr/local/spark/mycode/remdup, run the following command to package the program:
    $ sudo /usr/local/sbt/sbt package
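    sbt package writes the jar under target/scala-<binary version>/ and derives its file name from the name, version and scalaVersion settings, which yields the simple-project_2.11-1.0.jar path passed to spark-submit. The sketch below approximates that convention (sbt lower-cases the project name and replaces runs of non-word characters with '-'); SbtJarName and its functions are hypothetical, and the sketch ignores prerelease versions and custom artifact settings:

```scala
import java.util.Locale

// Hypothetical approximation of sbt's default jar naming; not sbt's own code.
object SbtJarName {
  // "Simple Project" -> "simple-project"
  def normalizedName(name: String): String =
    name.toLowerCase(Locale.ENGLISH).replaceAll("""\W+""", "-")

  // 2.11.8 -> 2.11 (assumes a plain 2.x.y release)
  def binaryVersion(scalaVersion: String): String =
    scalaVersion.split('.').take(2).mkString(".")

  // target/scala-<binary>/<normalizedName>_<binary>-<version>.jar
  def jarPath(name: String, version: String, scalaVersion: String): String = {
    val bin = binaryVersion(scalaVersion)
    s"target/scala-$bin/${normalizedName(name)}_$bin-$version.jar"
  }

  def main(args: Array[String]): Unit =
    println(jarPath("Simple Project", "1.0", "2.11.8"))
}
```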
    

      

    Finally, in /usr/local/spark/mycode/remdup, run the following command to submit the program:
    $ /usr/local/spark/bin/spark-submit --class "RemDup" \
        /usr/local/spark/mycode/remdup/target/scala-2.11/simple-project_2.11-1.0.jar
    

      

    The result files are then in the directory /usr/local/spark/mycode/remdup/result (Spark writes them as part-NNNNN files plus a _SUCCESS marker).
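    saveAsTextFile writes a directory rather than a single file: one part-NNNNN file per partition plus an empty _SUCCESS marker. Because this job ends with a single partition, the deduplicated lines normally land in part-00000. A minimal sketch for printing the merged output, assuming the result directory is on the local file system; ReadResult and readParts are hypothetical names:

```scala
import java.nio.file.{Files, Path, Paths}
import scala.collection.JavaConverters._

// Hypothetical helper: concatenate the part-* files of a Spark output directory.
object ReadResult {
  def readParts(dir: Path): List[String] = {
    val stream = Files.list(dir)
    val parts =
      try stream.iterator().asScala
        .filter(_.getFileName.toString.startsWith("part-")) // skip _SUCCESS etc.
        .toList
        .sortBy(_.getFileName.toString)                     // part-00000 first
      finally stream.close()
    parts.flatMap(p => Files.readAllLines(p).asScala)       // UTF-8 lines
  }

  def main(args: Array[String]): Unit = {
    val dir = Paths.get(if (args.nonEmpty) args(0) else "result")
    readParts(dir).foreach(println)
  }
}
```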
  • Original article: https://www.cnblogs.com/NCLONG/p/12261145.html