  • Assorted Spark examples

    1. Deduplication in Spark (use each whole line as the key and group by it, which collapses duplicate lines; then take just the keys)

    Input data:
    2012-3-1 a
    
    2012-3-2 b
    
    2012-3-3 c
    2012-3-2 b
    Source code:
     // collect() brings the sorted keys back to the driver so they print in order
     rdd.filter(_.trim.nonEmpty).map(line => (line.trim, "")).groupByKey().sortByKey(true).keys.collect().foreach(println)
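
    The RDD API also provides distinct(), which removes duplicates without building an explicit (line, "") pair. The following is a minimal sketch, not the original author's code: the object name DedupSketch, the helper dedup, and the sample data inlined through parallelize are assumptions made for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DedupSketch {
  // Trim lines, drop blank ones, remove duplicates, and return them sorted.
  // (Hypothetical helper; not part of the original post.)
  def dedup(sc: SparkContext, lines: Seq[String]): Array[String] =
    sc.parallelize(lines)
      .map(_.trim)
      .filter(_.nonEmpty)
      .distinct()
      .sortBy(x => x)
      .collect()

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("DedupSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Sample data from the post, including its blank lines
      val sample = Seq("2012-3-1 a", "", "2012-3-2 b", "", "2012-3-3 c", "2012-3-2 b")
      dedup(sc, sample).foreach(println)
    } finally sc.stop()
  }
}
```

    distinct() still shuffles the data, but it does not materialize a group of values per key the way groupByKey() does, so it tends to be the lighter choice when only the unique lines are needed.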

    2. Data cleaning (filtering)

    Input data:
    https://blog.csdn.net/weixin_42540606/article/details/81100882
    
    http://192.168.20.111:8080/
    
    https://www.cnblogs.com/redhat0019/p/8665491.html
    
    http://192.168.20.111:50070/dfshealth.html#tab-overview
    
    http://192.168.20.124:1082/osgiWeb/page/hgu/index.jsp

    Implementation:

    // Trim each line, drop blank lines, then deduplicate and sort the URLs
    two.filter(_.trim.nonEmpty).map(line => (line.trim, "")).groupByKey().sortByKey().keys.collect().foreach(println)

     3. Find the highest temperature for each year

    Input data:

    0067011990999991950051507004888888889999999N9+00001+9999999999999999999999
    
    0067011990999991951051512004888888889999999N9+00222+9999999999999999999999
    
    0067011990999991952051518004888888889999999N9-00111+9999999999999999999999
    
    0067011990999991953032412004888888889999999N9+99991+9999999999999999999999
    
    0067011990999991954032418004888888880500001N9+00001+9999999999999999999999
    
    0067011990999991955051507004888888880500001N9+00781+9999999999999999999999
    
    0067011990999991950051507004888888889999999N9+02341+9999999999999999999999
    
    0067011990999991951051512004888888889999999N9+04567+9999999999999999999999
    
    0067011990999991952051518004888888889999999N9-02111+9999999999999999999999
    
    0067011990999991953032412004888888889999999N9+67811+9999999999999999999999
    
    0067011990999991954032418004888888880500001N9+22211+9999999999999999999999
    
    0067011990999991955051507004888888880500001N9+00781+9999999999999999999999
    
    0067011990999991950051507004888888889999999N9+03341+9999999999999999999999
    
    0067011990999991951051512004888888889999999N9+04667+9999999999999999999999
    
    0067011990999991952051518004888888889999999N9-02211+9999999999999999999999
    
    0067011990999991953032412004888888889999999N9+27811+9999999999999999999999
    
    0067011990999991954032418004888888880500001N9+25211+9999999999999999999999
    
    0067011990999991955051507004888888880500001N9+01781+9999999999999999999999

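    Data description (positions are 0-based, matching the substring calls in the code):

    Positions 15-18, i.e. substring(15, 19), hold the year.

    Positions 45-49 hold the temperature: position 45 is the sign (+ means above zero, - means below zero) and positions 46-49 are the value. The value must not be 9999, which marks an invalid reading.

    Position 50 must be one of the digits 0, 1, 4, 5, or 9.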
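
    The field layout above can be sketched as a small parser in plain Scala, with no Spark dependency, so the offsets are easy to check on a single record. This is an illustrative sketch only: the names RecordSketch, Reading, and parse, and the field name quality, are assumptions rather than part of the original post.

```scala
object RecordSketch {
  // Parsed fields of one fixed-width record (field names are illustrative)
  final case class Reading(year: Int, temperature: Int, quality: Char)

  // Returns None for lines that are too short, for the 9999 sentinel,
  // or when the digit at position 50 is not one of 0, 1, 4, 5, 9.
  def parse(raw: String): Option[Reading] = {
    val line = raw.trim
    if (line.length < 51) None
    else {
      val year = line.substring(15, 19).toInt
      val sign = line.charAt(45)
      val magnitude = line.substring(46, 50).toInt
      val quality = line.charAt(50)
      if (magnitude == 9999 || "01459".indexOf(quality) < 0) None
      else Some(Reading(year, if (sign == '-') -magnitude else magnitude, quality))
    }
  }

  def main(args: Array[String]): Unit = {
    // Two records taken from the sample data
    println(parse("0067011990999991952051518004888888889999999N9-00111+9999999999999999999999"))
    println(parse("0067011990999991953032412004888888889999999N9+99991+9999999999999999999999"))
  }
}
```

    Returning an Option keeps invalid records out of the result without throwing, so the same function could be used with flatMap on an RDD of lines.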

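    Implementation:

      import org.apache.spark.{SparkConf, SparkContext}

      object MaxTemperature {
        def main(args: Array[String]): Unit = {
          val conf = new SparkConf().setAppName("WordCount123").setMaster("local")
          val sc = new SparkContext(conf)
          val data = sc.textFile("D://wordCount.txt")
          // Parse each non-empty line into (year, sign, absolute value, digit at position 50)
          val records = data.map(_.trim).filter(_.nonEmpty).map(line =>
            (line.substring(15, 19).toInt, line.charAt(45), line.substring(46, 50).toInt, line.substring(50, 51)))
          records.collect().foreach(println)
          // Drop the 9999 sentinel and invalid trailing digits, then apply the sign.
          // Keeping only '+' readings would lose any year whose readings are all
          // below zero (1952 in the sample data).
          val temps = records
            .filter(r => r._3 != 9999 && r._4.matches("[01459]"))
            .map(r => (r._1, if (r._2 == '-') -r._3 else r._3))
          // Keep the largest temperature seen for each year
          val maxByYear = temps.reduceByKey((x, y) => math.max(x, y))
          maxByYear.collect().foreach(println)
          sc.stop()
        }
      }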

     4. Sort and add a sequence number

    Input data:

    2
    
    32
    
    654
    
    32
    
    15
    
    756
    
    65223
    5956
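
    Implementation:

      val conf = new SparkConf().setAppName("WordCount123").setMaster("local")
      val sc = new SparkContext(conf)
      val data = sc.textFile("D://wordCount.txt")
      // Sort the numbers, then number them with zipWithIndex. A driver-side
      // counter incremented inside map() is copied into each task's closure,
      // so it only appears to work with a single local partition.
      data.map(_.trim).filter(_.nonEmpty)
        .map(_.toInt)
        .sortBy(x => x)
        .zipWithIndex()
        .map { case (value, i) => (i + 1, value) }
        .collect()
        .foreach(println)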
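
    As a quick check of what the job should print, the same sort-and-number step can be sketched on a plain Scala collection, without Spark. The object name NumberingSketch and the helper number are hypothetical; the sample values are the input data listed above.

```scala
object NumberingSketch {
  // Sort ascending and pair each value with its 1-based position.
  def number(values: Seq[Int]): Seq[(Int, Int)] =
    values.sorted.zipWithIndex.map { case (v, i) => (i + 1, v) }

  def main(args: Array[String]): Unit = {
    val sample = Seq(2, 32, 654, 32, 15, 756, 65223, 5956)
    // Prints one pair per line:
    // (1,2) (2,15) (3,32) (4,32) (5,654) (6,756) (7,5956) (8,65223)
    number(sample).foreach(println)
  }
}
```

    Duplicate values (32 appears twice) each get their own sequence number, since the numbering follows position in the sorted order rather than distinct values.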
  • Original article: https://www.cnblogs.com/redhat0019/p/11242603.html