  • DataFrame WordCount

    Test data:

    1 2 3 4 5 6
    3 4 5 6 7 10
    10 1 2 3 4 5
    9 8 7 6 5 4
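    The expected counts for this data can be cross-checked without Spark. Below is a minimal plain-Scala sketch that mirrors the DataFrame pipeline: `flatMap` over the split lines plays the role of explode (one element per word), and `groupBy` followed by taking each group's size plays the role of `groupBy("word").count()`. The object `PlainScalaWordCount` and its helper `wordCounts` are hypothetical names chosen for illustration, and splitting on a single space assumes the input is separated exactly like the test data above.

    ```scala
    object PlainScalaWordCount {
      // Hypothetical helper: counts space-separated tokens across all lines.
      def wordCounts(lines: Seq[String]): Map[String, Long] =
        lines
          .flatMap(_.split(" "))   // one element per word, like explode
          .groupBy(identity)       // group identical words together
          .map { case (word, occurrences) => word -> occurrences.size.toLong } // size of each group

      def main(args: Array[String]): Unit = {
        // The same four lines as the test data above
        val testData = Seq("1 2 3 4 5 6", "3 4 5 6 7 10", "10 1 2 3 4 5", "9 8 7 6 5 4")
        val counts = wordCounts(testData)
        // Every token in this test data is an integer, so sort numerically for readability
        counts.toSeq.sortBy(_._1.toInt).foreach { case (word, count) => println(s"$word -> $count") }
      }
    }
    ```

    Because it works on in-memory collections, this version only suits small inputs; its value here is making the semantics of explode plus grouped count explicit.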
    import org.apache.spark.sql.SparkSession

    /**
      * WordCount implemented with a DataFrame
      */
    object DataFrameWordCount {
      def main(args: Array[String]): Unit = {
    
        val spark = SparkSession.builder().appName(this.getClass.getSimpleName).master("local").getOrCreate()
        import spark.implicits._
        // Backslashes must be escaped in a Scala string literal ("\w" is not a valid escape)
        val linesDF = spark.sparkContext.textFile("D:\\workspace\\test_data.txt").toDF("line")
        linesDF.show(false)
        linesDF.printSchema()
        // Expand each line into one row per word (Dataset.explode has been deprecated since Spark 2.0)
        val wordsDF = linesDF.explode("line", "word")((line: String) => line.split(" "))
        wordsDF.printSchema()
        wordsDF.show(200, false)
        // Group by the "word" column and use the count operator to compute the number of rows in each group
        val wordCountDF = wordsDF.groupBy("word").count()
        wordCountDF.show(false)
        wordCountDF.printSchema()
        // Number of distinct words
        println(wordCountDF.count())
        spark.stop()
      }
    
    }

    Output:

    +------------+
    |line        |
    +------------+
    |1 2 3 4 5 6 |
    |3 4 5 6 7 10|
    |10 1 2 3 4 5|
    |9 8 7 6 5 4 |
    +------------+
    
    root
     |-- line: string (nullable = true)
    
    root
     |-- line: string (nullable = true)
     |-- word: string (nullable = true)
    
    +------------+----+
    |line        |word|
    +------------+----+
    |1 2 3 4 5 6 |1   |
    |1 2 3 4 5 6 |2   |
    |1 2 3 4 5 6 |3   |
    |1 2 3 4 5 6 |4   |
    |1 2 3 4 5 6 |5   |
    |1 2 3 4 5 6 |6   |
    |3 4 5 6 7 10|3   |
    |3 4 5 6 7 10|4   |
    |3 4 5 6 7 10|5   |
    |3 4 5 6 7 10|6   |
    |3 4 5 6 7 10|7   |
    |3 4 5 6 7 10|10  |
    |10 1 2 3 4 5|10  |
    |10 1 2 3 4 5|1   |
    |10 1 2 3 4 5|2   |
    |10 1 2 3 4 5|3   |
    |10 1 2 3 4 5|4   |
    |10 1 2 3 4 5|5   |
    |9 8 7 6 5 4 |9   |
    |9 8 7 6 5 4 |8   |
    |9 8 7 6 5 4 |7   |
    |9 8 7 6 5 4 |6   |
    |9 8 7 6 5 4 |5   |
    |9 8 7 6 5 4 |4   |
    +------------+----+
    
    +----+-----+
    |word|count|
    +----+-----+
    |7   |2    |
    |3   |3    |
    |8   |1    |
    |5   |4    |
    |6   |3    |
    |9   |1    |
    |1   |2    |
    |10  |2    |
    |4   |4    |
    |2   |2    |
    +----+-----+
    
    root
     |-- word: string (nullable = true)
     |-- count: long (nullable = false)
    
    10
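    Since `Dataset.explode` has been deprecated since Spark 2.0, the same WordCount can be sketched with `functions.explode` applied to the result of `functions.split`. This is an alternative, not the author's code: the object `DataFrameWordCountExplode` and method `wordCounts` are hypothetical, and the input is built in memory with `toDF` instead of read from a file. Note that `select` keeps only the `word` column, so the intermediate DataFrame differs from `wordsDF` above, and its printed schema may not match exactly. `split` treats its second argument as a regular expression; a single space is sufficient for this data.

    ```scala
    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions.{col, explode, split}

    object DataFrameWordCountExplode {
      // One row per word via functions.explode, then a count per distinct word
      def wordCounts(linesDF: DataFrame): DataFrame =
        linesDF
          .select(explode(split(col("line"), " ")).as("word"))
          .groupBy("word")
          .count()

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("DataFrameWordCountExplode")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        // The same four lines as the test data, created in memory
        val linesDF = Seq(
          "1 2 3 4 5 6",
          "3 4 5 6 7 10",
          "10 1 2 3 4 5",
          "9 8 7 6 5 4"
        ).toDF("line")

        // "word" is a string column, so the ordering is lexicographic ("10" sorts before "2")
        wordCounts(linesDF).orderBy("word").show(false)
        spark.stop()
      }
    }
    ```

    Unlike the typed lambda passed to `Dataset.explode`, `split` and `explode` are built-in column expressions, which keeps the whole pipeline visible to Spark's optimizer.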
  • Original article: https://www.cnblogs.com/itboys/p/10093589.html