zoukankan      html  css  js  c++  java
  • scala的reduce

    spark 中的 reduce 非常的好用,reduce 可以对 dataframe 中的元素进行计算、拼接等等。例如生成了一个 dataframe :

    //配置spark
      def getSparkSession(): SparkSession = {
    
        //读取配置文件
        val properties: Properties = new Properties()
        val ipstream: InputStream = this.getClass().getResourceAsStream("/config.properties")
        properties.load(ipstream)
    
        val masterUrl = properties.getProperty("spark.master.url")
        val appName = properties.getProperty("spark.app.name")
        val sparkconf = new SparkConf()
          .setMaster(masterUrl)
          .setAppName(appName)
          .set("spark.port.maxRetries", "100")
        val Spark = SparkSession.builder().config(sparkconf).getOrCreate()
        Spark
      }
    def main(args: Array[String]): Unit = {
        val spark = getSparkSession()
        val sentenceDataFrame = spark.createDataFrame(Seq(
          (0, "Hi I heard about Spark"),
          (1, "I wish Java could use case classes"),
          (2, "Logistic regression models are neat")
        )).toDF("label", "sentence")
        sentenceDataFrame.show()
      }
    

    假设要将 sentence 这一列拼接成一长串字符串,则:

    sentenceDataFrame.createOrReplaceTempView("BIGDATA")
    val sqlresult: DataFrame = spark.sql(s"SELECT sentence FROM BIGDATA")
    val a: RDD[String] = sqlresult.rdd.map(_.getAs[String]("sentence"))
    val b = a.reduce((x, y) => x + "," + y)
    

    要是将 sentence 这一列拼接一个 List,则:

    val c: RDD[List[String]] = sqlresult.rdd.map{ row=>List(row.getAs[String]("sentence"))}
    val d: List[String] = c.reduce((x, y)=>x++y)
    
  • 相关阅读:
    打印杨辉三角
    插值排序
    各种冒泡排序法
    Linux系统命令符01
    2.1博客系统 |基于form组件和Ajax实现注册登录
    python面试笔试题,你都会了吗?快来复习
    1.2博客系统 |登录页| 验证码
    1.1博客系统| 表结构
    第五章:5.2面向对象-绑定方法和非绑定方法| 内置方法 |元类
    11.Django|中间件
  • 原文地址:https://www.cnblogs.com/TTyb/p/6867494.html
Copyright © 2011-2022 走看看