  • reduce in Scala

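    The reduce operation in Spark is very handy: it combines the elements of a DataFrame pairwise with a function you supply, so it can be used for computing over them, concatenating them, and so on. For example, suppose we create a DataFrame like this: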

    import java.io.InputStream
    import java.util.Properties

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    // ReduceExample is an arbitrary object name used to make the snippet compile
    object ReduceExample {

      // Configure Spark from config.properties on the classpath
      def getSparkSession(): SparkSession = {

        // Read the configuration file
        val properties: Properties = new Properties()
        val ipstream: InputStream = this.getClass().getResourceAsStream("/config.properties")
        try properties.load(ipstream) finally ipstream.close()

        val masterUrl = properties.getProperty("spark.master.url")
        val appName = properties.getProperty("spark.app.name")
        val sparkConf = new SparkConf()
          .setMaster(masterUrl)
          .setAppName(appName)
          .set("spark.port.maxRetries", "100")
        SparkSession.builder().config(sparkConf).getOrCreate()
      }

      def main(args: Array[String]): Unit = {
        val spark = getSparkSession()
        // A small DataFrame with an Int label and a String sentence per row
        val sentenceDataFrame = spark.createDataFrame(Seq(
          (0, "Hi I heard about Spark"),
          (1, "I wish Java could use case classes"),
          (2, "Logistic regression models are neat")
        )).toDF("label", "sentence")
        sentenceDataFrame.show()
      }
    }
    

    Suppose we want to concatenate the sentence column into one long string:

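    // Needs: import org.apache.spark.rdd.RDD and org.apache.spark.sql.DataFrame
    sentenceDataFrame.createOrReplaceTempView("BIGDATA")
    val sqlresult: DataFrame = spark.sql("SELECT sentence FROM BIGDATA")
    // Pull the sentence column out of each Row as a String
    val a: RDD[String] = sqlresult.rdd.map(_.getAs[String]("sentence"))
    // Join the values pairwise with a comma
    val b = a.reduce((x, y) => x + "," + y)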
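
    To see what the reducing function does without starting Spark, here is a minimal sketch on an ordinary Scala Seq, whose reduce accepts the same kind of two-argument function. The names ReduceSketch, sentences, joined, viaMkString and emptyJoined are illustrative and do not appear in the original code.

    ```scala
    object ReduceSketch {
      // The same three sentences as in the DataFrame (local data, no Spark needed)
      val sentences: Seq[String] = Seq(
        "Hi I heard about Spark",
        "I wish Java could use case classes",
        "Logistic regression models are neat"
      )

      // The same reducing function as in the Spark snippet
      val joined: String = sentences.reduce((x, y) => x + "," + y)

      // For a local collection, mkString produces the same string
      val viaMkString: String = sentences.mkString(",")

      // reduce throws on an empty collection; reduceOption returns None instead
      val emptyJoined: Option[String] =
        Seq.empty[String].reduceOption((x, y) => x + "," + y)
    }
    ```

    One caveat when carrying this over to an RDD: Spark documents RDD.reduce as expecting a commutative and associative function, and string concatenation is not commutative, so the order of the sentences in the result is probably not something to rely on. RDD.reduce also throws on an empty RDD, just as reduce does on an empty Seq.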
    

    To collect the sentence column into a List instead:

    // Wrap each sentence in a one-element List, then merge the lists pairwise
    val c: RDD[List[String]] = sqlresult.rdd.map { row => List(row.getAs[String]("sentence")) }
    val d: List[String] = c.reduce((x, y) => x ++ y)
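
    The wrap-then-merge pattern can be tried on a plain Scala collection as well. The sketch below is only an illustration on local data; the names ListReduceSketch, merged, direct and mergedOrEmpty are made up for this example.

    ```scala
    object ListReduceSketch {
      val sentences: Seq[String] = Seq(
        "Hi I heard about Spark",
        "I wish Java could use case classes",
        "Logistic regression models are neat"
      )

      // Wrap each element in a one-element List and merge pairwise,
      // mirroring the map + reduce used on the RDD
      val merged: List[String] = sentences.map(s => List(s)).reduce((x, y) => x ++ y)

      // For a local collection this is equivalent to toList
      val direct: List[String] = sentences.toList

      // fold starts from an empty List, so it also works when there are no elements
      val mergedOrEmpty: List[String] =
        Seq.empty[String].map(s => List(s)).fold(List.empty[String])(_ ++ _)
    }
    ```

    On the RDD side, c.fold(List.empty[String])(_ ++ _) should be the corresponding call that tolerates an empty RDD, and sqlresult.rdd.map(_.getAs[String]("sentence")).collect().toList ought to give the same list without wrapping every row in its own List.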
    
  • Original post: https://www.cnblogs.com/TTyb/p/6867494.html