  • Spark2 Dataset: collect_set and collect_list

    collect_set removes duplicate elements; collect_list keeps them.
    select gender,
           concat_ws(',', collect_set(children)),
           concat_ws(',', collect_list(children))
      from Affairs
     group by gender

    // Register the DataFrame as a temporary view so it can be queried with SQL
    data.createOrReplaceTempView("Affairs")
    
    val df3= spark.sql("select gender,concat_ws(',',collect_set(children)),concat_ws(',',collect_list(children)) from Affairs group by gender")
    df3: org.apache.spark.sql.DataFrame = [gender: string, concat_ws(,, collect_set(children)): string ... 1 more field]
    
    df3.show  // collect_set removes duplicates; collect_list keeps them
    +------+-----------------------------------+------------------------------------+
    |gender|concat_ws(,, collect_set(children))|concat_ws(,, collect_list(children))|
    +------+-----------------------------------+------------------------------------+
    |female|                             no,yes|                    no,yes,no,no,yes|
    |  male|                             no,yes|                    no,yes,no,yes,no|
    +------+-----------------------------------+------------------------------------+
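
The same aggregation can also be written with the DataFrame API instead of a SQL string. The following is a minimal sketch, not the original author's code: the sample rows, the object name `CollectSetVsList`, and the column aliases `children_set` and `children_list` are assumptions made for illustration, standing in for the Affairs data used above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, collect_list, collect_set, concat_ws}

object CollectSetVsList {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("collect-set-vs-list")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows (assumption), not the real Affairs dataset.
    val data = Seq(
      ("female", "no"), ("female", "yes"), ("female", "no"),
      ("male", "no"), ("male", "yes"), ("male", "yes")
    ).toDF("gender", "children")

    // collect_set drops duplicates within each group; collect_list keeps every value.
    // concat_ws joins the resulting array into one comma-separated string.
    val result = data.groupBy("gender").agg(
      concat_ws(",", collect_set(col("children"))).as("children_set"),
      concat_ws(",", collect_list(col("children"))).as("children_list")
    )

    result.show(false) // false = do not truncate long strings
    spark.stop()
  }
}
```

Aliasing the aggregated columns avoids the long generated names such as `concat_ws(,, collect_set(children))`. Note that the order of elements produced by both functions depends on row order after the shuffle and is not guaranteed; wrapping the array in `sort_array` before `concat_ws` is one way to get a stable order.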
    
  • Original post: https://www.cnblogs.com/wwxbi/p/6102380.html