  • Spark2 Dataset: collect_set and collect_list

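    Both are aggregate functions that gather a group's values into an array: collect_set removes duplicate elements; collect_list keeps them.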
    select gender,
           concat_ws(',', collect_set(children)),
           concat_ws(',', collect_list(children))
      from Affairs
     group by gender

    // Register the DataFrame as a temporary view so it can be queried with SQL
    data.createOrReplaceTempView("Affairs")
    
    val df3= spark.sql("select gender,concat_ws(',',collect_set(children)),concat_ws(',',collect_list(children)) from Affairs group by gender")
    df3: org.apache.spark.sql.DataFrame = [gender: string, concat_ws(,, collect_set(children)): string ... 1 more field]
    
    df3.show  // collect_set removes duplicates; collect_list keeps them
    +------+-----------------------------------+------------------------------------+
    |gender|concat_ws(,, collect_set(children))|concat_ws(,, collect_list(children))|
    +------+-----------------------------------+------------------------------------+
    |female|                             no,yes|                    no,yes,no,no,yes|
    |  male|                             no,yes|                    no,yes,no,yes,no|
    +------+-----------------------------------+------------------------------------+
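
The same aggregation can also be written with the DataFrame API instead of a SQL string. The following is a minimal sketch, not the original author's code: the sample rows, the column aliases, and the extra sort_array column are assumptions added for illustration. Spark does not guarantee the order of elements returned by collect_set or collect_list, so sort_array is applied here only to make one of the columns deterministic.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, collect_list, collect_set, concat_ws, sort_array}

// In spark-shell, getOrCreate() simply returns the existing session.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("collect-set-vs-collect-list")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample rows standing in for the Affairs data set.
val data = Seq(
  ("female", "no"), ("female", "yes"), ("female", "no"),
  ("male", "yes"), ("male", "yes"), ("male", "no")
).toDF("gender", "children")

val result = data.groupBy("gender").agg(
  // Distinct values per group; element order is not guaranteed.
  concat_ws(",", collect_set(col("children"))).as("children_set"),
  // Distinct values per group, sorted so the string is reproducible.
  concat_ws(",", sort_array(collect_set(col("children")))).as("children_set_sorted"),
  // Every value per group, duplicates included.
  concat_ws(",", collect_list(col("children"))).as("children_list")
)

result.show(false)
```

Aliasing each aggregate with .as(...) avoids the unwieldy auto-generated column names seen in the output above, such as concat_ws(,, collect_set(children)).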
    
  • Original article: https://www.cnblogs.com/wwxbi/p/6102380.html