  • Hive: collect_set after sorting

    Suppose we have the following table:

    select 'a' as category, 19 as duration
    union all
    select 'b' as category, 15 as duration
    union all
    select 'c' as category, 12 as duration
    union all
    select 'd' as category, 53 as duration
    union all
    select 'e' as category, 27 as duration
    union all
    select 'f' as category, 9  as duration;
    
     category | duration 
     b        |       15 
     f        |       9 
     e        |       27 
     c        |       12 
     d        |       53 
     a        |       19 
    

    The goal is to collapse the rows into a single row, ordered by duration in descending order, producing d,e,a,b,c,f.
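
    As a reference for the target result, here is a minimal Python sketch of the same transformation outside Hive. The names `rows` and `join_by_duration_desc` are illustrative only and do not come from the original post.

```python
# Sample rows from the table above, as (category, duration) pairs.
rows = [("b", 15), ("f", 9), ("e", 27), ("c", 12), ("d", 53), ("a", 19)]


def join_by_duration_desc(rows):
    """Join the categories with ',' in descending order of duration."""
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    return ",".join(category for category, _ in ordered)


print(join_by_duration_desc(rows))  # prints d,e,a,b,c,f
```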

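    First rank the rows with row_number() over (order by cast(duration as int) desc) as duration_rank, then concatenate with concat_ws(',',collect_set(category)). The result, however, comes back out of order. The root cause lies in MapReduce: as soon as more than one mapper/reducer processes the data, the order of the selected rows will almost certainly differ from the original order, and collect_set makes no promise about element order in any case.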

    One fix is to pin the number of mappers to 1. Another is to carry the rank along, sort on it once more, and strip it after concatenation:

    select
      regexp_replace(
        concat_ws(',',
          sort_array(
            collect_list(
              -- tag each category with its zero-padded rank, e.g. '00001:d'
              concat_ws(':', lpad(cast(duration_rank as string), 5, '0'), cast(category as string))
            )
          )
        ),
        -- strip the 'rank:' tags; [0-9] sidesteps backslash-escaping issues with \d in string literals
        '[0-9]+:', '') as categories
    from
      (select
         category,
         row_number() over (order by cast(duration as int) desc) as duration_rank
       from
         (select 'a' as category, 19 as duration
          union all
          select 'b' as category, 15 as duration
          union all
          select 'c' as category, 12 as duration
          union all
          select 'd' as category, 53 as duration
          union all
          select 'e' as category, 27 as duration
          union all
          select 'f' as category, 9 as duration) t
      ) ranked;
    

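    duration_rank must be left-padded with enough zeros to align the digits, because what gets sorted are strings, not numbers. Without padding, lexicographic order yields 1, 10, 11, 12, 13, 2, 3, 4..., which is wrong again. A width of 5 covers ranks up to 99999; lpad truncates anything longer, so widen it for larger result sets. Once the sorted values are joined, regexp_replace removes each colon together with the digits in front of it, and the job is done.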
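
    The padding requirement can be checked outside Hive. The following Python sketch emulates the tag, sort, join, and strip steps on twelve ranked values; `join_by_rank` and `pairs` are hypothetical names, and `str.rjust` stands in for lpad (unlike lpad, it never truncates).

```python
import re


def join_by_rank(pairs, width=5):
    """Tag each category with its zero-padded rank, sort the tagged
    strings, join them with ',', then strip the 'rank:' tags."""
    tagged = [f"{str(rank).rjust(width, '0')}:{category}" for rank, category in pairs]
    joined = ",".join(sorted(tagged))
    return re.sub(r"[0-9]+:", "", joined)


# Ranks 1..12 paired with the letters a..l, fed in reverse order to mimic
# rows arriving in an arbitrary order.
pairs = [(i + 1, chr(ord("a") + i)) for i in range(12)][::-1]

print(join_by_rank(pairs, width=5))  # prints a,b,c,d,e,f,g,h,i,j,k,l
print(join_by_rank(pairs, width=1))  # no padding: prints j,k,l,a,b,c,d,e,f,g,h,i
```

    With width=1 the ranks are not padded, so the tags for ranks 10 to 12 sort ahead of the tag for rank 1 and the joined result is no longer in rank order.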

  • Original article: https://www.cnblogs.com/TTyb/p/12917214.html