zoukankan      html  css  js  c++  java
  • hive之案例分析(grouping sets,lateral view explode, concat_ws)

    有这样一组搜索结果数据:

    租户,平台, 登录用户, 搜索关键词, 搜索的商品结果List

    {"tenantcode":"0000001", "platform":"IOS","loginName":"13111111111", "keywords":"手机","goodsList":[{"skuCode":"sku00001","skuName":"skuname1","spuCode":"spuCode1","spuName":"spuName1"},{"skuCode":"sku00002","skuName":"skuname2","spuCode":"spuCode2","spuName":"spuName2"}]}
    {"tenantcode":"0000001", "platform":"IOS","loginName":"13111111111", "keywords":"外国手机","goodsList":[]}
    {"tenantcode":"0000001", "platform":"IOS","loginName":"13111111112", "keywords":"手机壳","goodsList":[{"skuCode":"sku00001","skuName":"skuname1","spuCode":"spuCode1","spuName":"spuName1"},{"skuCode":"sku00003","skuName":"skuname2","spuCode":"spuCode2","spuName":"spuName2"}]}

    现在需要统计每个商品被哪些关键词搜索到,最终结果如下:

     这里最关键的是sku对应到命中的关键词:

    操作步骤1: 

    将给出的数据goodslist一列转为多行结构如下,重点用到了lateral view explode来解析。

        select tenantcode,
            nvl(platform,0) as platform,
            keywords,
            'day' as dim_code,
            '20181221' as dim_value,
            gl['skucode'] as skucode,
            gl['skuname'] as skuname,
            gl['spucode'] as spucode,
            gl['spuname'] as spuname 
        from dw_mdl.m_search_result2
        lateral view explode(goodsList) gl as gl
        where dt = '20181221';

    显示如下:

    操作步骤2:

    根据商品,汇总关键词列,这里考虑到平台,时间维度等。

    grouping sets 分组汇总数据

    collect_set 多行合并并且去重

    collect_list 多行合并不去重

    with tmp_a as (
        select tenantcode,
            nvl(platform,0) as platform,
            keywords,
            'day' as dim_code,
            '20181221' as dim_value,
            gl['skucode'] as skucode,
            gl['skuname'] as skuname,
            gl['spucode'] as spucode,
            gl['spuname'] as spuname 
        from dw_mdl.m_search_result2
        lateral view explode(goodsList) gl as gl
        where dt = '20181221'
    )
    
    select tenantcode, 
        nvl(platform,'all') as platform,
        skucode,
        dim_code,
        dim_value,
        count(skuname) as search_times, 
        collect_set(keywords) as keywords
    from tmp_a 
    group by tenantcode,platform,skucode,dim_code,dim_value
    grouping sets((tenantcode,platform,skucode,dim_code,dim_value),(tenantcode,skucode,dim_code,dim_value))

    操作步骤3:

    数组转字符串: concat_ws('分隔符',数组)

    with tmp_a as (
        select tenantcode,
            nvl(platform,0) as platform,
            keywords,
            'day' as dim_code,
            '20181221' as dim_value,
            gl['skucode'] as skucode,
            gl['skuname'] as skuname,
            gl['spucode'] as spucode,
            gl['spuname'] as spuname 
        from dw_mdl.m_search_result2
        lateral view explode(goodsList) gl as gl
        where dt = '20181221'
    ),
    tmp_b as (
        select tenantcode, 
            nvl(platform,'all') as platform,
            skucode,
            dim_code,
            dim_value,
            count(skuname) as search_times, 
            concat_ws(',',collect_set(keywords)) as keywords
        from tmp_a 
        group by tenantcode,platform,skucode,dim_code,dim_value
        grouping sets((tenantcode,platform,skucode,dim_code,dim_value),(tenantcode,skucode,dim_code,dim_value))
    )
    select * from tmp_b;

    是不是太简单了。

  • 相关阅读:
    编写高质量代码 :Web前端开发修炼之道(读书笔记)
    Cordova开发学习笔记
    C++面试总结
    深信服社招Linux岗位一面记录
    一次软件开发不问技术的面试
    面试总结2
    剑指offer 学习笔记(一):二维数组中的查找
    关于思维习惯问题
    ajax的使用(使用json格式提交给服务端)
    es6-13 类与对象
  • 原文地址:https://www.cnblogs.com/30go/p/10169319.html
Copyright © 2011-2022 走看看