zoukankan      html  css  js  c++  java
  • hive之案例分析(grouping sets,lateral view explode, concat_ws)

    有这样一组搜索结果数据:

    租户,平台, 登录用户, 搜索关键词, 搜索的商品结果List

    {"tenantcode":"0000001", "platform":"IOS","loginName":"13111111111", "keywords":"手机","goodsList":[{"skuCode":"sku00001","skuName":"skuname1","spuCode":"spuCode1","spuName":"spuName1"},{"skuCode":"sku00002","skuName":"skuname2","spuCode":"spuCode2","spuName":"spuName2"}]}
    {"tenantcode":"0000001", "platform":"IOS","loginName":"13111111111", "keywords":"外国手机","goodsList":[]}
    {"tenantcode":"0000001", "platform":"IOS","loginName":"13111111112", "keywords":"手机壳","goodsList":[{"skuCode":"sku00001","skuName":"skuname1","spuCode":"spuCode1","spuName":"spuName1"},{"skuCode":"sku00003","skuName":"skuname2","spuCode":"spuCode2","spuName":"spuName2"}]}

    现在需要统计每个商品被哪些关键词搜索到,最终结果如下:

     这里最关键的是sku对应到命中的关键词:

    操作步骤1: 

    将给出的数据goodslist一列转为多行结构如下,重点用到了lateral view explode来解析。

        select tenantcode,
            nvl(platform,0) as platform,
            keywords,
            'day' as dim_code,
            '20181221' as dim_value,
            gl['skucode'] as skucode,
            gl['skuname'] as skuname,
            gl['spucode'] as spucode,
            gl['spuname'] as spuname 
        from dw_mdl.m_search_result2
        lateral view explode(goodsList) gl as gl
        where dt = '20181221';

    显示如下:

    操作步骤2:

    根据商品,汇总关键词列,这里考虑到平台,时间维度等。

    grouping sets 分组汇总数据

    collect_set 多行合并并且去重

    collect_list 多行合并不去重

    with tmp_a as (
        select tenantcode,
            nvl(platform,0) as platform,
            keywords,
            'day' as dim_code,
            '20181221' as dim_value,
            gl['skucode'] as skucode,
            gl['skuname'] as skuname,
            gl['spucode'] as spucode,
            gl['spuname'] as spuname 
        from dw_mdl.m_search_result2
        lateral view explode(goodsList) gl as gl
        where dt = '20181221'
    )
    
    select tenantcode, 
        nvl(platform,'all') as platform,
        skucode,
        dim_code,
        dim_value,
        count(skuname) as search_times, 
        collect_set(keywords) as keywords
    from tmp_a 
    group by tenantcode,platform,skucode,dim_code,dim_value
    grouping sets((tenantcode,platform,skucode,dim_code,dim_value),(tenantcode,skucode,dim_code,dim_value))

    操作步骤3:

    数组转字符串: concat_ws('分隔符',数组)

    with tmp_a as (
        select tenantcode,
            nvl(platform,0) as platform,
            keywords,
            'day' as dim_code,
            '20181221' as dim_value,
            gl['skucode'] as skucode,
            gl['skuname'] as skuname,
            gl['spucode'] as spucode,
            gl['spuname'] as spuname 
        from dw_mdl.m_search_result2
        lateral view explode(goodsList) gl as gl
        where dt = '20181221'
    ),
    tmp_b as (
        select tenantcode, 
            nvl(platform,'all') as platform,
            skucode,
            dim_code,
            dim_value,
            count(skuname) as search_times, 
            concat_ws(',',collect_set(keywords)) as keywords
        from tmp_a 
        group by tenantcode,platform,skucode,dim_code,dim_value
        grouping sets((tenantcode,platform,skucode,dim_code,dim_value),(tenantcode,skucode,dim_code,dim_value))
    )
    select * from tmp_b;

    是不是太简单了。

  • 相关阅读:
    企业网络架构
    谷歌推出情境感知API
    Firebase远程更新应用
    黑盒测试
    单元测试
    代码性能分析
    代码静态检查
    PHP 使用正则匹配 去掉 URL 链接第三个斜杠后面的部分
    JQ 全选 反选 取消全选的方法
    织梦导航栏有特定样式用法
  • 原文地址:https://www.cnblogs.com/30go/p/10169319.html
Copyright © 2011-2022 走看看