zoukankan      html  css  js  c++  java
  • hive之案例分析(grouping sets,lateral view explode, concat_ws)

    有这样一组搜索结果数据:

    租户,平台, 登录用户, 搜索关键词, 搜索的商品结果List

    {"tenantcode":"0000001", "platform":"IOS","loginName":"13111111111", "keywords":"手机","goodsList":[{"skuCode":"sku00001","skuName":"skuname1","spuCode":"spuCode1","spuName":"spuName1"},{"skuCode":"sku00002","skuName":"skuname2","spuCode":"spuCode2","spuName":"spuName2"}]}
    {"tenantcode":"0000001", "platform":"IOS","loginName":"13111111111", "keywords":"外国手机","goodsList":[]}
    {"tenantcode":"0000001", "platform":"IOS","loginName":"13111111112", "keywords":"手机壳","goodsList":[{"skuCode":"sku00001","skuName":"skuname1","spuCode":"spuCode1","spuName":"spuName1"},{"skuCode":"sku00003","skuName":"skuname2","spuCode":"spuCode2","spuName":"spuName2"}]}

    现在需要统计每个商品被哪些关键词搜索到,最终结果如下:

     这里最关键的是sku对应到命中的关键词:

    操作步骤1: 

    将给出的数据goodslist一列转为多行结构如下,重点用到了lateral view explode来解析。

        select tenantcode,
            nvl(platform,0) as platform,
            keywords,
            'day' as dim_code,
            '20181221' as dim_value,
            gl['skucode'] as skucode,
            gl['skuname'] as skuname,
            gl['spucode'] as spucode,
            gl['spuname'] as spuname 
        from dw_mdl.m_search_result2
        lateral view explode(goodsList) gl as gl
        where dt = '20181221';

    显示如下:

    操作步骤2:

    根据商品,汇总关键词列,这里考虑到平台,时间维度等。

    grouping sets 分组汇总数据

    collect_set 多行合并并且去重

    collect_list 多行合并不去重

    with tmp_a as (
        select tenantcode,
            nvl(platform,0) as platform,
            keywords,
            'day' as dim_code,
            '20181221' as dim_value,
            gl['skucode'] as skucode,
            gl['skuname'] as skuname,
            gl['spucode'] as spucode,
            gl['spuname'] as spuname 
        from dw_mdl.m_search_result2
        lateral view explode(goodsList) gl as gl
        where dt = '20181221'
    )
    
    select tenantcode, 
        nvl(platform,'all') as platform,
        skucode,
        dim_code,
        dim_value,
        count(skuname) as search_times, 
        collect_set(keywords) as keywords
    from tmp_a 
    group by tenantcode,platform,skucode,dim_code,dim_value
    grouping sets((tenantcode,platform,skucode,dim_code,dim_value),(tenantcode,skucode,dim_code,dim_value))

    操作步骤3:

    数组转字符串: concat_ws('分隔符',数组)

    with tmp_a as (
        select tenantcode,
            nvl(platform,0) as platform,
            keywords,
            'day' as dim_code,
            '20181221' as dim_value,
            gl['skucode'] as skucode,
            gl['skuname'] as skuname,
            gl['spucode'] as spucode,
            gl['spuname'] as spuname 
        from dw_mdl.m_search_result2
        lateral view explode(goodsList) gl as gl
        where dt = '20181221'
    ),
    tmp_b as (
        select tenantcode, 
            nvl(platform,'all') as platform,
            skucode,
            dim_code,
            dim_value,
            count(skuname) as search_times, 
            concat_ws(',',collect_set(keywords)) as keywords
        from tmp_a 
        group by tenantcode,platform,skucode,dim_code,dim_value
        grouping sets((tenantcode,platform,skucode,dim_code,dim_value),(tenantcode,skucode,dim_code,dim_value))
    )
    select * from tmp_b;

    是不是太简单了。

  • 相关阅读:
    使用vim + cscope/ctags
    python类库32[序列化和反序列化之pickle]
    Perforce2012新特征=20个用户免费+云
    Linux进程的uid和euid
    perl安装模块到自己的home ( install perl module without root)
    Python分布式+云计算
    Linux命令xargs+cut
    python实例32[简单的HttpServer]
    Python转载[编码规范]
    Linux命令lsof
  • 原文地址:https://www.cnblogs.com/30go/p/10169319.html
Copyright © 2011-2022 走看看