zoukankan      html  css  js  c++  java
  • hive之案例分析(grouping sets,lateral view explode, concat_ws)

    有这样一组搜索结果数据:

    租户,平台, 登录用户, 搜索关键词, 搜索的商品结果List

    {"tenantcode":"0000001", "platform":"IOS","loginName":"13111111111", "keywords":"手机","goodsList":[{"skuCode":"sku00001","skuName":"skuname1","spuCode":"spuCode1","spuName":"spuName1"},{"skuCode":"sku00002","skuName":"skuname2","spuCode":"spuCode2","spuName":"spuName2"}]}
    {"tenantcode":"0000001", "platform":"IOS","loginName":"13111111111", "keywords":"外国手机","goodsList":[]}
    {"tenantcode":"0000001", "platform":"IOS","loginName":"13111111112", "keywords":"手机壳","goodsList":[{"skuCode":"sku00001","skuName":"skuname1","spuCode":"spuCode1","spuName":"spuName1"},{"skuCode":"sku00003","skuName":"skuname2","spuCode":"spuCode2","spuName":"spuName2"}]}

    现在需要统计每个商品被哪些关键词搜索到,最终结果如下:

     这里最关键的是sku对应到命中的关键词:

    操作步骤1: 

    将给出的数据goodslist一列转为多行结构如下,重点用到了lateral view explode来解析。

        select tenantcode,
            nvl(platform,0) as platform,
            keywords,
            'day' as dim_code,
            '20181221' as dim_value,
            gl['skucode'] as skucode,
            gl['skuname'] as skuname,
            gl['spucode'] as spucode,
            gl['spuname'] as spuname 
        from dw_mdl.m_search_result2
        lateral view explode(goodsList) gl as gl
        where dt = '20181221';

    显示如下:

    操作步骤2:

    根据商品,汇总关键词列,这里考虑到平台,时间维度等。

    grouping sets 分组汇总数据

    collect_set 多行合并并且去重

    collect_list 多行合并不去重

    with tmp_a as (
        select tenantcode,
            nvl(platform,0) as platform,
            keywords,
            'day' as dim_code,
            '20181221' as dim_value,
            gl['skucode'] as skucode,
            gl['skuname'] as skuname,
            gl['spucode'] as spucode,
            gl['spuname'] as spuname 
        from dw_mdl.m_search_result2
        lateral view explode(goodsList) gl as gl
        where dt = '20181221'
    )
    
    select tenantcode, 
        nvl(platform,'all') as platform,
        skucode,
        dim_code,
        dim_value,
        count(skuname) as search_times, 
        collect_set(keywords) as keywords
    from tmp_a 
    group by tenantcode,platform,skucode,dim_code,dim_value
    grouping sets((tenantcode,platform,skucode,dim_code,dim_value),(tenantcode,skucode,dim_code,dim_value))

    操作步骤3:

    数组转字符串: concat_ws('分隔符',数组)

    with tmp_a as (
        select tenantcode,
            nvl(platform,0) as platform,
            keywords,
            'day' as dim_code,
            '20181221' as dim_value,
            gl['skucode'] as skucode,
            gl['skuname'] as skuname,
            gl['spucode'] as spucode,
            gl['spuname'] as spuname 
        from dw_mdl.m_search_result2
        lateral view explode(goodsList) gl as gl
        where dt = '20181221'
    ),
    tmp_b as (
        select tenantcode, 
            nvl(platform,'all') as platform,
            skucode,
            dim_code,
            dim_value,
            count(skuname) as search_times, 
            concat_ws(',',collect_set(keywords)) as keywords
        from tmp_a 
        group by tenantcode,platform,skucode,dim_code,dim_value
        grouping sets((tenantcode,platform,skucode,dim_code,dim_value),(tenantcode,skucode,dim_code,dim_value))
    )
    select * from tmp_b;

    是不是太简单了。

  • 相关阅读:
    MyBatisPartA
    概念:漏洞扫描技术
    概念:防火墙技术
    概念:认证技术与访问控制
    概念:为什么要引进密钥管理技术
    概念:数字签名、传统签名和认证
    概念:简述对称密码算法和公钥密码算法的区别
    概念:单向散列函数部分知识点
    Redis单线程QPS效率高的原因,以及redis6.0升级后的变化
    Mydql数据库缓存池Buffer Pool 冷热数据分离
  • 原文地址:https://www.cnblogs.com/30go/p/10169319.html
Copyright © 2011-2022 走看看