  • Hive case study (grouping sets, lateral view explode, concat_ws)

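    Suppose we have search-result data like this:

    Fields: tenant, platform, logged-in user, search keyword, and the list of products returned by the search.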

    {"tenantcode":"0000001", "platform":"IOS","loginName":"13111111111", "keywords":"手机","goodsList":[{"skuCode":"sku00001","skuName":"skuname1","spuCode":"spuCode1","spuName":"spuName1"},{"skuCode":"sku00002","skuName":"skuname2","spuCode":"spuCode2","spuName":"spuName2"}]}
    {"tenantcode":"0000001", "platform":"IOS","loginName":"13111111111", "keywords":"外国手机","goodsList":[]}
    {"tenantcode":"0000001", "platform":"IOS","loginName":"13111111112", "keywords":"手机壳","goodsList":[{"skuCode":"sku00001","skuName":"skuname1","spuCode":"spuCode1","spuName":"spuName1"},{"skuCode":"sku00003","skuName":"skuname2","spuCode":"spuCode2","spuName":"spuName2"}]}
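    The post does not show the table definition. The queries that follow read gl['skucode'] from each element of goodsList and filter on dt, which suggests an array of string maps and a dt partition. A minimal sketch of such a table follows; the column types, the SerDe and the lowercase map keys are all assumptions, not the author's actual DDL.

```sql
-- Hypothetical DDL: column types and SerDe are assumptions inferred
-- from how the queries in this post access the data.
create table if not exists dw_mdl.m_search_result2 (
    tenantcode string,
    platform   string,
    loginname  string,
    keywords   string,
    goodslist  array<map<string,string>>  -- one map per product returned
)
partitioned by (dt string)                -- dt is used in the where clauses
row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
stored as textfile;
```

    Map key lookup in Hive is case-sensitive, so gl['skucode'] only finds a value if the keys are stored in lowercase. If the keys keep the JSON casing, the lookup would have to be gl['skuCode'].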

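    We need to find, for each product, which keywords led to it appearing in search results.

    The key step is mapping each SKU to the keywords that hit it.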

    Step 1:

    Turn the goodsList column into multiple rows, one per product. lateral view explode does the work here.

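        -- one output row per element of goodsList
        select tenantcode,
            nvl(platform,0) as platform,   -- a NULL platform becomes 0
            keywords,
            'day' as dim_code,             -- time dimension: daily
            '20181221' as dim_value,       -- the day being reported
            gl['skucode'] as skucode,
            gl['skuname'] as skuname,
            gl['spucode'] as spucode,
            gl['spuname'] as spuname
        from dw_mdl.m_search_result2
        lateral view explode(goodsList) gl as gl   -- gl holds one product map per row
        where dt = '20181221';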
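    To see what explode does in isolation, here is a small sketch on inline toy data, followed by a variant of the query above. The toy row and the alias g are illustrative only. Note that plain lateral view explode emits nothing for an empty array, so a search such as "外国手机" ("foreign phone") with goodsList = [] disappears from the result. lateral view outer keeps it.

```sql
-- Toy input: one search row whose goodslist holds two products.
select t.keywords,
       g['skucode'] as skucode
from (
    select '手机' as keywords,
           array(map('skucode', 'sku00001'),
                 map('skucode', 'sku00002')) as goodslist
) t
lateral view explode(t.goodslist) gl as g;
-- two rows come back, one per array element

-- Same shape on the real table, but OUTER keeps searches that
-- returned no products: their skucode comes back as NULL.
select keywords,
       g['skucode'] as skucode
from dw_mdl.m_search_result2
lateral view outer explode(goodsList) gl as g
where dt = '20181221';
```

    Whether to use outer depends on the report: for "which keywords hit each SKU" the empty searches carry no SKU and can be dropped, as the original query does.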

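    Each element of goodsList now has its own row, carrying the tenant, platform and keyword of the search it came from.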

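    Step 2:

    Group by product and collect the keywords into one column, taking the platform and time dimensions into account.

    grouping sets: aggregates over several groupings in one query; here, once per platform and once across all platforms. A column left out of a grouping set comes back as NULL, which nvl(platform,'all') relabels. This is safe only because the exploded rows already replace a NULL platform with 0.

    collect_set: merges multiple rows into an array and removes duplicates

    collect_list: merges multiple rows into an array and keeps duplicates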
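    The difference between the two aggregates is easiest to see side by side. A small sketch on inline toy rows (the data is made up for illustration):

```sql
-- Three toy rows for one SKU; the keyword '手机' appears twice.
select skucode,
       collect_set(keywords)  as distinct_keywords,
       collect_list(keywords) as all_keywords
from (
    select 'sku00001' as skucode, '手机' as keywords
    union all
    select 'sku00001' as skucode, '手机' as keywords
    union all
    select 'sku00001' as skucode, '手机壳' as keywords
) t
group by skucode;
-- distinct_keywords has 2 elements, all_keywords has 3
-- (the order of elements inside either array is not guaranteed)
```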

    with tmp_a as (
        -- Step 1 result: one row per (search, product)
        select tenantcode,
            nvl(platform,0) as platform,
            keywords,
            'day' as dim_code,
            '20181221' as dim_value,
            gl['skucode'] as skucode,
            gl['skuname'] as skuname,
            gl['spucode'] as spucode,
            gl['spuname'] as spuname
        from dw_mdl.m_search_result2
        lateral view explode(goodsList) gl as gl
        where dt = '20181221'
    )

    select tenantcode,
        nvl(platform,'all') as platform,    -- NULL here means platform is not in this grouping set
        skucode,
        dim_code,
        dim_value,
        count(skuname) as search_times,     -- how many result rows contained this SKU
        collect_set(keywords) as keywords   -- distinct keywords that hit this SKU
    from tmp_a
    group by tenantcode,platform,skucode,dim_code,dim_value
    -- first set: per platform; second set: across all platforms
    grouping sets((tenantcode,platform,skucode,dim_code,dim_value),(tenantcode,skucode,dim_code,dim_value));
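    One way to read the grouping sets clause is as shorthand for a union all of two ordinary group by queries. The sketch below spells that out on inline toy data; the CTE t and the 'Android' row are made up for illustration, and the time-dimension columns are left out for brevity.

```sql
with t as (
    select '0000001' as tenantcode, 'IOS' as platform, 'sku00001' as skucode
    union all
    select '0000001' as tenantcode, 'Android' as platform, 'sku00001' as skucode
)
-- first grouping set: one row per platform
select tenantcode, platform, skucode, count(*) as search_times
from t
group by tenantcode, platform, skucode
union all
-- second grouping set: platform is not grouped, so it is labeled by hand
select tenantcode, 'all' as platform, skucode, count(*) as search_times
from t
group by tenantcode, skucode;
```

    This returns one row for each platform plus one 'all' row covering both. The grouping sets form computes the same rows in a single query over tmp_a, which is why it is usually preferred.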

    Step 3:

    Array to string: concat_ws('separator', array)

    with tmp_a as (
        -- Step 1 result: one row per (search, product)
        select tenantcode,
            nvl(platform,0) as platform,
            keywords,
            'day' as dim_code,
            '20181221' as dim_value,
            gl['skucode'] as skucode,
            gl['skuname'] as skuname,
            gl['spucode'] as spucode,
            gl['spuname'] as spuname
        from dw_mdl.m_search_result2
        lateral view explode(goodsList) gl as gl
        where dt = '20181221'
    ),
    tmp_b as (
        -- Step 2 result, with the keyword array joined into one string
        select tenantcode,
            nvl(platform,'all') as platform,
            skucode,
            dim_code,
            dim_value,
            count(skuname) as search_times,
            concat_ws(',',collect_set(keywords)) as keywords   -- array<string> to comma-separated string
        from tmp_a
        group by tenantcode,platform,skucode,dim_code,dim_value
        grouping sets((tenantcode,platform,skucode,dim_code,dim_value),(tenantcode,skucode,dim_code,dim_value))
    )
    select * from tmp_b;
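    As a quick check of concat_ws on its own, here is a sketch on literal values, followed by an optional refinement that is not in the original query: wrapping the array in sort_array so the joined keywords come out in a stable order. The toy rows are made up for illustration.

```sql
-- Joining a literal array of strings with a comma.
select concat_ws(',', array('手机', '手机壳'));
-- 手机,手机壳

-- collect_set gives no ordering guarantee; sort_array fixes the order
-- before the keywords are joined.
select skucode,
       concat_ws(',', sort_array(collect_set(keywords))) as keywords
from (
    select 'sku00001' as skucode, '手机壳' as keywords
    union all
    select 'sku00001' as skucode, '手机' as keywords
) t
group by skucode;
```

    concat_ws accepts strings and arrays of strings, which is why it can take the output of collect_set(keywords) directly when keywords is a string column.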

    Simple enough, isn't it?

  • Original post: https://www.cnblogs.com/30go/p/10169319.html