有这样一组搜索结果数据:
租户,平台, 登录用户, 搜索关键词, 搜索的商品结果List
{"tenantcode":"0000001", "platform":"IOS","loginName":"13111111111", "keywords":"手机","goodsList":[{"skuCode":"sku00001","skuName":"skuname1","spuCode":"spuCode1","spuName":"spuName1"},{"skuCode":"sku00002","skuName":"skuname2","spuCode":"spuCode2","spuName":"spuName2"}]} {"tenantcode":"0000001", "platform":"IOS","loginName":"13111111111", "keywords":"外国手机","goodsList":[]} {"tenantcode":"0000001", "platform":"IOS","loginName":"13111111112", "keywords":"手机壳","goodsList":[{"skuCode":"sku00001","skuName":"skuname1","spuCode":"spuCode1","spuName":"spuName1"},{"skuCode":"sku00003","skuName":"skuname2","spuCode":"spuCode2","spuName":"spuName2"}]}
现在需要统计每个商品被哪些关键词搜索到,最终结果如下:
这里最关键的是sku对应到命中的关键词:
操作步骤1:
将给出的数据goodslist一列转为多行结构如下,重点用到了lateral view explode来解析。
select tenantcode, nvl(platform,0) as platform, keywords, 'day' as dim_code, '20181221' as dim_value, gl['skucode'] as skucode, gl['skuname'] as skuname, gl['spucode'] as spucode, gl['spuname'] as spuname from dw_mdl.m_search_result2 lateral view explode(goodsList) gl as gl where dt = '20181221';
显示如下:
操作步骤2:
根据商品,汇总关键词列,这里考虑到平台,时间维度等。
grouping sets 分组汇总数据
collect_set 多行合并并且去重
collect_list 多行合并不去重
with tmp_a as ( select tenantcode, nvl(platform,0) as platform, keywords, 'day' as dim_code, '20181221' as dim_value, gl['skucode'] as skucode, gl['skuname'] as skuname, gl['spucode'] as spucode, gl['spuname'] as spuname from dw_mdl.m_search_result2 lateral view explode(goodsList) gl as gl where dt = '20181221' ) select tenantcode, nvl(platform,'all') as platform, skucode, dim_code, dim_value, count(skuname) as search_times, collect_set(keywords) as keywords from tmp_a group by tenantcode,platform,skucode,dim_code,dim_value grouping sets((tenantcode,platform,skucode,dim_code,dim_value),(tenantcode,skucode,dim_code,dim_value))
操作步骤3:
数组转字符串: concat_ws('分隔符',数组)
with tmp_a as ( select tenantcode, nvl(platform,0) as platform, keywords, 'day' as dim_code, '20181221' as dim_value, gl['skucode'] as skucode, gl['skuname'] as skuname, gl['spucode'] as spucode, gl['spuname'] as spuname from dw_mdl.m_search_result2 lateral view explode(goodsList) gl as gl where dt = '20181221' ), tmp_b as ( select tenantcode, nvl(platform,'all') as platform, skucode, dim_code, dim_value, count(skuname) as search_times, concat_ws(',',collect_set(keywords)) as keywords from tmp_a group by tenantcode,platform,skucode,dim_code,dim_value grouping sets((tenantcode,platform,skucode,dim_code,dim_value),(tenantcode,skucode,dim_code,dim_value)) ) select * from tmp_b;
是不是太简单了。