zoukankan      html  css  js  c++  java
  • Hive之分组取前几条记录

     取每个用户最喜欢购买的前三个product

    select *
    FROM
      (select user_id,
              product_id,
              row_number() over(partition BY user_id
                                ORDER BY top_cnt DESC) AS row_num
       FROM
         (select ord.user_id AS user_id,
                 pri.product_id AS product_id,
                 count(1) over(partition BY user_id,product_id) AS top_cnt
          FROM orders ord
          JOIN priors pri ON ord.order_id=pri.order_id) new_t) new_t2
    WHERE row_num<4 LIMIT 10;

    结果输出

    new_t2.user_id    new_t2.product_id    new_t2.top_cnt    new_t2.row_num
    100001    21137    41    1
    100001    13176    41    2
    100001    13176    41    3
    100005    21709    10    1
    100005    21709    10    2
    100005    21709    10    3
    100009    24852    8    1
    100009    24852    8    2
    100009    24852    8    3
    100012    41148    24    1

    把每个用户的前三条记录输出为一行, 列转行

    select user_id,collect_list(product_id) as collect_list
    FROM
      (select user_id,
              product_id,
              row_number() over(partition BY user_id
                                ORDER BY top_cnt DESC) AS row_num
       FROM
         (select ord.user_id AS user_id,
                 pri.product_id AS product_id,
                 count(1) over(partition BY user_id,product_id) AS top_cnt
          FROM orders ord
          JOIN priors pri ON ord.order_id=pri.order_id) new_t) new_t2
    WHERE row_num<4 
    group by user_id
    LIMIT 10;

    结果如下

    user_id    collect_list
    1         ["12427","12427","12427"]
    10        ["47526","47526","47526"]
    100       ["27344","27344","27344"]
    1000      ["49683","49683","49683"]
    10000     ["21137","21137","21137"]
    100000    ["10151","10151","10151"]
    100001    ["21137","13176","13176"]
    100002    ["26172","26172","26172"]
    100003    ["47598","17616","17616"]
    100004    ["19660","19660","19660"]

    参考:

    https://blog.csdn.net/doveyoung8/article/details/80022579

  • 相关阅读:
    HTTP和HTTPS的区别
    python计算机二级考试知识点——文件操作
    python二级考试知识点——turtle、random、time、PyInstaller、jieba、wordcloud
    淘宝搜索功能的测试
    百度搜索测试用例
    微信朋友圈测试用例
    微信聊天功能测试用例
    微信红包测试用例
    微信点赞功能测试用例
    SQL Server 远程共享文件夹备份
  • 原文地址:https://www.cnblogs.com/blogyuhan/p/9290473.html
Copyright © 2011-2022 走看看