  • Hive: find shops with sales records on three consecutive days

    Raw data

    A,2017-10-11,300
    A,2017-10-12,200
    A,2017-10-13,100
    A,2017-10-15,100
    A,2017-10-16,300
    A,2017-10-17,150
    A,2017-10-18,340
    A,2017-10-19,360
    B,2017-10-11,400
    B,2017-10-12,200
    B,2017-10-15,600
    C,2017-10-11,350
    C,2017-10-13,250
    C,2017-10-14,300
    C,2017-10-15,400
    C,2017-10-16,200
    D,2017-10-13,500
    E,2017-10-14,600
    E,2017-10-15,500
    D,2017-10-14,600
    

    To find shops with sales records on three consecutive days: within each shop, number the records in date order, then subtract that row number from the date. Rows whose results land on the same date form one consecutive run.

    A,2017-10-11,300,1,2017-10-10
    A,2017-10-12,200,2,2017-10-10
    A,2017-10-13,100,3,2017-10-10
    A,2017-10-15,100,4,2017-10-11
    A,2017-10-16,300,5,2017-10-11
    A,2017-10-17,150,6,2017-10-11
    A,2017-10-18,340,7,2017-10-11
    A,2017-10-19,360,8,2017-10-11
    B,2017-10-11,400,1,2017-10-10
    B,2017-10-12,200,2,2017-10-10
    B,2017-10-15,600,3,2017-10-12
    C,2017-10-11,350,1,2017-10-10
    C,2017-10-13,250,2,2017-10-11
    C,2017-10-14,300,3,2017-10-11
    C,2017-10-15,400,4,2017-10-11
    C,2017-10-16,200,5,2017-10-11
    D,2017-10-13,500,1,2017-10-12
    E,2017-10-14,600,1,2017-10-13
    E,2017-10-15,500,2,2017-10-13
    D,2017-10-14,600,2,2017-10-12
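    The trick can be sketched in plain Python (a cross-check of the idea, not part of the Hive solution): subtracting each record's 1-based rank from its date collapses every run of consecutive days onto a single anchor date.

    ```python
    from datetime import date, timedelta

    # Shop A's dates from the raw data, already sorted.
    dates = [date(2017, 10, 11), date(2017, 10, 12), date(2017, 10, 13),
             date(2017, 10, 15), date(2017, 10, 16), date(2017, 10, 17),
             date(2017, 10, 18), date(2017, 10, 19)]

    # date - row_number: consecutive dates map to the same anchor.
    anchors = [d - timedelta(days=rn) for rn, d in enumerate(dates, start=1)]
    print(anchors[:3])  # three copies of 2017-10-10: the 3-day run
    print(anchors[3:])  # five copies of 2017-10-11: the 5-day run
    ```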
    
    

    step1: create the table and load the data

    hive (default)> drop table if exists t_jd;
    OK
    Time taken: 8.41 seconds
    hive (default)> create table t_jd(shopid string,dt string,sale int)
                  > row format delimited fields terminated by ',';
    OK
    Time taken: 0.119 seconds
    hive (default)> load data local inpath '/opt/module/data/sales.txt' into table t_jd;
    Loading data to table default.t_jd
    Table default.t_jd stats: [numFiles=1, totalSize=363]
    OK
    Time taken: 1.811 seconds
    

    Job logs are omitted from the Hive sessions below.

    step2: partition the records by shop id, sort by date within each partition, and number them with row_number()

    hive (default)> select shopid,dt,sale,
                  > row_number() over(partition by shopid order by dt) as rn
                  > from t_jd;
    shopid  dt      sale    rn
    A       2017-10-11      300     1
    A       2017-10-12      200     2
    A       2017-10-13      100     3
    A       2017-10-15      100     4
    A       2017-10-16      300     5
    A       2017-10-17      150     6
    A       2017-10-18      340     7
    A       2017-10-19      360     8
    B       2017-10-11      400     1
    B       2017-10-12      200     2
    B       2017-10-15      600     3
    C       2017-10-11      350     1
    C       2017-10-13      250     2
    C       2017-10-14      300     3
    C       2017-10-15      400     4
    C       2017-10-16      200     5
    D       2017-10-13      500     1
    D       2017-10-14      600     2
    E       2017-10-14      600     1
    E       2017-10-15      500     2
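    As a sketch of what row_number() over (partition by shopid order by dt) computes (illustrative only, not how Hive executes it): sort by shop and date, then restart a counter for each shop.

    ```python
    from itertools import groupby
    from operator import itemgetter

    # A small subset of the raw rows, for illustration.
    rows = [("A", "2017-10-11", 300), ("A", "2017-10-12", 200),
            ("B", "2017-10-11", 400), ("B", "2017-10-15", 600)]

    numbered = []
    # Sort by (shopid, dt), then enumerate within each shop's group.
    for shop, grp in groupby(sorted(rows, key=itemgetter(0, 1)), key=itemgetter(0)):
        for rn, (shopid, dt, sale) in enumerate(grp, start=1):
            numbered.append((shopid, dt, sale, rn))
    print(numbered)  # rn restarts at 1 for shop B
    ```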
    
    

    step3: subtract the row number from the date to derive a run flag

    hive (default)> select shopid,dt,sale,rn,
                  > date_sub(to_date(dt),rn)
                  > from
                  > (select shopid,dt,sale,
                  > row_number() over(partition by shopid order by dt) as rn
                  > from t_jd) tmp;
    
    shopid  dt      sale    rn      _c4
    A       2017-10-11      300     1       2017-10-10
    A       2017-10-12      200     2       2017-10-10
    A       2017-10-13      100     3       2017-10-10
    A       2017-10-15      100     4       2017-10-11
    A       2017-10-16      300     5       2017-10-11
    A       2017-10-17      150     6       2017-10-11
    A       2017-10-18      340     7       2017-10-11
    A       2017-10-19      360     8       2017-10-11
    B       2017-10-11      400     1       2017-10-10
    B       2017-10-12      200     2       2017-10-10
    B       2017-10-15      600     3       2017-10-12
    C       2017-10-11      350     1       2017-10-10
    C       2017-10-13      250     2       2017-10-11
    C       2017-10-14      300     3       2017-10-11
    C       2017-10-15      400     4       2017-10-11
    C       2017-10-16      200     5       2017-10-11
    D       2017-10-13      500     1       2017-10-12
    D       2017-10-14      600     2       2017-10-12
    E       2017-10-14      600     1       2017-10-13
    E       2017-10-15      500     2       2017-10-13
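    Hive's date_sub(to_date(dt), rn) is plain date arithmetic; a minimal Python equivalent (a sketch, using B's rows from the output above):

    ```python
    from datetime import date, timedelta

    def date_sub(d: str, n: int) -> str:
        """Hive-style date_sub: subtract n days from a yyyy-MM-dd string."""
        return str(date.fromisoformat(d) - timedelta(days=n))

    # B's first two records share a flag; the third (rn=3) does not.
    print(date_sub("2017-10-11", 1))  # 2017-10-10
    print(date_sub("2017-10-12", 2))  # 2017-10-10
    print(date_sub("2017-10-15", 3))  # 2017-10-12
    ```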
    

    step4: group by shop id and flag, then count

    hive (default)> select shopid,count(1) as cnt
                  > from
                  > (select shopid,dt,sale,rn,
                  > date_sub(to_date(dt),rn) as flag
                  > from
                  > (select shopid,dt,sale,
                  > row_number() over(partition by shopid order by dt) as rn
                  > from t_jd) tmp) tmp2
                  > group by shopid,flag;
    
    shopid  cnt
    A       3
    A       5
    B       2
    B       1
    C       1
    C       4
    D       2
    E       2
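    Grouping by (shopid, flag) and counting amounts to a frequency count over those pairs; a sketch with a Counter over the flags from step 3:

    ```python
    from collections import Counter

    # (shopid, flag) pairs taken from the step 3 output.
    flags = ([("A", "2017-10-10")] * 3 + [("A", "2017-10-11")] * 5 +
             [("B", "2017-10-10")] * 2 + [("B", "2017-10-12")] +
             [("C", "2017-10-10")] + [("C", "2017-10-11")] * 4 +
             [("D", "2017-10-12")] * 2 + [("E", "2017-10-13")] * 2)

    cnt = Counter(flags)
    print(cnt[("A", "2017-10-11")])  # 5: A's longest run
    print(cnt[("C", "2017-10-11")])  # 4
    ```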
    

    step5: keep runs of 3 or more consecutive days

    hive (default)> select shopid from
                  > (select shopid,count(1) as cnt
                  > from
                  > (select shopid,dt,sale,rn,
                  > date_sub(to_date(dt),rn) as flag
                  > from
                  > (select shopid,dt,sale,
                  > row_number() over(partition by shopid order by dt) as rn
                  > from t_jd) tmp) tmp2
                  > group by shopid,flag) tmp3
                  > where tmp3.cnt>=3;
    
    shopid
    A
    A
    C
    

    step6: deduplicate the shop ids

    hive (default)> select distinct shopid from
                  > (select shopid,count(1) as cnt
                  > from
                  > (select shopid,dt,sale,rn,
                  > date_sub(to_date(dt),rn) as flag
                  > from
                  > (select shopid,dt,sale,
                  > row_number() over(partition by shopid order by dt) as rn
                  > from t_jd) tmp) tmp2
                  > group by shopid,flag) tmp3
                  > where tmp3.cnt>=3;
    
    shopid
    A
    C
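    The whole pipeline (steps 2 through 6) can be cross-checked with a short Python sketch over the raw data; it reproduces the final answer A and C:

    ```python
    from collections import Counter
    from datetime import date, timedelta
    from itertools import groupby
    from operator import itemgetter

    raw = """A,2017-10-11,300
    A,2017-10-12,200
    A,2017-10-13,100
    A,2017-10-15,100
    A,2017-10-16,300
    A,2017-10-17,150
    A,2017-10-18,340
    A,2017-10-19,360
    B,2017-10-11,400
    B,2017-10-12,200
    B,2017-10-15,600
    C,2017-10-11,350
    C,2017-10-13,250
    C,2017-10-14,300
    C,2017-10-15,400
    C,2017-10-16,200
    D,2017-10-13,500
    E,2017-10-14,600
    E,2017-10-15,500
    D,2017-10-14,600"""

    rows = sorted(line.strip().split(",") for line in raw.splitlines())
    runs = Counter()
    # Per shop: number rows by date, flag = date - row_number, count per flag.
    for shop, grp in groupby(rows, key=itemgetter(0)):
        for rn, (_, dt, _sale) in enumerate(grp, start=1):
            flag = date.fromisoformat(dt) - timedelta(days=rn)
            runs[(shop, flag)] += 1

    # Shops with any run of length >= 3, deduplicated.
    answer = sorted({shop for (shop, _), c in runs.items() if c >= 3})
    print(answer)  # ['A', 'C']
    ```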
    
  • Original post: https://www.cnblogs.com/eugene0/p/13302958.html