zoukankan      html  css  js  c++  java
  • Hive 求出连续三天有销售记录的店铺

    原始数据

    A,2017-10-11,300
    A,2017-10-12,200
    A,2017-10-13,100
    A,2017-10-15,100
    A,2017-10-16,300
    A,2017-10-17,150
    A,2017-10-18,340
    A,2017-10-19,360
    B,2017-10-11,400
    B,2017-10-12,200
    B,2017-10-15,600
    C,2017-10-11,350
    C,2017-10-13,250
    C,2017-10-14,300
    C,2017-10-15,400
    C,2017-10-16,200
    D,2017-10-13,500
    E,2017-10-14,600
    E,2017-10-15,500
    D,2017-10-14,600
    

    求出连续三天有销售记录的店铺,给每个用户一个编号,用日期减去编号,如果是同一天,那么就是连续的 。

    A,2017-10-11,300,1,2017-10-10
    A,2017-10-12,200,2,2017-10-10
    A,2017-10-13,100,3,2017-10-10
    A,2017-10-15,100,4,2017-10-11
    A,2017-10-16,300,5,2017-10-11
    A,2017-10-17,150,6,2017-10-11
    A,2017-10-18,340,7,2017-10-11
    A,2017-10-19,360,8,2017-10-11
    B,2017-10-11,400
    B,2017-10-12,200
    B,2017-10-15,600
    C,2017-10-11,350
    C,2017-10-13,250
    C,2017-10-14,300
    C,2017-10-15,400
    C,2017-10-16,200
    D,2017-10-13,500
    E,2017-10-14,600
    E,2017-10-15,500
    D,2017-10-14,600
    
    

    step1 建表 并加载数据

    hive (default)> drop table if exists t_jd;
    OK
    Time taken: 8.41 seconds
    hive (default)> create table t_jd(shopid string,dt string,sale int)
                  > row format delimited fields terminated by ',';
    OK
    Time taken: 0.119 seconds
    hive (default)> load data local inpath '/opt/module/data/sales.txt' into table t_jd;
    Loading data to table default.t_jd
    Table default.t_jd stats: [numFiles=1, totalSize=363]
    OK
    Time taken: 1.811 seconds
    

    下面在hive中执行的代码 省略了日志

    step2 每一条记录按照店铺id分区 按照日期全局排序 row_number()编号

    hive (default)> select shopid,dt,sale,
                  > row_number() over(partition by shopid order by dt) as rn
                  > from t_jd;
    shopid  dt      sale    rn
    A       2017-10-11      300     1
    A       2017-10-12      200     2
    A       2017-10-13      100     3
    A       2017-10-15      100     4
    A       2017-10-16      300     5
    A       2017-10-17      150     6
    A       2017-10-18      340     7
    A       2017-10-19      360     8
    B       2017-10-11      400     1
    B       2017-10-12      200     2
    B       2017-10-15      600     3
    C       2017-10-11      350     1
    C       2017-10-13      250     2
    C       2017-10-14      300     3
    C       2017-10-15      400     4
    C       2017-10-16      200     5
    D       2017-10-13      500     1
    D       2017-10-14      600     2
    E       2017-10-14      600     1
    E       2017-10-15      500     2
    
    

    step3 根据编号 生成连续日期

    hive (default)> select shopid,dt,sale,rn,
                  > date_sub(to_date(dt),rn)
                  > from
                  > (select shopid,dt,sale,
                  > row_number() over(partition by shopid order by dt) as rn
                  > from t_jd) tmp;
    
    shopid  dt      sale    rn      _c4
    A       2017-10-11      300     1       2017-10-10
    A       2017-10-12      200     2       2017-10-10
    A       2017-10-13      100     3       2017-10-10
    A       2017-10-15      100     4       2017-10-11
    A       2017-10-16      300     5       2017-10-11
    A       2017-10-17      150     6       2017-10-11
    A       2017-10-18      340     7       2017-10-11
    A       2017-10-19      360     8       2017-10-11
    B       2017-10-11      400     1       2017-10-10
    B       2017-10-12      200     2       2017-10-10
    B       2017-10-15      600     3       2017-10-12
    C       2017-10-11      350     1       2017-10-10
    C       2017-10-13      250     2       2017-10-11
    C       2017-10-14      300     3       2017-10-11
    C       2017-10-15      400     4       2017-10-11
    C       2017-10-16      200     5       2017-10-11
    D       2017-10-13      500     1       2017-10-12
    D       2017-10-14      600     2       2017-10-12
    E       2017-10-14      600     1       2017-10-13
    E       2017-10-15      500     2       2017-10-13
    

    step4 分组 求count

    hive (default)> select shopid,count(1) as cnt
                  > from
                  > (select shopid,dt,sale,rn,
                  > date_sub(to_date(dt),rn) as flag
                  > from
                  > (select shopid,dt,sale,
                  > row_number() over(partition by shopid order by dt) as rn
                  > from t_jd) tmp) tmp2
                  > group by shopid,flag;
    
    shopid  cnt
    A       3
    A       5
    B       2
    B       1
    C       1
    C       4
    D       2
    E       2
    

    step5 筛选出连续天数大于等于3的

    hive (default)> select shopid from
                  > (select shopid,count(1) as cnt
                  > from
                  > (select shopid,dt,sale,rn,
                  > date_sub(to_date(dt),rn) as flag
                  > from
                  > (select shopid,dt,sale,
                  > row_number() over(partition by shopid order by dt) as rn
                  > from t_jd) tmp) tmp2
                  > group by shopid,flag) tmp3
                  > where tmp3.cnt>=3;
    
    shopid
    A
    A
    C
    

    step6 去重

    hive (default)> select distinct shopid from
                  > (select shopid,count(1) as cnt
                  > from
                  > (select shopid,dt,sale,rn,
                  > date_sub(to_date(dt),rn) as flag
                  > from
                  > (select shopid,dt,sale,
                  > row_number() over(partition by shopid order by dt) as rn
                  > from t_jd) tmp) tmp2
                  > group by shopid,flag) tmp3
                  > where tmp3.cnt>=3;
    
    shopid
    A
    C
    
  • 相关阅读:
    安装SQLServer2000提示“无法验证产品密钥”的解决方法
    【转】SQL Server数据库开发的二十一条军规
    Asp.Net实现FORM认证的一些使用技巧
    记住window.open的用法
    VS2005的关于母版页嵌套的一个小技巧
    一种写法的区别
    问个关于VS使用上的问题
    Server.MapPath方法的应用方法
    一个关于重定向的问题研究,应该具有实用性
    象WORD一样,双击.doc的文件就自动打开WORD并编辑该文件(转)
  • 原文地址:https://www.cnblogs.com/eugene0/p/13302958.html
Copyright © 2011-2022 走看看