  • Hive: computing the time difference between consecutive records, grouped by user

    In MySQL, the data looks like this:

    # Query one user's draw times for the given day
    select draw_time from user_draw_log where user_id = 1 and draw_date='2016-03-09' order by id;
    +---------------------+
    | draw_time           |
    +---------------------+
    | 2016-03-09 13:52:46 |
    | 2016-03-09 13:52:53 |
    | 2016-03-09 13:53:01 |
    | 2016-03-09 13:53:13 |
    | 2016-03-09 13:53:25 |
    ...

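    I want to compute the interval between successive draws, in order to judge whether the rows were inserted concurrently. My approach is to use a user variable that holds the previous draw time: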

    select draw_time, timediff(draw_time,@prev_time) diff,(@prev_time:=draw_time) from user_draw_log where user_id = 1 and draw_date='2016-03-09' order by id;
    +---------------------+------------------+-------------------------+
    | draw_time           | diff             | (@prev_time:=draw_time) |
    +---------------------+------------------+-------------------------+
    | 2016-03-09 13:52:46 | -00:08:28.000000 | 2016-03-09 13:52:46     |
    | 2016-03-09 13:52:53 | 00:00:07.000000  | 2016-03-09 13:52:53     |
    | 2016-03-09 13:53:01 | 00:00:08.000000  | 2016-03-09 13:53:01     |
    | 2016-03-09 13:53:13 | 00:00:12.000000  | 2016-03-09 13:53:13     |
    | 2016-03-09 13:53:25 | 00:00:12.000000  | 2016-03-09 13:53:25     |
    | 2016-03-09 13:53:32 | 00:00:07.000000  | 2016-03-09 13:53:32     |
    | 2016-03-09 13:53:38 | 00:00:06.000000  | 2016-03-09 13:53:38     |
    ...

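    Is there a more convenient way to do this? And how would you compute the time difference between adjacent records for every user at once?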

    In Hive it can be done as follows:

    1. Use Hive's row_number() window function. row_num gives each row's position within its partition (here: per imei, ordered by ts).

    select imei,ts,fuel_instant,gps_longitude,gps_latitude,row_number() over (PARTITION BY imei ORDER BY ts ASC) as row_num from sample_data_2

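    2. Within each partition the row_num values are consecutive, so join the table to itself on adjacent row numbers and subtract the timestamps to get the difference: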
    create table obd_20140101 as

      select a.imei,a.row_num,a.ts,COALESCE(unix_timestamp(a.ts, 'yyyy-MM-dd HH:mm:ss.S'), 0) - unix_timestamp(b.ts, 'yyyy-MM-dd HH:mm:ss.S') as intervel ,a.fuel_instant,a.gps_speed as obd_speed,a.gps_status,a.gps_longitude,a.gps_latitude,a.direct_angle,a.obdspeed from obddata_20140101 a join obddata_20140101 b on a.imei = b.imei and a.row_num = b.row_num +1
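
    Steps 1 and 2 can also be written as one statement, without first saving the numbered rows to a table. The following is only a sketch: it assumes that sample_data_2.ts is a string in the 'yyyy-MM-dd HH:mm:ss.S' format used above, and gap_seconds is an alias chosen here for illustration.

    ```sql
    -- Sketch: number the rows per imei in two identical subqueries, then pair
    -- each row (a) with the row immediately before it (b).
    select a.imei,
           a.ts,
           unix_timestamp(a.ts, 'yyyy-MM-dd HH:mm:ss.S')
             - unix_timestamp(b.ts, 'yyyy-MM-dd HH:mm:ss.S') as gap_seconds
    from (
      select imei, ts,
             row_number() over (partition by imei order by ts asc) as row_num
      from sample_data_2
    ) a
    join (
      select imei, ts,
             row_number() over (partition by imei order by ts asc) as row_num
      from sample_data_2
    ) b
      on a.imei = b.imei
     and a.row_num = b.row_num + 1;
    ```

    Because this is an inner join, the first record of each imei has no predecessor and does not appear in the result.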

    In fact there is a simpler way to do this: Hive's analytic window functions.

    create table obd_20140101 as

    select imei,ts as ts1,fuel_instant,gps_longitude,gps_latitude,lead(ts,1,ts) over  (PARTITION BY imei ORDER BY ts ASC)  as ts2 from sample_data_2;

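    This way the data is partitioned by imei and ordered by time, and each row carries the next record's timestamp in ts2 (the last row of each imei falls back to its own ts). The subtraction that follows is then simple.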

    select a.imei,a.ts1,a.ts2,COALESCE(unix_timestamp(a.ts2, 'yyyy-MM-dd HH:mm:ss.S'), 0) - unix_timestamp(a.ts1, 'yyyy-MM-dd HH:mm:ss.S') as intervel ,a.fuel_instant,a.gps_longitude,a.gps_latitude from obd_20140101 a;
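
    The same idea also works without an intermediate table, and with lag() instead of lead() if the gap to the previous record is wanted. The sketch below makes two assumptions: ts is a string in the 'yyyy-MM-dd HH:mm:ss.S' format used above, and a user_draw_log table with the columns from the MySQL example (user_id, id, draw_time, draw_date) is also available in Hive with draw_time stored as 'yyyy-MM-dd HH:mm:ss'. The aliases prev_ts, prev_time and gap_seconds are made up for illustration.

    ```sql
    -- Sketch 1: gap in seconds between each record and the previous record
    -- of the same imei. The first record of each imei gets NULL.
    select imei,
           ts,
           prev_ts,
           unix_timestamp(ts, 'yyyy-MM-dd HH:mm:ss.S')
             - unix_timestamp(prev_ts, 'yyyy-MM-dd HH:mm:ss.S') as gap_seconds
    from (
      select imei,
             ts,
             lag(ts, 1) over (partition by imei order by ts asc) as prev_ts
      from sample_data_2
    ) t;

    -- Sketch 2: the same pattern applied to the draw log from the question,
    -- for all users at once (assumes the table exists in Hive as described).
    select user_id,
           draw_time,
           unix_timestamp(draw_time) - unix_timestamp(prev_time) as gap_seconds
    from (
      select user_id,
             draw_time,
             lag(draw_time, 1) over (partition by user_id order by id asc) as prev_time
      from user_draw_log
      where draw_date = '2016-03-09'
    ) t;
    ```

    Leaving out the third argument of lag() makes the first row of each partition NULL rather than 0, which keeps "no previous record" distinguishable from a genuine zero-second gap.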

  • Original article: https://www.cnblogs.com/hd-zg/p/5930536.html