一个用户访问页面,页面可以返回继续访问(如A-B-C-A-B-A),求每个用户访问一个页面的最大连续天数,
user_id url date
001 A 2020-07-01 00:09:01
001 B 2020-07-01 00:10:01
001 B 2020-07-01 00:10:22
001 A 2020-07-01 00:11:22
001 B 2020-07-01 00:12:20
001 A 2020-07-01 00:13:02
002 ................................
..
select e.user_id, e.url, max(ss)
from (select d.user_id, d.url, d.cn, count(1) ss
from (select c.*, c.dt - rn cn
from (select b.*,
row_number() over(partition by b.user_id, b.url order by dt) rn
from (select distinct a.user_id,
a.url,
trunc(to_date(dt,
'yyyy-mm-dd hh24:mi:ss')) dt
from urlrecords a) b) c) d
group by d.user_id, d.url, d.cn) e
group by e.user_id, e.url
网页日志记录表 urlrecord
create table urlrecord
(
id string,
url string,
action string,
)
partitioned by(dt string);
样例数据:
id url action dt
1 www.sohu.com view 20201001
2 www.baidu.com click 20201001
3 www.baidu.com click 20201001
4 www.bilibili.com click 20201001
5 www.sohu.com click 20201002
6 www.qq.com click 20201002
.. .. .. ..
题目: 求该表中每天的url被点击(action='click')次数最多的top 5
输出数据格式 dt,url,pv(点击总数),rank(排名)
注意,如果有排名相同的,rank 排名取相同的值,后一位递增,比如 1 2 3 3 5 ,或者 1 2 3 4 5 5 等
结果数据示例:
dt url pv rank
20201001 www.58.com 3641 1
20201001 www.qq.com 3637 2
20201001 www.douban.com 3634 3
20201001 www.jd.com 3606 4
20201001 www.baidu.com 3574 5
20201001 www.youku.com 3574 5
20201002 www.jd.com 3643 1
select b.*, rank() over(partition by b.url, b.dt order by pv desc) rank
from (select a.dt, a.url, count(1) pv
from urlrecord a
where a.action = 'click'
group by a.url, a.dt) b