yhd日志分析(二)
继续yhd日志分析,统计数据
日期 | uv | pv | 登录人数 | 游客人数 | 平均访问时长 | 二跳率 | 独立ip数 |
---|
1 分析
登录人数
count(distinct endUserId)
游客人数
count(distinct guid) - count(distinct endUserId)
平均访问时长
先把tracktime转换为unix timestamp, 相同sessionId的tracktime中,max(tracktime)-min(tracktime),得到用户停留时间。所有用户的停留时间相加,得到总停留时间。总停留时间和总访问次数的比例,就是平均访问时长
select sum(stay_time) as total_stay_time
from (select max(to_unix_timestamp(trackTime)) - min(to_unix_timestamp(trackTime)) as stay_time from yhd_log group by sessionId) stay
用户访问总数
count(distinct sessionId)
二跳率
sessionViewNo=2的用户,即为二跳用户。统计出二跳用户和uv的比例
select count(distinct guid) from yhd_log where sessionViewNo=2
独立ip数
count(distinct ip)
实现
-
借助中间表,分别存放停留时间和二次跳用户总数
// 存放总停留时间 create table if not exists yhd_log_total_stay_time( date string, total_stay_time bigint ) row format delimited fields terminated by ' ' stored as textfile; // 存放二次跳用户总数 create table if not exists yhd_log_total_second_jump( date string, total_second_jump bigint ) row format delimited fields terminated by ' ' stored as textfile;
-
计算总停留时间,存放在yhd_log_total_stay_time, 按日期分组
insert overwrite table yhd_log_total_stay_time select date, sum(stay_time) as total_stay_time from (select max(to_unix_timestamp(trackTime)) - min(to_unix_timestamp(trackTime)) as stay_time, date from yhd_log group by date, sessionId) stay group by date
-
计算二次跳用户总数,存放在yhd_log_total_second_jump, 按日期分组
insert overwrite table yhd_log_total_second_jump select date, count(distinct guid) from yhd_log where sessionViewNo=2 group by date
-
统计
把yhd_log_total_stay_time,yhd_log_total_second_jump和yhd_log按照 date连接查询 select date, pv, uv, user_count, guest_count, total_stay_time/total_visit as average_stay_time, total_second_jump/ uv as second_jump_rate, indepent_ip from ( select log.date, count(url) as pv, count(distinct guid) as uv, count(distinct endUserId) as user_count, count(distinct guid) - count(distinct endUserId) as guest_count, count(distinct sessionId) as total_visit, min(stay.total_stay_time) as total_stay_time, min(second.total_second_jump) as total_second_jump, count(distinct ip) as indepent_ip from yhd_log log inner join yhd_log_total_stay_time stay on stay.date=log.date inner join yhd_log_total_second_jump second on second.date=log.date group by log.date ) stat
结果
date | pv | uv | user_count | guest_count | average_stay_time | second_jump_rate | indepent_ip |
---|---|---|---|---|---|---|---|
20150828 | 126134 | 39007 | 17687 | 21320 | 745.9797393244751 | 0.13118158279283207 | 30462 |