  • yhd Log Analysis (Part 2)

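    Continuing the yhd log analysis, this part computes the following statistics for each date:

    date   uv   pv   logged-in users   guests   average visit duration   second-jump rate   distinct IPs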

    1 Analysis

    Logged-in users

    count(distinct endUserId)

    Guests

    count(distinct guid) - count(distinct endUserId)
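
The two counts above can be sketched on a few made-up records. This is only an illustration: the sample rows and the `count_users` helper are hypothetical, and it assumes that a row without a login carries a NULL endUserId (modelled as `None`, which count(distinct ...) skips) and that each logged-in user appears under a single guid.

```python
def count_users(rows):
    """Return (uv, logged_in, guests) for rows of (guid, endUserId)."""
    # count(distinct guid)
    guids = {guid for guid, _ in rows}
    # count(distinct endUserId); NULLs (None here) are not counted
    users = {user for _, user in rows if user is not None}
    uv = len(guids)
    logged_in = len(users)
    # guests = count(distinct guid) - count(distinct endUserId)
    return uv, logged_in, uv - logged_in


# Hypothetical sample: g1 and g3 are logged in, g2 and g4 are guests.
sample = [
    ("g1", "u1"),
    ("g1", "u1"),
    ("g2", None),
    ("g3", "u2"),
    ("g4", None),
]
print(count_users(sample))
```

If one user logs in from several guids, or one guid is shared by several accounts, the subtraction no longer gives an exact guest count.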

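    Average visit duration

    First convert tracktime to a unix timestamp. For each sessionId, max(tracktime) - min(tracktime) gives the time the user stayed in that session. Summing the stay times of all sessions gives the total stay time, and the total stay time divided by the total number of visits is the average visit duration.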

    select sum(stay_time) as total_stay_time
    from (select max(to_unix_timestamp(trackTime)) - min(to_unix_timestamp(trackTime)) as stay_time from yhd_log group by sessionId) stay
    

    Total number of visits

    count(distinct sessionId)
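
To make the calculation concrete, here is a minimal Python sketch of the same steps on made-up data. The sample rows and the `average_visit_duration` helper are hypothetical, and the sketch assumes trackTime is formatted as 'yyyy-MM-dd HH:mm:ss'.

```python
from collections import defaultdict
from datetime import datetime, timezone


def average_visit_duration(rows):
    """rows: (sessionId, trackTime) pairs. Returns average seconds per session."""
    times = defaultdict(list)
    for session_id, track_time in rows:
        # Convert trackTime to a unix timestamp (UTC assumed; only differences matter)
        ts = datetime.strptime(track_time, "%Y-%m-%d %H:%M:%S")
        times[session_id].append(ts.replace(tzinfo=timezone.utc).timestamp())
    if not times:
        return 0.0
    # Per session: max(trackTime) - min(trackTime), then sum over all sessions
    total_stay_time = sum(max(t) - min(t) for t in times.values())
    # Divide by count(distinct sessionId)
    return total_stay_time / len(times)


# Hypothetical sample: s1 lasts 300 s, s2 has a single page view (0 s), s3 lasts 600 s.
sample = [
    ("s1", "2015-08-28 10:00:00"),
    ("s1", "2015-08-28 10:05:00"),
    ("s2", "2015-08-28 11:00:00"),
    ("s3", "2015-08-28 12:00:00"),
    ("s3", "2015-08-28 12:10:00"),
]
print(average_visit_duration(sample))
```

Note that a session with a single record contributes a stay time of 0 but still counts as a visit, so such sessions pull the average down.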

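    Second-jump rate

    A user with a record where sessionViewNo=2 counts as a second-jump user. The second-jump rate is the ratio of second-jump users to uv.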

    select count(distinct guid) from yhd_log where sessionViewNo=2
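
As a small illustration of this ratio, the Python sketch below computes it for made-up records. The sample rows and the `second_jump_rate` helper are hypothetical, and the sketch assumes sessionViewNo is the sequence number of a page view within its session.

```python
def second_jump_rate(rows):
    """rows: (guid, sessionViewNo) pairs. Returns second-jump users / uv."""
    # uv = count(distinct guid)
    uv = {guid for guid, _ in rows}
    # count(distinct guid) where sessionViewNo = 2
    second_jump_users = {guid for guid, view_no in rows if view_no == 2}
    return len(second_jump_users) / len(uv) if uv else 0.0


# Hypothetical sample: g1 and g3 reach a second page view, g2 and g4 do not.
sample = [
    ("g1", 1), ("g1", 2), ("g1", 3),
    ("g2", 1),
    ("g3", 1), ("g3", 2),
    ("g4", 1),
]
print(second_jump_rate(sample))
```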

    Distinct IPs

    count(distinct ip)

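    Implementation

    1. Use intermediate tables to hold the total stay time and the total number of second-jump users

       -- holds the total stay time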
       
       create table if not exists yhd_log_total_stay_time(
       	date string,
       	total_stay_time bigint
       )
       row format delimited fields terminated by '\t'
       stored as textfile;
       
       -- holds the total number of second-jump users
       
       create table if not exists yhd_log_total_second_jump(
       	date string,
       	total_second_jump bigint
       )
       row format delimited fields terminated by '\t'
       stored as textfile;
      
    2. Compute the total stay time, grouped by date, and store it in yhd_log_total_stay_time

       insert overwrite table yhd_log_total_stay_time
       select date, sum(stay_time) as total_stay_time
       from (select max(to_unix_timestamp(trackTime)) - min(to_unix_timestamp(trackTime)) as stay_time, date from yhd_log group by date, sessionId) stay
       group by date
      
    3. Compute the total number of second-jump users, grouped by date, and store it in yhd_log_total_second_jump

       insert overwrite table yhd_log_total_second_jump
       select date, count(distinct guid)
       from yhd_log
       where sessionViewNo=2 
       group by date
      
    4. Statistics

       Join yhd_log_total_stay_time, yhd_log_total_second_jump and yhd_log on date
      
       select date, pv, uv, user_count, guest_count, 
       	total_stay_time/total_visit as average_stay_time,
       	total_second_jump/ uv as second_jump_rate, indepent_ip
       from (
       select log.date, 
       	count(url) as pv, 
       	count(distinct guid) as uv,
       	count(distinct endUserId) as user_count,
       	count(distinct guid) - count(distinct endUserId) as guest_count,
       	count(distinct sessionId) as total_visit,
       	min(stay.total_stay_time) as total_stay_time,
       	min(second.total_second_jump) as total_second_jump,
       	count(distinct ip) as indepent_ip
       from yhd_log log inner join yhd_log_total_stay_time stay on stay.date=log.date inner join yhd_log_total_second_jump second on second.date=log.date
       group by log.date
       ) stat
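
The four steps can be tried end to end on a tiny data set. The sketch below uses Python's built-in sqlite3 module instead of Hive, so it is an analogue rather than the original setup: the sample rows and the yhd_log column types are assumptions, `create table ... as select` stands in for `insert overwrite`, `strftime('%s', ...)` stands in for `to_unix_timestamp`, and `* 1.0` is added because SQLite truncates integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Assumed layout of yhd_log, reduced to the columns the queries use.
cur.execute("""create table yhd_log(
    date text, url text, guid text, endUserId text,
    sessionId text, sessionViewNo integer, trackTime text, ip text)""")
rows = [  # hypothetical sample data
    ("20150828", "/a", "g1", "u1", "s1", 1, "2015-08-28 10:00:00", "1.1.1.1"),
    ("20150828", "/b", "g1", "u1", "s1", 2, "2015-08-28 10:05:00", "1.1.1.1"),
    ("20150828", "/a", "g2", None, "s2", 1, "2015-08-28 11:00:00", "2.2.2.2"),
    ("20150828", "/c", "g3", None, "s3", 1, "2015-08-28 12:00:00", "2.2.2.2"),
    ("20150828", "/d", "g3", None, "s3", 2, "2015-08-28 12:10:00", "2.2.2.2"),
    ("20150828", "/a", "g4", "u2", "s4", 1, "2015-08-28 13:00:00", "3.3.3.3"),
]
cur.executemany("insert into yhd_log values (?,?,?,?,?,?,?,?)", rows)

# Step 2: total stay time per date
cur.execute("""create table yhd_log_total_stay_time as
    select date, sum(stay_time) as total_stay_time
    from (select date,
                 max(cast(strftime('%s', trackTime) as integer))
               - min(cast(strftime('%s', trackTime) as integer)) as stay_time
          from yhd_log group by date, sessionId) stay
    group by date""")

# Step 3: second-jump users per date
cur.execute("""create table yhd_log_total_second_jump as
    select date, count(distinct guid) as total_second_jump
    from yhd_log where sessionViewNo = 2 group by date""")

# Step 4: join the intermediate tables with the log and compute the statistics
result = cur.execute("""
    select date, pv, uv, user_count, guest_count,
           total_stay_time * 1.0 / total_visit as average_stay_time,
           total_second_jump * 1.0 / uv as second_jump_rate, indepent_ip
    from (
        select log.date as date,
               count(url) as pv,
               count(distinct guid) as uv,
               count(distinct endUserId) as user_count,
               count(distinct guid) - count(distinct endUserId) as guest_count,
               count(distinct sessionId) as total_visit,
               min(stay.total_stay_time) as total_stay_time,
               min(second.total_second_jump) as total_second_jump,
               count(distinct ip) as indepent_ip
        from yhd_log log
        inner join yhd_log_total_stay_time stay on stay.date = log.date
        inner join yhd_log_total_second_jump second on second.date = log.date
        group by log.date
    ) stat""").fetchall()
print(result)
conn.close()
```

Because each intermediate table holds one row per date, the join repeats the same total on every log row of that date, and min(...) simply picks that single value back out after grouping.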
      

    Result

    date      pv      uv     user_count  guest_count  average_stay_time  second_jump_rate     indepent_ip
    20150828  126134  39007  17687       21320        745.9797393244751  0.13118158279283207  30462
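
The reported figures can be cross-checked against the formulas with a little arithmetic. This is a sketch only: the numbers are copied from the result row, and the second-jump user count is not reported in the result, so the value derived below is just what the rate and uv imply.

```python
# Values copied from the result row for 20150828
pv, uv, user_count, guest_count = 126134, 39007, 17687, 21320
second_jump_rate = 0.13118158279283207
indepent_ip = 30462

# guest_count = count(distinct guid) - count(distinct endUserId)
assert uv - user_count == guest_count

# second_jump_rate = total_second_jump / uv, so the implied user count is rate * uv
second_jump_users = round(second_jump_rate * uv)
print(second_jump_users)

# Sanity bounds: distinct counts cannot exceed the number of page views
assert uv <= pv and indepent_ip <= pv
```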
  • Original article: https://www.cnblogs.com/ivanny/p/yhd_log_statatis_2.html