  • Hive user behavior analysis (active users, launches, retention, returning users, new users): some classic SQL queries

    
    
    
    

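    These user-analysis queries are simple. All they need is a few small custom UDFs that work on createdatms, the timestamp field used for the statistics.

    The UDFs use the Calendar class's add method and SimpleDateFormat on the long timestamp. Define several overloads that return int or long so that the queries can compare times.
    getdaybegin (start of day): for a createtime such as 15288888888888 on 2017-08-08, it returns a value such as 152888880000, standing for 2017-08-08 00:00:00, the midnight that starts that day (the numbers are illustrative, not real epoch values).
    getweekbegin and getmonthbegin work the same way.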
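The date helpers described above could look roughly like the following. This is only a minimal sketch: the class name `DateUtil`, the method signatures, the explicit `millis` and `TimeZone` parameters (added so the result is reproducible), and the choice of Monday as the first day of the week are all assumptions, not the author's actual UDF code. A real Hive UDF would wrap calls like these in `evaluate` overloads.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical date helpers; names and signatures are assumptions. */
final class DateUtil {
    private DateUtil() {}

    /** Midnight of the day containing millis, shifted by offsetDays. */
    static long getDayBegin(long millis, int offsetDays, TimeZone tz) {
        Calendar c = new GregorianCalendar(tz);
        c.setTimeInMillis(millis);
        c.set(Calendar.HOUR_OF_DAY, 0);
        c.set(Calendar.MINUTE, 0);
        c.set(Calendar.SECOND, 0);
        c.set(Calendar.MILLISECOND, 0);
        c.add(Calendar.DAY_OF_MONTH, offsetDays);
        return c.getTimeInMillis();
    }

    /** Midnight of the Monday starting the week of millis, shifted by offsetWeeks. */
    static long getWeekBegin(long millis, int offsetWeeks, TimeZone tz) {
        Calendar c = new GregorianCalendar(tz);
        c.setTimeInMillis(getDayBegin(millis, 0, tz));
        // DAY_OF_WEEK is SUNDAY=1 .. SATURDAY=7; map it to days elapsed since Monday.
        int daysSinceMonday = (c.get(Calendar.DAY_OF_WEEK) + 5) % 7;
        c.add(Calendar.DAY_OF_MONTH, 7 * offsetWeeks - daysSinceMonday);
        return c.getTimeInMillis();
    }

    /** Midnight of the first day of the month of millis, shifted by offsetMonths. */
    static long getMonthBegin(long millis, int offsetMonths, TimeZone tz) {
        Calendar c = new GregorianCalendar(tz);
        c.setTimeInMillis(getDayBegin(millis, 0, tz));
        c.set(Calendar.DAY_OF_MONTH, 1);
        c.add(Calendar.MONTH, offsetMonths);
        return c.getTimeInMillis();
    }

    /** Formats an epoch-millisecond timestamp with a SimpleDateFormat pattern. */
    static String formatTime(long millis, String pattern, TimeZone tz) {
        SimpleDateFormat f = new SimpleDateFormat(pattern, Locale.ROOT);
        f.setTimeZone(tz);
        return f.format(new Date(millis));
    }
}
```

With these helpers, `formatTime(getWeekBegin(now, -4, tz), "yyyyMMdd", tz)` yields the kind of lower bound that the queries compare against `concat(ym,day)`.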
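    1. Weekly active users of one app for each of the past five weeks (including the current week)
    Note: whenever the partition range can be bounded, always restrict the query to those partitions.
    The table is partitioned by ym/day/hm; for the date 20170501, ym is 201705 and day is 01.
    -- Active devices per week over the past five weeks.
    -- The three-argument formattime(createdatms,'yyyyMMdd',0) appears to format the start of the week that contains the timestamp, so each group is one week.
    select formattime(createdatms,'yyyyMMdd',0) stdate, count(distinct deviceid) stcount
    from ext_startup_logs
    where concat(ym,day) >= formattime(getweekbegin(-4),'yyyyMMdd') and appid = 'sdk34734'
    group by formattime(createdatms,'yyyyMMdd',0);

    2. Monthly active users for each of the last six months (including the current month)
    select formattime(createdatms,'yyyyMM') stdate, count(distinct deviceid) stcount
    from ext_startup_logs
    where ym >= formattime(getmonthbegin(-5),'yyyyMM') and appid = 'sdk34734'
    group by formattime(createdatms,'yyyyMM');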
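
    3. Silent users
    3.1) Silent users today: devices that launched the app exactly once, today (or this week, or this month), and never again.
    -- By analogy with getweekbegin(-1), getdaybegin(-1) is the start of yesterday; use getdaybegin() to count only devices whose single launch was today.
    select count(*) from (
      select deviceid, count(createdatms) dcount, min(createdatms) dmin
      from ext_startup_logs
      where appid = 'sdk34734'
      group by deviceid
      having dcount = 1 and min(createdatms) > getdaybegin(-1)
    ) t;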

    4. Launch count
    4.1) Number of app launches today
    The launch count is like the active-user count, except that active users are deduplicated by device and launches are not.
    select count(*)
    from ext_startup_logs
    where appid = 'sdk34734' and ym = formattime(getdaybegin(),'yyyyMM') and day = formattime(getdaybegin(),'dd');

    5. Version distribution
    5.1) Active users today for each version of the app with appid sdk34734
    select appversion, count(distinct deviceid)
    from ext_startup_logs
    where appid = 'sdk34734' and ym = formattime(getdaybegin(),'yyyyMM') and day = formattime(getdaybegin(),'dd')
    group by appversion;

    5.2) Daily active users per version for each day of the current week
    select formattime(createdatms,'yyyyMMdd'), appversion, count(distinct deviceid)
    from ext_startup_logs
    where appid = 'sdk34734' and concat(ym,day) >= formattime(getweekbegin(),'yyyyMMdd')
    group by formattime(createdatms,'yyyyMMdd'), appversion;

    [User composition analysis]
    1. Returning users this week: devices that did not launch the app last week but did launch it this week.
    When using a NOT IN subquery, both the subquery and the outer query must use table aliases.
    select distinct a.deviceid
    from ext_startup_logs a
    where a.appid = 'sdk34734'
      and concat(a.ym,a.day) >= formattime(getweekbegin(),'yyyyMMdd')
      and a.deviceid not in (
        select distinct t.deviceid
        from ext_startup_logs t
        where t.appid = 'sdk34734'
          and concat(t.ym,t.day) >= formattime(getweekbegin(-1),'yyyyMMdd')
          and concat(t.ym,t.day) < formattime(getweekbegin(),'yyyyMMdd')
      );
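
The NOT IN query above is a set difference: devices seen this week minus devices seen last week. A minimal sketch of the same logic in plain Java follows; the class and method names are hypothetical, and the two input sets stand in for the results of the outer query and the subquery.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper mirroring the NOT IN query as a set difference. */
final class ReturningUsers {
    private ReturningUsers() {}

    /** Devices active this week that were not active last week. */
    static Set<String> returning(Set<String> thisWeek, Set<String> lastWeek) {
        Set<String> result = new TreeSet<>(thisWeek); // copy, so the input is not modified
        result.removeAll(lastWeek);
        return result;
    }
}
```

As in the SQL, a device that appears for the first time this week also lands in the result, because nothing requires activity before last week. One SQL-specific caveat that the Java version does not share: if the subquery can return a NULL deviceid, NOT IN matches no rows at all, so filtering NULLs inside the subquery is a sensible precaution.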

    2. Active for n consecutive weeks. Example for three weeks: launches on 20181001, 20181008 and 20181016 fall in three distinct weeks after deduplication, so the device counts as active for three consecutive weeks.
    select deviceid, count(distinct(formattime(createdatms,'yyyyMMdd',0))) c
    from ext_startup_logs
    where appid = 'sdk34734' and concat(ym,day) >= formattime(getweekbegin(-2),'yyyyMMdd')
    group by deviceid having c = 3;

    3. Loyal users: active for five consecutive weeks
    select deviceid, count(distinct(formattime(createdatms,'yyyyMMdd',0))) c
    from ext_startup_logs
    where appid = 'sdk34734' and concat(ym,day) >= formattime(getweekbegin(-4),'yyyyMMdd')
    group by deviceid having c = 5;

    4. Continuously active users: active for n consecutive weeks (here n = 2)
    select deviceid, count(distinct(formattime(createdatms,'yyyyMMdd',0))) c
    from ext_startup_logs
    where appid = 'sdk34734' and concat(ym,day) >= formattime(getweekbegin(-1),'yyyyMMdd')
    group by deviceid having c = 2;
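
The three queries above share one idea: map every launch to the start of its week, deduplicate, and check that the number of distinct weeks equals the number of weeks in the window. The sketch below shows that counting step. It uses `java.time` for brevity rather than the Calendar-based UDFs, and the class name, method names, and Monday week start are assumptions.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: counts distinct active weeks for one device. */
final class WeeklyStreak {
    private WeeklyStreak() {}

    /** Monday of the week containing the given day (Monday start is an assumption). */
    static LocalDate weekStart(LocalDate day) {
        return day.with(TemporalAdjusters.previousOrSame(DayOfWeek.MONDAY));
    }

    /** Distinct week starts among launches on or after windowStart. */
    static Set<LocalDate> distinctWeeks(List<LocalDate> launchDays, LocalDate windowStart) {
        Set<LocalDate> weeks = new TreeSet<>();
        for (LocalDate d : launchDays) {
            if (!d.isBefore(windowStart)) {
                weeks.add(weekStart(d));
            }
        }
        return weeks;
    }

    /** True when the device was active in exactly n distinct weeks of the window. */
    static boolean activeEveryWeek(List<LocalDate> launchDays, LocalDate windowStart, int n) {
        return distinctWeeks(launchDays, windowStart).size() == n;
    }
}
```

Like the SQL, the window has only a lower bound, so the check is valid only if the log contains no future-dated rows and the window covers exactly n weeks up to the current one.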

    -- Devices that launched before the five-week window and have not launched since.
    select distinct(a.deviceid)
    from ext_startup_logs a
    where concat(a.ym,a.day) < formattime(getweekbegin(-4),'yyyyMMdd')
      and a.deviceid not in (
        select distinct(t.deviceid)
        from ext_startup_logs t
        where concat(t.ym,t.day) >= formattime(getweekbegin(-4),'yyyyMMdd')
      );

    5. Recently churned users
    Devices that have not launched the app in the most recent 2, 3, or 4 weeks.
    Take the max of each user's access times; the max must not fall in …
    -- '#' is a placeholder for the app id.
    -- Churned within four weeks
    select distinct(a.deviceid)
    from ext_startup_logs a
    where a.appid = '#'
      and concat(a.ym,a.day) >= formattime(getweekbegin(-4),'yyyyMMdd')
      and concat(a.ym,a.day) < formattime(getweekbegin(-3),'yyyyMMdd')
      and a.deviceid not in (
        select distinct(t.deviceid)
        from ext_startup_logs t
        where t.appid = '#'
          and concat(t.ym,t.day) >= formattime(getweekbegin(-3),'yyyyMMdd')
      )
    union
    -- Churned within three weeks
    select distinct(a.deviceid)
    from ext_startup_logs a
    where a.appid = '#'
      and concat(a.ym,a.day) >= formattime(getweekbegin(-3),'yyyyMMdd')
      and concat(a.ym,a.day) < formattime(getweekbegin(-2),'yyyyMMdd')
      and a.deviceid not in (
        select distinct(t.deviceid)
        from ext_startup_logs t
        where t.appid = '#'
          and concat(t.ym,t.day) >= formattime(getweekbegin(-2),'yyyyMMdd')
      )
    union
    -- Churned within two weeks
    select distinct(a.deviceid)
    from ext_startup_logs a
    where a.appid = '#'
      and concat(a.ym,a.day) >= formattime(getweekbegin(-2),'yyyyMMdd')
      and concat(a.ym,a.day) < formattime(getweekbegin(-1),'yyyyMMdd')
      and a.deviceid not in (
        select distinct(t.deviceid)
        from ext_startup_logs t
        where t.appid = '#'
          and concat(t.ym,t.day) >= formattime(getweekbegin(-1),'yyyyMMdd')
      );

    [Retention analysis]
    1. Retained users
    Weekly retained users: users who were new last week and still use the app this week.
    -- As written, both windows are shifted back one week: new in the week before last (getweekbegin(-2) to getweekbegin(-1)), active again last week.
    select distinct(a.deviceid)
    from ext_startup_logs a
    where a.appid = 'sdk34734'
      and concat(a.ym,a.day) >= formattime(getweekbegin(-1),'yyyyMMdd')
      and concat(a.ym,a.day) < formattime(getweekbegin(),'yyyyMMdd')
      and a.deviceid in (
        select distinct(t.deviceid)
        from (
          select tt.deviceid, min(tt.createdatms) mintime
          from ext_startup_logs tt
          where tt.appid = 'sdk34734'
          group by tt.deviceid having mintime >= getweekbegin(-2) and mintime < getweekbegin(-1)
        ) t
      );
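
The retention query combines two conditions per device: its first-ever launch falls in the cohort week, and it has at least one launch in the week after. A minimal in-memory sketch of that rule follows. The class name, method name, and the map of launch dates per device are hypothetical stand-ins for the log table.

```java
import java.time.LocalDate;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper mirroring the weekly retention query. */
final class WeeklyRetention {
    private WeeklyRetention() {}

    /**
     * Devices whose first launch is in [cohortWeekStart, cohortWeekStart + 1 week)
     * and that launched again in the following week.
     */
    static Set<String> retained(Map<String, List<LocalDate>> launches, LocalDate cohortWeekStart) {
        LocalDate nextWeekStart = cohortWeekStart.plusWeeks(1);
        LocalDate nextWeekEnd = nextWeekStart.plusWeeks(1);
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, List<LocalDate>> e : launches.entrySet()) {
            List<LocalDate> days = e.getValue();
            if (days.isEmpty()) {
                continue; // no launches, so the device is neither new nor retained
            }
            // Same role as min(createdatms) in the inner query.
            LocalDate first = Collections.min(days);
            boolean isNew = !first.isBefore(cohortWeekStart) && first.isBefore(nextWeekStart);
            boolean cameBack = false;
            for (LocalDate d : days) {
                if (!d.isBefore(nextWeekStart) && d.isBefore(nextWeekEnd)) {
                    cameBack = true;
                    break;
                }
            }
            if (isNew && cameBack) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

Dividing the size of this set by the number of devices that were new in the cohort week gives a retention rate, which the original query does not compute.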

    2. User freshness
    Freshness = number of new users in a period / number of active existing users in the same period.
    With m the active users today and n the new users today, the active existing users are m - n, so today's freshness is n / (m - n).
    -- m: active users today
    select count(distinct deviceid)
    from ext_startup_logs
    where concat(ym,day) = formattime(getdaybegin(),'yyyyMMdd') and appid = ... ;
    -- n: new users today
    select count(distinct(t.deviceid))
    from (
      select tt.deviceid, min(tt.createdatms) mintime
      from ext_startup_logs tt
      where tt.appid = 'sdk34734'
      group by tt.deviceid having mintime >= getdaybegin(0)
    ) t;
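
Once the two counts are available, the ratio itself is a one-liner. The sketch below assumes that "active existing users" means today's active users minus today's new users, which is one reading of the definition above. The class and method names are hypothetical.

```java
/** Hypothetical helper for the freshness ratio. */
final class Freshness {
    private Freshness() {}

    /**
     * Freshness = new users / active existing users, where active existing
     * users are taken to be activeToday - newToday (an assumption).
     */
    static double freshness(long activeToday, long newToday) {
        long existingActive = activeToday - newToday;
        if (newToday < 0 || existingActive <= 0) {
            throw new IllegalArgumentException("need 0 <= newToday < activeToday");
        }
        return (double) newToday / existingActive;
    }
}
```

For example, 120 active devices of which 20 are new leave 100 existing active devices, so `Freshness.freshness(120, 20)` evaluates 20 / 100.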
  • Original article: https://www.cnblogs.com/hejunhong/p/10320978.html