很简单的sql 用户分析语句 :只要自定义简单的udf函数 获取统计时间createdatms字段的 使用的日历类 add方法 和simpledateformat 将long类型的 定义多个重载方法 获取返回值int类型 或者long类型 进行时间判断即可 getdaybegin(天开始),比如2017-08-08这一天的createtime为15288888888888 获取到 152888880000(代表20170808 00:00:00)当天开始的凌晨 getWeekbegin,getMonthgin 同上道理 过去的五周(包含本周)某个app每周的周活跃用户数 注意,如果能够界定分区区间的话,务必要进行分区限定查询。 20170501 ym/day/hm //过去的五周,每周的活跃数 select formattime(createdatms,'yyyyMMdd',0) stdate, count(distinct deviceid) stcount from ext_startup_logs where concat(ym,day)>=formattime(getweekbegin(-4),'yyyyMMdd') and appid ='sdk34734' group by formattime(createdatms,'yyyyMMdd',0) ; 2.最近的六个月(包含本月)每月的月活跃数。 select formattime(createdatms,'yyyyMM') stdate, count(distinct deviceid) stcount from ext_startup_logs where ym >= formattime(getmonthbegin(-5),'yyyyMM') and appid ='sdk34734' group by formattime(createdatms,'yyyyMM') ; 3.沉默用户数 3.1)查询今天沉默用户数 //某个设备 启动时间 在今天(本周、本月) 只有一次 ,后续在无启动 select count(*) from (select deviceid , count(createdatms) dcount,min(createdatms) dmin from ext_startup_logswhere appid = 'sdk34734' group by deviceid having dcount = 1 and min(createdatms) > getdaybegin(-1)) t 4.启动次数 4.1)今天app的启动次数 启动次数类似于活跃用户数,活跃用户数去重,启动次数不需要去重。 select count(*) from ext_startup_logs where appid = 'sdk34734' and ym = formattime(getdaybegin(),'yyyyMM') and day = formattime(getdaybegin(),'dd'); 5.版本分布 5.1)今天appid为34734的不同版本的活跃用户数。 select appversion,count(distinct deviceid) from ext_startup_logs where appid = 'sdk34734' and ym = formattime(getdaybegin(),'yyyyMM') and day = formattime(getdaybegin(),'dd') group by appversion ; 5.2)本周内每天各版本日活 select formattime(createdatms,'yyyyMMdd'),appversion , count(distinct deviceid) from ext_startup_logs where appid = 'sdk34734' and concat(ym,day) >= formattime(getweekbegin(),'yyyyMMdd') group by formattime(createdatms,'yyyyMMdd') , appversion [用户构成分析] 1.本周回流用户 上周未启动,本周启动了的用 必须当使用not in 子查询和后续查询都必须加入别名 select distinct a.deviceid from ext_startup_logs a where a.appid = 'sdk34734' and concat(a.ym,a.day) >= formattime(getweekbegin(),'yyyyMMdd') and a.deviceid not in ( select distinct t.deviceid from ext_startup_logs t where t.appid = 'sdk34734' and concat(t.ym,t.day) >= formattime(getweekbegin(-1),'yyyyMMdd') and concat(t.ym,t.day) < formattime(getweekbegin(),'yyyyMMdd') ) 2.连续活跃n周 连续三周活跃 2018101 20181008 20181016 去掉重有三次就是活跃 select deviceid , count(distinct(formattime(createdatms,'yyyyMMdd',0))) c from ext_startup_logs where appid = 'sdk34734' and concat(ym,day) >= formattime(getweekbegin(-2),'yyyyMMdd') group by deviceid having c = 3 3.忠诚用户 连续活跃5周的 select deviceid , count(distinct(formattime(createdatms,'yyyyMMdd',0))) c from ext_startup_logs where appid = 'sdk34734' and concat(ym,day) >= formattime(getweekbegin(-4),'yyyyMMdd') group by deviceid having c = 5 4.连续活跃用户 连续活跃n周 select deviceid , count(distinct(formattime(createdatms,'yyyyMMdd',0))) c from ext_startup_logs where appid = 'sdk34734' and concat(ym,day) >= formattime(getweekbegin(-1),'yyyyMMdd') group by deviceid having c = 2 select distinct(a.deviceid) from ext_startup_logs a where concat(a.ym,a.day) < formattime(getweekbegin(-4),'yyyyMMdd') and deviceid not in ( select distinct(t.deviceid) from ext_startup_logs t where concat(t.ym,t.day)>=formattime(getweekbegin(-4),'yyyyMMdd')) 5.近期流失用户 最近2、3、4都没有启动过app. 查询所有用户访问的时间的max,max不能落在 //四周内流失 select distinct(deviceid) from ext_startup_logs where appid='#' and concat(ym,day) >= formattime(getweekbegin(-4),'yyyyMMdd') and concat(ym,day) < formattime(getweekbegin(-3),'yyyyMMdd') and deviceid not in ( select distinct(t.deviceid) from ext_startup_logs t where t.appid='' and concat(t.ym,t.day) >= formattime(getweekbegin(-3),'yyyyMMdd') ) union //三周内流失 select distinct(deviceid) from ext_startup_logs where appid='#' and concat(ym,day) >= formattime(getweekbegin(-3),'yyyyMMdd') and concat(ym,day) < formattime(getweekbegin(-2),'yyyyMMdd') and deviceid not in ( select distinct(t.deviceid) from ext_startup_logs t where t.appid='' and concat(t.ym,t.day) >= formattime(getweekbegin(-2),'yyyyMMdd') ) union //两周内流失 select distinct(deviceid) from ext_startup_logs where appid='#' and concat(ym,day) >= formattime(getweekbegin(-2),'yyyyMMdd') and concat(ym,day) < formattime(getweekbegin(-1),'yyyyMMdd') and deviceid not in ( select distinct(t.deviceid) from ext_startup_logs t where t.appid='' and concat(t.ym,t.day) >= formattime(getweekbegin(-1),'yyyyMMdd') ) [留存分析] 1.留存用户 周留存用户。上周新增的用户在本周还使用的 select distinct(a.deviceid) from ext_startup_logs a where a.appid = 'sdk34734' and concat(a.ym,a.day) >= formattime(getweekbegin(-1),'yyyyMMdd') and concat(a.ym,a.day) < formattime(getweekbegin(),'yyyyMMdd') and a.deviceid in ( select distinct(t.deviceid) from ( select tt.deviceid , min(tt.createdatms) mintime from ext_startup_logs tt where tt.appid = 'sdk34734' group by tt.deviceid having mintime >= getweekbegin(-2) and mintime < getweekbegin(-1) ) t ) 2.用户的新鲜度 新鲜度 = 某段时间的新增用户数/某段时间的活跃的老用户数 . //今天活跃用户 m = select count(distinct(t.deviceid)) from ext_startup_logs where concat(ym,day) = formattime(getdaybegin(),'yyyyMMdd') and appid = ... ; //今天新增用户 n = select count(distinct(t.deviceid)) from ( select tt.deviceid , min(tt.createdatms) mintime from ext_startup_logs tt where tt.appid = 'sdk34734' group by tt.deviceid having mintime >= getdaybegin(0) ) t