zoukankan      html  css  js  c++  java
  • 项目实战从0到1之hive(17)hive求新增用户数,日活,留存率

    很简单的sql 用户分析语句 :只要自定义简单的udf函数 获取统计时间createdatms字段的
    使用的日历类 add方法 和simpledateformat 将long类型的 定义多个重载方法 获取返回值int类型 或者long类型 进行时间判断即可
    getdaybegin(天开始),比如2017-08-08这一天的createtime为15288888888888 获取到 152888880000(代表20170808 00:00:00)当天开始的凌晨 getWeekbegin,getMonthgin  同上道理 
    过去的五周(包含本周)某个app每周的周活跃用户数
     注意,如果能够界定分区区间的话,务必要进行分区限定查询。
     20170501
     ym/day/hm
     //过去的五周,每周的活跃数
     select  formattime(createdatms,'yyyyMMdd',0) stdate, count(distinct deviceid) stcount from ext_startup_logs where concat(ym,day)>=formattime(getweekbegin(-4),'yyyyMMdd') and appid ='sdk34734' group by formattime(createdatms,'yyyyMMdd',0) ;
     2.最近的六个月(包含本月)每月的月活跃数。
     select  formattime(createdatms,'yyyyMM') stdate, count(distinct deviceid) stcount from ext_startup_logs where ym >= formattime(getmonthbegin(-5),'yyyyMM') and appid ='sdk34734' group by formattime(createdatms,'yyyyMM') ;
     3.沉默用户数
     3.1)查询今天沉默用户数     //某个设备 启动时间 在今天(本周、本月) 只有一次 ,后续在无启动
     select count(*) from (select deviceid , count(createdatms) dcount,min(createdatms) dmin from ext_startup_logswhere appid = 'sdk34734' group by deviceid having dcount = 1 and min(createdatms) > getdaybegin(-1)) t
     4.启动次数
     4.1)今天app的启动次数
     启动次数类似于活跃用户数,活跃用户数去重,启动次数不需要去重。
     select count(*) from ext_startup_logs where appid = 'sdk34734' and ym = formattime(getdaybegin(),'yyyyMM') and day = formattime(getdaybegin(),'dd');
     5.版本分布
     5.1)今天appid为34734的不同版本的活跃用户数。
     select appversion,count(distinct deviceid) from ext_startup_logs where appid = 'sdk34734' and ym = formattime(getdaybegin(),'yyyyMM') and day = formattime(getdaybegin(),'dd') group by appversion ;
     
     5.2)本周内每天各版本日活
     select formattime(createdatms,'yyyyMMdd'),appversion , count(distinct deviceid) from ext_startup_logs where appid = 'sdk34734' and concat(ym,day) >= formattime(getweekbegin(),'yyyyMMdd') group by formattime(createdatms,'yyyyMMdd') , appversion
     
     
     [用户构成分析]
     1.本周回流用户  上周未启动,本周启动了的用   必须当使用not in 子查询和后续查询都必须加入别名
     select
     distinct a.deviceid
     from ext_startup_logs a
     where a.appid = 'sdk34734' and concat(a.ym,a.day) >= formattime(getweekbegin(),'yyyyMMdd') and a.deviceid not in (
     select
     distinct t.deviceid
     from ext_startup_logs t
     where t.appid = 'sdk34734' and concat(t.ym,t.day) >= formattime(getweekbegin(-1),'yyyyMMdd') and  concat(t.ym,t.day) < formattime(getweekbegin(),'yyyyMMdd')
     )
     
     2.连续活跃n周  连续三周活跃  2018101 20181008 20181016  去掉重有三次就是活跃
     select deviceid , count(distinct(formattime(createdatms,'yyyyMMdd',0))) c from ext_startup_logs where appid = 'sdk34734' and concat(ym,day) >= formattime(getweekbegin(-2),'yyyyMMdd') group by deviceid having c = 3
     
     3.忠诚用户 连续活跃5周的
     select deviceid , count(distinct(formattime(createdatms,'yyyyMMdd',0))) c from ext_startup_logs where appid = 'sdk34734' and concat(ym,day) >= formattime(getweekbegin(-4),'yyyyMMdd') group by deviceid having c = 5
     
     4.连续活跃用户 连续活跃n周
     select deviceid , count(distinct(formattime(createdatms,'yyyyMMdd',0))) c from ext_startup_logs where appid = 'sdk34734' and concat(ym,day) >= formattime(getweekbegin(-1),'yyyyMMdd') group by deviceid having c = 2
     
     
     select distinct(a.deviceid) from ext_startup_logs a where  concat(a.ym,a.day) < formattime(getweekbegin(-4),'yyyyMMdd') and  deviceid not in ( select distinct(t.deviceid) from ext_startup_logs t where concat(t.ym,t.day)>=formattime(getweekbegin(-4),'yyyyMMdd'))
     
     5.近期流失用户
     最近2、3、4都没有启动过app.
     查询所有用户访问的时间的max,max不能落在
     //四周内流失
     select
     distinct(deviceid)
     from ext_startup_logs
     where appid='#'
     and concat(ym,day) >= formattime(getweekbegin(-4),'yyyyMMdd')
     and concat(ym,day) < formattime(getweekbegin(-3),'yyyyMMdd')
     and deviceid not in (
     select
     distinct(t.deviceid)
     from ext_startup_logs t
     where t.appid=''
     and concat(t.ym,t.day) >= formattime(getweekbegin(-3),'yyyyMMdd')
     
     )
     union
     //三周内流失
     select
     distinct(deviceid)
     from ext_startup_logs
     where appid='#'
     and concat(ym,day) >= formattime(getweekbegin(-3),'yyyyMMdd')
     and concat(ym,day) < formattime(getweekbegin(-2),'yyyyMMdd')
     and deviceid not in (
     select
     distinct(t.deviceid)
     from ext_startup_logs t
     where t.appid=''
     and concat(t.ym,t.day) >= formattime(getweekbegin(-2),'yyyyMMdd')
     
     )
     union
     //两周内流失
     select
     distinct(deviceid)
     from ext_startup_logs
     where appid='#'
     and concat(ym,day) >= formattime(getweekbegin(-2),'yyyyMMdd')
     and concat(ym,day) < formattime(getweekbegin(-1),'yyyyMMdd')
     and deviceid not in (
     select
     distinct(t.deviceid)
     from ext_startup_logs t
     where t.appid=''
     and concat(t.ym,t.day) >= formattime(getweekbegin(-1),'yyyyMMdd')
     )
     
     
     
     [留存分析]
     1.留存用户
     周留存用户。上周新增的用户在本周还使用的
     select
     distinct(a.deviceid)
     from ext_startup_logs a
     where a.appid = 'sdk34734'
     and concat(a.ym,a.day) >= formattime(getweekbegin(-1),'yyyyMMdd')
     and concat(a.ym,a.day) < formattime(getweekbegin(),'yyyyMMdd')
     and a.deviceid in (
     select distinct(t.deviceid)
     from (
     select tt.deviceid , min(tt.createdatms) mintime
     from ext_startup_logs tt
     where tt.appid = 'sdk34734'
     group by tt.deviceid having mintime >= getweekbegin(-2) and mintime < getweekbegin(-1)
     ) t
     )
     
     
     
     
     2.用户的新鲜度
     新鲜度 = 某段时间的新增用户数/某段时间的活跃的老用户数 .
     //今天活跃用户
     
     m = select count(distinct(t.deviceid))
     from ext_startup_logs where concat(ym,day) = formattime(getdaybegin(),'yyyyMMdd')  and appid = ... ;
     //今天新增用户
     n = select count(distinct(t.deviceid))
     from (
     select tt.deviceid , min(tt.createdatms) mintime
     from ext_startup_logs tt
     where tt.appid = 'sdk34734'
     group by tt.deviceid having mintime >= getdaybegin(0)
     ) t
    作者:大码王

    -------------------------------------------

    个性签名:独学而无友,则孤陋而寡闻。做一个灵魂有趣的人!

    如果觉得这篇文章对你有小小的帮助的话,记得在右下角点个“推荐”哦,博主在此感谢!

    万水千山总是情,打赏一分行不行,所以如果你心情还比较高兴,也是可以扫码打赏博主,哈哈哈(っ•?ω•?)っ???!

  • 相关阅读:
    JS学习之构造函数、原型、原型链
    JS学习之面向对象(面向对象的创建方法,new运算符的工作原理)
    JS学习之事件流
    JS学习之生命周期与垃圾回收机制
    关于在XP操作系统和IIS5.1环境下的MVC环境搭建之IIS错误
    VS2010、.net 4.0下MVC3开发中Code First开发模式的数据迁移小结
    关于MVC3框架下的Jquery异步请求函数的学习心得之一——$.post()
    关于ASP调用存储过程的经典资料转载
    关于windows环境下的IIS 500内部服务器错误的一种解决办法
    接VS2010+Net+MVC3+EF4.1环境下的Code First一文的补充说明
  • 原文地址:https://www.cnblogs.com/huanghanyu/p/13638495.html
Copyright © 2011-2022 走看看