zoukankan      html  css  js  c++  java
  • [Hive_10] Hive 的分析函数


    0. 说明

      Hive 的分析函数 窗口函数  | 排名函数 | 最大值 | 分层次 | lead && lag 统计活跃用户 | cume_dist


    1. 窗口函数(开窗函数) over

      1.1 说明

      1 preceding  //前一个
      1 following  //后一个
      current row  //当前行
      unbounded preceding  //无上限
      unbounded following  //无下限

      1.2 测试

    # 以行定义窗口界限
    select id, name, age , sum(age)over(order by id rows between current row and 2 following) from user_par;
    
    # 以值定义窗口界限
    select id, name, age , sum(age)over(order by age range between current row and 10 following) from user_par;

    2. 排名函数

      2.1 并列跳跃

      113
      rank

    select id, name, province, age , rank()over(partition by province order by age desc) from user_par;

      2.2 不跳跃

      112
      dense_rank

    select id, name, province, age , dense_rank()over(partition by province order by age desc) from user_par;

      2.3 顺序

      123
      row_number

    select id, name, province, age , row_number()over(partition by province order by age desc) from user_par;

    3. 最大值

      first_value()

    select id, name, province, age , first_value()over(partition by province order by age desc) from user_par;

    4. 分层次

      按照三六九等进行平均分层

      ntile()

    select id, name, age , ntile(3)over(order by age desc) from user_par;


    5. lead && lag

      5.1 lead()

      将列向上提

    select id, name, province, age , lead(age)over(partition by province order by age asc) from user_par;

      5.2 lag()

      将列向下沉

    select id, name, province, age , lag(age)over(partition by province order by age asc) from user_par;

      5.3 统计连续活跃

      1. 准备数据

      

      2. 建表

    create table active(id string, month int) 
    row format delimited
    fields terminated by '	';

      3. 加载数据

    load data local inpath '/home/centos/files/active.txt' into  table active;

      4. 统计连续两月活跃用户

    select id from (select id, month, lead(month)over(partition by id order by month desc) as month2 from active)a where month=month2+1;

    6. cume_dist()

      指定值占总数的百分比

      Demo

    select id,name,age, cume_dist()over(order by age desc) from user_nopar;

       


  • 相关阅读:
    多项式全家桶——Part.3 多项式求逆、除法、开根号
    多项式全家桶——Part.2 多项式位运算
    多项式全家桶——Part.1 多项式加减乘
    CSP2019总结
    jzoj6384. 【NOIP2019模拟2019.10.23】珂学家
    jzoj6377. 【NOIP2019模拟2019.10.05】幽曲[埋骨于弘川]
    jzoj6374. 【NOIP2019模拟2019.10.04】结界[生与死的境界]
    jzoj6370. 【NOIP2019模拟2019.9.28】基础 fake 练习题
    一个初学者的辛酸路程-基于Django写BBS项目
    一个初学者的辛酸路程-依旧Django
  • 原文地址:https://www.cnblogs.com/share23/p/10298373.html
Copyright © 2011-2022 走看看