  • [Hive_10] Hive analytic functions


    0. Overview

      Hive analytic functions: window functions | ranking functions | maximum value | bucketing | lead && lag for counting active users | cume_dist


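    1. Window functions (the over clause)

      1.1 Description

      1 preceding  // the row before the current row
      1 following  // the row after the current row
      current row  // the current row
      unbounded preceding  // from the first row of the partition
      unbounded following  // through the last row of the partition

      1.2 Test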

    -- Window frame bounded by row count: the current row and the 2 rows after it
    select id, name, age , sum(age)over(order by id rows between current row and 2 following) from user_par;
    
    -- Window frame bounded by value: rows whose age lies between the current age and the current age + 10
    select id, name, age , sum(age)over(order by age range between current row and 10 following) from user_par;
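
      The difference between a rows frame and a range frame is easier to see on concrete numbers. The sketch below is only an illustration: it uses Python's sqlite3 module as a stand-in for Hive (it assumes a SQLite build recent enough to support window frames), and the four sample rows are hypothetical, not the article's user_par data.

```python
import sqlite3

# In-memory SQLite database standing in for Hive; sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("create table user_par(id int, name text, age int)")
conn.executemany(
    "insert into user_par values (?, ?, ?)",
    [(1, "a", 10), (2, "b", 15), (3, "c", 20), (4, "d", 40)],
)

# rows frame: the current row plus the next 2 physical rows (ordered by id)
rows_frame = conn.execute(
    "select id, sum(age) over (order by id "
    "rows between current row and 2 following) "
    "from user_par order by id"
).fetchall()

# range frame: every row whose age is within [age, age + 10]
range_frame = conn.execute(
    "select id, sum(age) over (order by age "
    "range between current row and 10 following) "
    "from user_par order by id"
).fetchall()

print(rows_frame)   # [(1, 45), (2, 75), (3, 60), (4, 40)]
print(range_frame)  # [(1, 45), (2, 35), (3, 20), (4, 40)]
```

      For id 2 the rows frame adds 15 + 20 + 40, while the range frame only adds ages between 15 and 25, that is 15 + 20.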

    2. Ranking functions

      2.1 Ties share a rank, with gaps

      Ranks look like 1, 1, 3
      rank

    select id, name, province, age , rank()over(partition by province order by age desc) from user_par;

      2.2 Ties share a rank, no gaps

      Ranks look like 1, 1, 2
      dense_rank

    select id, name, province, age , dense_rank()over(partition by province order by age desc) from user_par;

      2.3 Sequential numbering

      Ranks look like 1, 2, 3
      row_number

    select id, name, province, age , row_number()over(partition by province order by age desc) from user_par;
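
      To compare the three ranking functions side by side, here is a small sketch that runs them over the same window. It uses Python's sqlite3 module as a stand-in for Hive, and the three sample rows (one province, two people tied at age 30) are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for Hive; sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("create table user_par(id int, name text, province text, age int)")
conn.executemany(
    "insert into user_par values (?, ?, ?, ?)",
    [(1, "a", "hebei", 30), (2, "b", "hebei", 30), (3, "c", "hebei", 25)],
)

# All three functions share one named window so the results line up.
ranking = conn.execute(
    "select age, "
    "       rank() over w as rk, "
    "       dense_rank() over w as drk, "
    "       row_number() over w as rn "
    "from user_par "
    "window w as (partition by province order by age desc) "
    "order by rn"
).fetchall()

# Columns: age, rank, dense_rank, row_number
print(ranking)  # [(30, 1, 1, 1), (30, 1, 1, 2), (25, 3, 2, 3)]
```

      The two tied rows get the same rank and dense_rank; which of them receives row_number 1 is not determined by the ordering on age alone.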

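    3. Maximum value

      first_value(col)

      With the window ordered by age descending, the first value in each partition is the largest age.

    select id, name, province, age , first_value(age)over(partition by province order by age desc) from user_par;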
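
      As a quick check that first_value over a descending order yields the per-partition maximum, here is a sketch with the column passed as the argument, first_value(age). It uses Python's sqlite3 module as a stand-in for Hive, and the sample rows and province names are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for Hive; sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("create table user_par(id int, name text, province text, age int)")
conn.executemany(
    "insert into user_par values (?, ?, ?, ?)",
    [
        (1, "a", "hebei", 20),
        (2, "b", "hebei", 35),
        (3, "c", "henan", 28),
        (4, "d", "henan", 22),
    ],
)

# The first row of each province, ordered by age descending, is the oldest.
max_age = conn.execute(
    "select id, first_value(age) over "
    "(partition by province order by age desc) "
    "from user_par order by id"
).fetchall()

print(max_age)  # [(1, 35), (2, 35), (3, 28), (4, 28)]
```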

    4. Bucketing

      Splits the ordered rows into n buckets of nearly equal size

      ntile(n)

    select id, name, age , ntile(3)over(order by age desc) from user_par;
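
      When the row count does not divide evenly, the earlier buckets receive the extra rows. The sketch below shows this with 7 hypothetical rows split into 3 buckets; it uses Python's sqlite3 module as a stand-in for Hive.

```python
import sqlite3

# In-memory SQLite database standing in for Hive; sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("create table user_par(id int, name text, age int)")
conn.executemany(
    "insert into user_par values (?, ?, ?)",
    [(i, "u%d" % i, i * 10) for i in range(1, 8)],  # ages 10, 20, ..., 70
)

# 7 rows into 3 buckets: sizes 3, 2, 2
buckets = conn.execute(
    "select age, ntile(3) over (order by age desc) "
    "from user_par order by age desc"
).fetchall()

print(buckets)
# [(70, 1), (60, 1), (50, 1), (40, 2), (30, 2), (20, 3), (10, 3)]
```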


    5. lead && lag

      5.1 lead()

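      Shifts the column up: each row gets the value from the next row in its window, and the last row gets NULL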

    select id, name, province, age , lead(age)over(partition by province order by age asc) from user_par;

      5.2 lag()

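      Shifts the column down: each row gets the value from the previous row in its window, and the first row gets NULL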

    select id, name, province, age , lag(age)over(partition by province order by age asc) from user_par;
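
      The following sketch puts lead and lag next to each other on the same window. It uses Python's sqlite3 module as a stand-in for Hive, with three hypothetical rows in one province; NULL comes back as None in Python.

```python
import sqlite3

# In-memory SQLite database standing in for Hive; sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("create table user_par(id int, name text, province text, age int)")
conn.executemany(
    "insert into user_par values (?, ?, ?, ?)",
    [(1, "a", "hebei", 20), (2, "b", "hebei", 25), (3, "c", "hebei", 30)],
)

# lead looks one row ahead, lag looks one row back, both ordered by age asc.
shifted = conn.execute(
    "select age, "
    "       lead(age) over w as next_age, "
    "       lag(age) over w as prev_age "
    "from user_par "
    "window w as (partition by province order by age asc) "
    "order by age"
).fetchall()

# Columns: age, lead(age), lag(age)
print(shifted)  # [(20, 25, None), (25, 30, 20), (30, None, 25)]
```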

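      5.3 Counting consecutive activity

      1. Prepare the data

      The sample file active.txt is not reproduced here. Per the table definition below, each line holds an id and a month separated by a tab.

      2. Create the table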

    create table active(id string, month int) 
    row format delimited
    fields terminated by '\t';

      3. Load the data

    load data local inpath '/home/centos/files/active.txt' into  table active;

      4. Find users active in two consecutive months

    select id from (select id, month, lead(month)over(partition by id order by month desc) as month2 from active)a where month=month2+1;
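
      To trace how the query works: within each id, months are sorted descending, so lead(month) returns the next lower month, and a row qualifies when the two differ by exactly 1. The sketch below runs the same query through Python's sqlite3 module as a stand-in for Hive. The rows are hypothetical (the original active.txt is not available), and month is assumed to be a plain month number.

```python
import sqlite3

# In-memory SQLite database standing in for Hive; sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("create table active(id text, month int)")
conn.executemany(
    "insert into active values (?, ?)",
    [
        ("u1", 1), ("u1", 2),             # months 1 and 2: consecutive
        ("u2", 1), ("u2", 3),             # months 1 and 3: not consecutive
        ("u3", 5), ("u3", 6), ("u3", 7),  # three consecutive months
    ],
)

inner = (
    "select id, month, "
    "lead(month) over (partition by id order by month desc) as month2 "
    "from active"
)

# Same shape as the query above: one output row per consecutive pair.
pairs = conn.execute(
    "select id from (" + inner + ") a where month = month2 + 1 order by id"
).fetchall()

# Adding distinct reports each qualifying user once.
users = conn.execute(
    "select distinct id from (" + inner + ") a "
    "where month = month2 + 1 order by id"
).fetchall()

print(pairs)  # [('u1',), ('u3',), ('u3',)]
print(users)  # [('u1',), ('u3',)]
```

      A user with three consecutive months appears twice in the first result, so select distinct id may be preferable when only the list of users is wanted.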

    6. cume_dist()

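      Cumulative distribution: the fraction of rows in the window that sort at or before the current row, ties included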

      Demo

    select id,name,age, cume_dist()over(order by age desc) from user_nopar;
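
      A small worked case makes the tie handling visible. The sketch below uses Python's sqlite3 module as a stand-in for Hive, with four hypothetical rows in user_nopar, two of them tied at age 30.

```python
import sqlite3

# In-memory SQLite database standing in for Hive; sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("create table user_nopar(id int, name text, age int)")
conn.executemany(
    "insert into user_nopar values (?, ?, ?)",
    [(1, "a", 40), (2, "b", 30), (3, "c", 30), (4, "d", 20)],
)

# With order by age desc, cume_dist = (rows with age >= current age) / total
dist = conn.execute(
    "select age, cume_dist() over (order by age desc) "
    "from user_nopar order by age desc"
).fetchall()

print(dist)  # [(40, 0.25), (30, 0.75), (30, 0.75), (20, 1.0)]
```

      Both rows with age 30 get 0.75 because three of the four rows have an age of 30 or more.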

       


  • Original article: https://www.cnblogs.com/share23/p/10298373.html