zoukankan      html  css  js  c++  java
  • [Hive_10] Hive 的分析函数


    0. 说明

      Hive 的分析函数 窗口函数  | 排名函数 | 最大值 | 分层次 | lead && lag 统计活跃用户 | cume_dist


    1. 窗口函数(开窗函数) over

      1.1 说明

      1 preceding  //前一个
      1 following  //后一个
      current row  //当前行
      unbounded preceding  //无上限
      unbounded following  //无下限

      1.2 测试

    # 以行定义窗口界限
    select id, name, age , sum(age)over(order by id rows between current row and 2 following) from user_par;
    
    # 以值定义窗口界限
    select id, name, age , sum(age)over(order by age range between current row and 10 following) from user_par;

    2. 排名函数

      2.1 并列跳跃

      113
      rank

    select id, name, province, age , rank()over(partition by province order by age desc) from user_par;

      2.2 不跳跃

      112
      dense_rank

    select id, name, province, age , dense_rank()over(partition by province order by age desc) from user_par;

      2.3 顺序

      123
      row_number

    select id, name, province, age , row_number()over(partition by province order by age desc) from user_par;

    3. 最大值

      first_value()

    select id, name, province, age , first_value()over(partition by province order by age desc) from user_par;

    4. 分层次

      按照三六九等进行平均分层

      ntile()

    select id, name, age , ntile(3)over(order by age desc) from user_par;


    5. lead && lag

      5.1 lead()

      将列向上提

    select id, name, province, age , lead(age)over(partition by province order by age asc) from user_par;

      5.2 lag()

      将列向下沉

    select id, name, province, age , lag(age)over(partition by province order by age asc) from user_par;

      5.3 统计连续活跃

      1. 准备数据

      

      2. 建表

    create table active(id string, month int) 
    row format delimited
    fields terminated by '	';

      3. 加载数据

    load data local inpath '/home/centos/files/active.txt' into  table active;

      4. 统计连续两月活跃用户

    select id from (select id, month, lead(month)over(partition by id order by month desc) as month2 from active)a where month=month2+1;

    6. cume_dist()

      指定值占总数的百分比

      Demo

    select id,name,age, cume_dist()over(order by age desc) from user_nopar;

       


  • 相关阅读:
    网络性能评估
    HTML5 historyState pushState、replaceState
    pre换行段落间距
    2、hibernate七步走完成增删改查
    8.多线程和Socket通信
    7..使用反射动态创建数组和访问数组
    5. Java反射机制
    11、触发器
    10.程序包
    9.函数
  • 原文地址:https://www.cnblogs.com/share23/p/10298373.html
Copyright © 2011-2022 走看看