zoukankan      html  css  js  c++  java
  • [Hive_10] Hive 的分析函数


    0. 说明

      Hive 的分析函数 窗口函数  | 排名函数 | 最大值 | 分层次 | lead && lag 统计活跃用户 | cume_dist


    1. 窗口函数(开窗函数) over

      1.1 说明

      1 preceding  //前一个
      1 following  //后一个
      current row  //当前行
      unbounded preceding  //无上限
      unbounded following  //无下限

      1.2 测试

    # 以行定义窗口界限
    select id, name, age , sum(age)over(order by id rows between current row and 2 following) from user_par;
    
    # 以值定义窗口界限
    select id, name, age , sum(age)over(order by age range between current row and 10 following) from user_par;

    2. 排名函数

      2.1 并列跳跃

      113
      rank

    select id, name, province, age , rank()over(partition by province order by age desc) from user_par;

      2.2 不跳跃

      112
      dense_rank

    select id, name, province, age , dense_rank()over(partition by province order by age desc) from user_par;

      2.3 顺序

      123
      row_number

    select id, name, province, age , row_number()over(partition by province order by age desc) from user_par;

    3. 最大值

      first_value()

    select id, name, province, age , first_value()over(partition by province order by age desc) from user_par;

    4. 分层次

      按照三六九等进行平均分层

      ntile()

    select id, name, age , ntile(3)over(order by age desc) from user_par;


    5. lead && lag

      5.1 lead()

      将列向上提

    select id, name, province, age , lead(age)over(partition by province order by age asc) from user_par;

      5.2 lag()

      将列向下沉

    select id, name, province, age , lag(age)over(partition by province order by age asc) from user_par;

      5.3 统计连续活跃

      1. 准备数据

      

      2. 建表

    create table active(id string, month int) 
    row format delimited
    fields terminated by '	';

      3. 加载数据

    load data local inpath '/home/centos/files/active.txt' into  table active;

      4. 统计连续两月活跃用户

    select id from (select id, month, lead(month)over(partition by id order by month desc) as month2 from active)a where month=month2+1;

    6. cume_dist()

      指定值占总数的百分比

      Demo

    select id,name,age, cume_dist()over(order by age desc) from user_nopar;

       


  • 相关阅读:
    直线的中点Bresenham算法的实现
    使用git 将自己的本地文件git到github上面的完整过程
    利用Microsoft VC++6.0 的MFC 的绘图工具实现简单图形的绘制
    github常见操作和常见错误及其解决办法
    浅谈软件配置管理工具(github & SVN)
    为Github 托管项目的访问添加SSH keys
    jQuery:用 lightTreeview 实现树形分类菜单的功能 展开收缩分类代码
    程序设计7大面向对象设计原则
    计算机组成原理实验之模拟整机实验考核试题
    计算机组成原理实验之CPU组成与指令周期实验
  • 原文地址:https://www.cnblogs.com/share23/p/10298373.html
Copyright © 2011-2022 走看看