zoukankan      html  css  js  c++  java
  • 【Hive】HiveQL实战之分析函数&窗口函数(补充)

    本篇承接【Hive】HiveQL实战之分析函数&窗口函数并对其进行补充。

    一 语法结构

    分析函数的使用常和Over、Partition By、Order By 和窗口说明一起,具体语法如下:
    Function (arg1,..., argn) OVER ([PARTITION BY <...>] [ORDER BY <....>]
    [<window_clause>])

    1 Function (arg1,..., argn)

    函数可以是以下类型:
    • 标准聚合函数:Count()、Sum()、Min()、MAX()或AVG();
    • Rank;
    • Dense_Rank;
    • Row_number;
    • Cume_dist;
    • Percent_Rank;
    • NTile;
    • Lead;
    • Lag;
    • First_Value;
    • Last_Value;
    2 PARTITION BY <...>

    对数据行进行分组,和Group By类似。

    3 ORDER BY <....>

    对数据进行排序,默认为升序。

    4 <window_clause>

    窗口说明进一步对结果进行分区并应用分析函数,有两种类型的窗口,分别为Row类型和Range类型。

    1)Row类型窗口定义

    ROWS BETWEEN <start_expr> AND <end_expr>

    2)窗口说明

    <start_expr>只能是以下之一:
    • Unbounded preceding;
    • Current row;
    • N Preceding 或 Following;

    <end_expr>只能是以下之一:
    • Unbounded following;
    • Current row;
    • N preceding 或Following;

    3)图形展示


    二 实战示例

    1 数据准备
    0: jdbc:hive2://localhost:10000/hive> create table ana(name string , depart int,salary int)
    . . . . . . . . . . . . . . . . . . > row format delimited
    . . . . . . . . . . . . . . . . . . > fields terminated by '|';
    No rows affected (3.863 seconds)
    0: jdbc:hive2://localhost:10000/hive> load data local inpath '/home/hadoop/tt' overwrite into table ana;
    No rows affected (1.044 seconds)
    0: jdbc:hive2://localhost:10000/hive> select *from ana;
    +-----------+-------------+-------------+
    | ana.name  | ana.depart  | ana.salary  |
    +-----------+-------------+-------------+
    | Lucy      | 1000        | 5500        |
    | Michael   | 1000        | 5000        |
    | Steven    | 1000        | 6400        |
    | Will      | 1000        | 4000        |
    | Will      | 1000        | 4000        |
    | Jess      | 1001        | 6000        |
    | Lily      | 1001        | 5000        |
    | Mike      | 1001        | 6400        |
    | Richard   | 1002        | 8000        |
    | Wei       | 1002        | 7000        |
    | Yun       | 1002        | 5500        |
    +-----------+-------------+-------------+
    11 rows selected (0.377 seconds)
    2 示例一
    0: jdbc:hive2://localhost:10000/hive> select row_number()over(order by depart,name) rnk,
    . . . . . . . . . . . . . . . . . . > name,depart,salary,count(*)over(partition by depart) as row_cnt,
    . . . . . . . . . . . . . . . . . . > sum(salary)over(partition by depart order by depart) as deptTotal,
    . . . . . . . . . . . . . . . . . . > sum(salary)over(order by depart) as runningTotal,
    . . . . . . . . . . . . . . . . . . > sum(salary)over(order by depart rows unbounded preceding) as runningTotal2,
    . . . . . . . . . . . . . . . . . . > sum(salary)over(order by depart rows between unbounded preceding and current row) as runningTotal3
    . . . . . . . . . . . . . . . . . . > from ana
    . . . . . . . . . . . . . . . . . . > order by depart,name;
    +------+----------+---------+---------+----------+------------+---------------+----------------+----------------+
    | rnk  |   name   | depart  | salary  | row_cnt  | depttotal  | runningtotal  | runningtotal2  | runningtotal3  |
    +------+----------+---------+---------+----------+------------+---------------+----------------+----------------+
    | 1    | Lucy     | 1000    | 5500    | 5        | 24900      | 24900         | 5500           | 5500           |
    | 2    | Michael  | 1000    | 5000    | 5        | 24900      | 24900         | 10500          | 10500          |
    | 3    | Steven   | 1000    | 6400    | 5        | 24900      | 24900         | 16900          | 16900          |
    | 5    | Will     | 1000    | 4000    | 5        | 24900      | 24900         | 24900          | 24900          |
    | 4    | Will     | 1000    | 4000    | 5        | 24900      | 24900         | 20900          | 20900          |
    | 6    | Jess     | 1001    | 6000    | 3        | 17400      | 42300         | 30900          | 30900          |
    | 7    | Lily     | 1001    | 5000    | 3        | 17400      | 42300         | 35900          | 35900          |
    | 8    | Mike     | 1001    | 6400    | 3        | 17400      | 42300         | 42300          | 42300          |
    | 9    | Richard  | 1002    | 8000    | 3        | 20500      | 62800         | 50300          | 50300          |
    | 10   | Wei      | 1002    | 7000    | 3        | 20500      | 62800         | 57300          | 57300          |
    | 11   | Yun      | 1002    | 5500    | 3        | 20500      | 62800         | 62800          | 62800          |
    +------+----------+---------+---------+----------+------------+---------------+----------------+----------------+
    11 rows selected (241.63 seconds)
    3 示例二
    0: jdbc:hive2://localhost:10000/hive> select row_number()over(order by depart,name) rnk,
    . . . . . . . . . . . . . . . . . . > name,depart,salary,
    . . . . . . . . . . . . . . . . . . > rank() over(partition by depart order by salary) as rank,
    . . . . . . . . . . . . . . . . . . > dense_rank()over(partition by depart order by salary) as dense_rank,
    . . . . . . . . . . . . . . . . . . > row_number()over() as row_num,
    . . . . . . . . . . . . . . . . . . > round(cume_dist()over(partition by depart order by salary),1) as cume_dist,
    . . . . . . . . . . . . . . . . . . > percent_rank()over(partition by depart order by salary) as percent_rank,
    . . . . . . . . . . . . . . . . . . > ntile(4) over(partition by depart order by salary) as ntile
    . . . . . . . . . . . . . . . . . . > from ana
    . . . . . . . . . . . . . . . . . . > order by depart;
    +------+----------+---------+---------+-------+-------------+----------+------------+---------------+--------+
    | rnk  |   name   | depart  | salary  | rank  | dense_rank  | row_num  | cume_dist  | percent_rank  | ntile  |
    +------+----------+---------+---------+-------+-------------+----------+------------+---------------+--------+
    | 5    | Will     | 1000    | 4000    | 1     | 1           | 11       | 0.4        | 0.0           | 1      |
    | 4    | Will     | 1000    | 4000    | 1     | 1           | 10       | 0.4        | 0.0           | 1      |
    | 2    | Michael  | 1000    | 5000    | 3     | 2           | 9        | 0.6        | 0.5           | 2      |
    | 1    | Lucy     | 1000    | 5500    | 4     | 3           | 8        | 0.8        | 0.75          | 3      |
    | 3    | Steven   | 1000    | 6400    | 5     | 4           | 7        | 1.0        | 1.0           | 4      |
    | 7    | Lily     | 1001    | 5000    | 1     | 1           | 6        | 0.3        | 0.0           | 1      |
    | 6    | Jess     | 1001    | 6000    | 2     | 2           | 5        | 0.7        | 0.5           | 2      |
    | 8    | Mike     | 1001    | 6400    | 3     | 3           | 4        | 1.0        | 1.0           | 3      |
    | 11   | Yun      | 1002    | 5500    | 1     | 1           | 3        | 0.3        | 0.0           | 1      |
    | 10   | Wei      | 1002    | 7000    | 2     | 2           | 2        | 0.7        | 0.5           | 2      |
    | 9    | Richard  | 1002    | 8000    | 3     | 3           | 1        | 1.0        | 1.0           | 3      |
    +------+----------+---------+---------+-------+-------------+----------+------------+---------------+--------+
    11 rows selected (229.274 seconds)
    4 示例三
    0: jdbc:hive2://localhost:10000/hive> select name,depart,salary,
    . . . . . . . . . . . . . . . . . . > row_number()over(order by depart,name) rnk,
    . . . . . . . . . . . . . . . . . . > lead(salary,2)over(partition by depart order by salary) as lead,
    . . . . . . . . . . . . . . . . . . > lag(salary,2)over(partition by depart order by salary) as lag,
    . . . . . . . . . . . . . . . . . . > first_value(salary)over(partition by depart order by salary) as first_value,
    . . . . . . . . . . . . . . . . . . > last_value(salary)over(partition by depart order by salary) as last_value_default,
    . . . . . . . . . . . . . . . . . . > last_value(salary)over(partition by depart order by salary rows between unbounded preceding and unbounded following) as last_value
    . . . . . . . . . . . . . . . . . . > from ana
    . . . . . . . . . . . . . . . . . . > order by depart;
    +----------+---------+---------+------+-------+-------+--------------+---------------------+-------------+
    |   name   | depart  | salary  | rnk  | lead  |  lag  | first_value  | last_value_default  | last_value  |
    +----------+---------+---------+------+-------+-------+--------------+---------------------+-------------+
    | Steven   | 1000    | 6400    | 3    | NULL  | 5000  | 4000         | 6400                | 6400        |
    | Lucy     | 1000    | 5500    | 1    | NULL  | 4000  | 4000         | 5500                | 6400        |
    | Michael  | 1000    | 5000    | 2    | 6400  | 4000  | 4000         | 5000                | 6400        |
    | Will     | 1000    | 4000    | 4    | 5500  | NULL  | 4000         | 4000                | 6400        |
    | Will     | 1000    | 4000    | 5    | 5000  | NULL  | 4000         | 4000                | 6400        |
    | Mike     | 1001    | 6400    | 8    | NULL  | 5000  | 5000         | 6400                | 6400        |
    | Jess     | 1001    | 6000    | 6    | NULL  | NULL  | 5000         | 6000                | 6400        |
    | Lily     | 1001    | 5000    | 7    | 6400  | NULL  | 5000         | 5000                | 6400        |
    | Richard  | 1002    | 8000    | 9    | NULL  | 5500  | 5500         | 8000                | 8000        |
    | Wei      | 1002    | 7000    | 10   | NULL  | NULL  | 5500         | 7000                | 8000        |
    | Yun      | 1002    | 5500    | 11   | 8000  | NULL  | 5500         | 5500                | 8000        |
    +----------+---------+---------+------+-------+-------+--------------+---------------------+-------------+
    11 rows selected (172.483 seconds)
    5 示例四
    0: jdbc:hive2://localhost:10000/hive> select name,depart,salary,
    . . . . . . . . . . . . . . . . . . > row_number()over(order by depart,name) rnk,
    . . . . . . . . . . . . . . . . . . > max(salary)over(partition by depart order by name rows between 2 preceding and current row) win1,
    . . . . . . . . . . . . . . . . . . > max(salary)over(partition by depart order by name rows between 2 preceding and unbounded following) win2,
    . . . . . . . . . . . . . . . . . . > max(salary)over(partition by depart order by name rows between 1 preceding and 2 following) win3,
    . . . . . . . . . . . . . . . . . . > max(salary)over(partition by depart order by name rows between 1 following and 2 following) win5,
    . . . . . . . . . . . . . . . . . . > max(salary)over(partition by depart order by name rows between current row and current row) win6,
    . . . . . . . . . . . . . . . . . . > max(salary)over(partition by depart order by name rows between current row and 1 following) win7,
    . . . . . . . . . . . . . . . . . . > max(salary)over(partition by depart order by name rows between current row and unbounded following) win8,
    . . . . . . . . . . . . . . . . . . > max(salary)over(partition by depart order by name rows between unbounded preceding and current row) win9,
    . . . . . . . . . . . . . . . . . . > max(salary)over(partition by depart order by name rows between unbounded preceding and 1 following) win10,
    . . . . . . . . . . . . . . . . . . > max(salary)over(partition by depart order by name rows between unbounded preceding and unbounded following) win11,
    . . . . . . . . . . . . . . . . . . > max(salary)over(partition by depart order by name rows 2 preceding) win12
    . . . . . . . . . . . . . . . . . . > from ana
    . . . . . . . . . . . . . . . . . . > order by depart,name;
    +----------+---------+---------+------+-------+-------+-------+-------+-------+-------+-------+-------+--------+--------+--------+
    |   name   | depart  | salary  | rnk  | win1  | win2  | win3  | win5  | win6  | win7  | win8  | win9  | win10  | win11  | win12  |
    +----------+---------+---------+------+-------+-------+-------+-------+-------+-------+-------+-------+--------+--------+--------+
    | Lucy     | 1000    | 5500    | 1    | 5500  | 6400  | 6400  | 6400  | 5500  | 5500  | 6400  | 5500  | 5500   | 6400   | 5500   |
    | Michael  | 1000    | 5000    | 2    | 5500  | 6400  | 6400  | 6400  | 5000  | 6400  | 6400  | 5500  | 6400   | 6400   | 5500   |
    | Steven   | 1000    | 6400    | 3    | 6400  | 6400  | 6400  | 4000  | 6400  | 6400  | 6400  | 6400  | 6400   | 6400   | 6400   |
    | Will     | 1000    | 4000    | 4    | 6400  | 6400  | 4000  | NULL  | 4000  | 4000  | 4000  | 6400  | 6400   | 6400   | 6400   |
    | Will     | 1000    | 4000    | 5    | 6400  | 6400  | 6400  | 4000  | 4000  | 4000  | 4000  | 6400  | 6400   | 6400   | 6400   |
    | Jess     | 1001    | 6000    | 6    | 6000  | 6400  | 6400  | 6400  | 6000  | 6000  | 6400  | 6000  | 6000   | 6400   | 6000   |
    | Lily     | 1001    | 5000    | 7    | 6000  | 6400  | 6400  | 6400  | 5000  | 6400  | 6400  | 6000  | 6400   | 6400   | 6000   |
    | Mike     | 1001    | 6400    | 8    | 6400  | 6400  | 6400  | NULL  | 6400  | 6400  | 6400  | 6400  | 6400   | 6400   | 6400   |
    | Richard  | 1002    | 8000    | 9    | 8000  | 8000  | 8000  | 7000  | 8000  | 8000  | 8000  | 8000  | 8000   | 8000   | 8000   |
    | Wei      | 1002    | 7000    | 10   | 8000  | 8000  | 8000  | 5500  | 7000  | 7000  | 7000  | 8000  | 8000   | 8000   | 8000   |
    | Yun      | 1002    | 5500    | 11   | 8000  | 8000  | 7000  | NULL  | 5500  | 5500  | 5500  | 8000  | 8000   | 8000   | 8000   |
    +----------+---------+---------+------+-------+-------+-------+-------+-------+-------+-------+-------+--------+--------+--------+
    11 rows selected (190.488 seconds)
    6 示例五
    0: jdbc:hive2://localhost:10000/hive> select name,depart,salary,                                                                                                      
    . . . . . . . . . . . . . . . . . . > row_number()over(order by depart,name) rnk,                                               
    . . . . . . . . . . . . . . . . . . > max(salary)over w1 as win1,                                                               
    . . . . . . . . . . . . . . . . . . > max(salary)over w1 as win2,                                                               
    . . . . . . . . . . . . . . . . . . > max(salary)over w3 as win3                                                                
    . . . . . . . . . . . . . . . . . . > from ana 
    . . . . . . . . . . . . . . . . . . > window 
    . . . . . . . . . . . . . . . . . . > w1 as (partition by depart order by name rows between 2 preceding and current row),
    . . . . . . . . . . . . . . . . . . > w3 as (partition by depart order by name rows between 1 preceding and 2 following)
    . . . . . . . . . . . . . . . . . . > order by depart,name;
    +----------+---------+---------+------+-------+-------+-------+
    |   name   | depart  | salary  | rnk  | win1  | win2  | win3  |
    +----------+---------+---------+------+-------+-------+-------+
    | Lucy     | 1000    | 5500    | 1    | 5500  | 5500  | 6400  |
    | Michael  | 1000    | 5000    | 2    | 5500  | 5500  | 6400  |
    | Steven   | 1000    | 6400    | 3    | 6400  | 6400  | 6400  |
    | Will     | 1000    | 4000    | 4    | 6400  | 6400  | 4000  |
    | Will     | 1000    | 4000    | 5    | 6400  | 6400  | 6400  |
    | Jess     | 1001    | 6000    | 6    | 6000  | 6000  | 6400  |
    | Lily     | 1001    | 5000    | 7    | 6000  | 6000  | 6400  |
    | Mike     | 1001    | 6400    | 8    | 6400  | 6400  | 6400  |
    | Richard  | 1002    | 8000    | 9    | 8000  | 8000  | 8000  |
    | Wei      | 1002    | 7000    | 10   | 8000  | 8000  | 8000  |
    | Yun      | 1002    | 5500    | 11   | 8000  | 8000  | 7000  |
    +----------+---------+---------+------+-------+-------+-------+
    11 rows selected (189.893 seconds)



  • 相关阅读:
    用栈实现队列
    “非常规”的漏洞挖掘思路与技巧-请求参数加密,js接口- 漏洞定级评分的标准与关注点-违规测试标准详解
    【linux 文件管理】2-文件目录命令
    EHC
    kali linux高级渗透测试第七课
    maltego CE社区版-Domain与DNS name
    name servers-域名服务器
    【linux 文件管理】1-文件目录结构
    web应用安全自学指南
    kali linux高级渗透测试第六课
  • 原文地址:https://www.cnblogs.com/alen-liu-sz/p/12975622.html
Copyright © 2011-2022 走看看