zoukankan      html  css  js  c++  java
  • hive的窗口函数1

    Hive中提供了越来越多的分析函数,用于完成负责的统计分析。抽时间将所有的分析窗口函数理一遍,将陆续发布。
    今天先看几个基础的,SUM、AVG、MIN、MAX。
    用于实现分组内所有和连续累积的统计。

    1.数据准备

    本地文件上的文件

    (my_python_env)[root@hadoop26 data]# cat cookie 
    cookie1,2015-04-10,1
    cookie1,2015-04-11,5
    cookie1,2015-04-12,7
    cookie1,2015-04-13,3
    cookie1,2015-04-14,2
    cookie1,2015-04-15,4
    cookie1,2015-04-16,4

     吧本地文件系统上的文件上传到hdfs上,并为此数据建立外部表

    (my_python_env)[root@hadoop26 data]# hadoop fs -put cookie /cookie
    
    
    hive->
    create external table cookie (
    cookieid string,
    createtime string,
    pv int
    )
    row format delimited
    fields terminated by ','
    stored as textfile
    location '/cookie';

    验证表中是否有数据:

    hive (default)> select * from cookie;
    OK
    cookie.cookieid    cookie.createtime    cookie.pv
    cookie1    2015-04-10    1
    cookie1    2015-04-11    5
    cookie1    2015-04-12    7
    cookie1    2015-04-13    3
    cookie1    2015-04-14    2
    cookie1    2015-04-15    4
    cookie1    2015-04-16    4
    Time taken: 0.098 seconds, Fetched: 7 row(s)

     窗口统计:

    select cookieid,createtime,pv,
    sum(pv) over(partition by cookieid order by createtime) as pv1,--默认为起点到当前行
    sum(pv) over(partition by cookieid order by createtime rows between unbounded preceding and current row) as pv2,--current row代表当前行
    sum(pv) over(partition by cookieid) as pv3,--如果不指定rows between,统计所有行
    sum(pv) over(partition by cookieid order by createtime rows between 3 preceding and current row) as pv4,--当前行+往前3行
    sum(pv) over(partition by cookieid order by createtime rows between 3 preceding and 1 following) as pv5,--当前行+往前3行+往后1行
    sum(pv) over(partition by cookieid order by createtime rows between current row and unbounded following) as pv6
    from cookie;

     统计结果:

    OK
    cookieid    createtime    pv    pv1    pv2    pv3    pv4    pv5    pv6
    cookie1        2015-04-16    4    26    26    26    13    13    4
    cookie1        2015-04-15    4    22    22    26    16    20    8
    cookie1        2015-04-14    2    18    18    26    17    21    10
    cookie1        2015-04-13    3    16    16    26    16    18    13
    cookie1        2015-04-12    7    13    13    26    13    16    20
    cookie1        2015-04-11    5    6    6    26    6    13    25
    cookie1        2015-04-10    1    1    1    26    1    6    26
    Time taken: 49.42 seconds, Fetched: 7 row(s)

    注意,结果和ORDER BY相关,默认为升序

    如果不指定ROWS BETWEEN,默认为从起点到当前行;
    如果不指定ORDER BY,则将分组内所有值累加;
    关键是理解ROWS BETWEEN含义,也叫做WINDOW子句:
    PRECEDING:往前
    FOLLOWING:往后
    CURRENT ROW:当前行
    UNBOUNDED:起点,UNBOUNDED PRECEDING 表示从前面的起点, UNBOUNDED FOLLOWING:表示到后面的终点

    –其他AVG,MIN,MAX,和SUM用法一样。

     转自:http://lxw1234.com/archives/2015/04/176.htm

  • 相关阅读:
    HTML导航
    html5学习笔记
    crm使用soap启用和停用记录
    crmjs区分窗口是否是高速编辑(2)
    关于strace的一点东西
    Android studio第一次使用配置(三)gradle项目构建
    IntelliJ IDEA 问题总结之二(待补充) —— 快捷键、主题样式、导出jar、sqlite
    spark之map与flatMap差别
    leetcode:String to Integer (atoi)
    oracle索引的操作
  • 原文地址:https://www.cnblogs.com/dongdone/p/5737214.html
Copyright © 2011-2022 走看看