  • Hive functions

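    collect_set(x): aggregate function that gathers the values of x from multiple rows into one array, with duplicates removed
    collect_list(x): aggregate function that gathers the values of x from multiple rows into one array, with duplicates kept
    concat_ws(separator, ...): concatenation function; joins strings (or an array of strings, such as a collected array) into a single field, placing the separator between them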
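
    To make the difference between the three functions concrete, here is a minimal plain-Python model of their behaviour. It is only a sketch: the helper names mirror the Hive functions but are ordinary Python functions written for illustration, and the sample values are hypothetical.

```python
# Plain-Python model of three Hive built-ins (illustrative only, not Hive code).

def collect_list(values):
    """Gather every value of a group into one list, duplicates kept."""
    return list(values)


def collect_set(values):
    """Gather the distinct values of a group into one list.

    Hive does not guarantee element order; this model keeps first-seen order.
    """
    distinct = []
    for value in values:
        if value not in distinct:
            distinct.append(value)
    return distinct


def concat_ws(separator, items):
    """Join strings into a single field with the separator between them."""
    return separator.join(items)


# Hypothetical values of one column within a single group.
ages = ["50", "85", "20", "85", "50"]

as_list = collect_list(ages)      # duplicates survive
as_set = collect_set(ages)        # duplicates removed
joined = concat_ws(",", as_set)   # array collapsed into one string field
```

    In this model `as_list` keeps all five values, `as_set` keeps three, and `joined` is the single string "50,85,20", which is the usual way the three functions are combined to turn many rows into one field.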

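    UDF (User-Defined Function): an ordinary user-defined function; it operates on the values of a single row.

    UDAF (User-Defined Aggregation Function): a user-defined aggregate function; it operates on multiple rows and returns one value, like the built-in aggregates SUM() and AVG() in SQL.

    UDTF (User-Defined Table-Generating Function): covers the case where one input row produces multiple output rows (one-to-many mapping).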
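
    The three kinds of function differ only in how many rows go in and how many come out. The sketch below shows those three shapes with ordinary Python functions; the names `udf_upper`, `udaf_sum` and `udtf_explode` are made up for illustration and are not Hive APIs.

```python
# Illustrative Python analogues of the three function shapes (not Hive APIs).

def udf_upper(value):
    """UDF shape: one input row -> one output value."""
    return value.upper()


def udaf_sum(values):
    """UDAF shape: many input rows -> one output value."""
    total = 0
    for value in values:
        total += value
    return total


def udtf_explode(items):
    """UDTF shape: one input row -> zero or more output rows."""
    for item in items:
        yield (item,)


upper_rows = [udf_upper(name) for name in ["wang", "zhang"]]   # applied row by row
total = udaf_sum([25, 10, 35])                                 # one value for the group
exploded = list(udtf_explode(["ma", "zhong"]))                 # one row per element
```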

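    lateral view is used together with UDTFs such as explode (often applied to the array returned by split) to break one row into multiple rows, which can then be aggregated. lateral view first calls the UDTF for each row of the base table; the UDTF expands that row into zero or more rows; lateral view then joins the generated rows back to the source row, producing a virtual table that can be given an alias. In the example further down, lateral view explode(subdinates) adTable as aa, adTable is the alias of the virtual table and aa is the alias of its column.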

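    explode(ARRAY): generates one row for each element of the array

    explode(MAP): generates one row for each key-value pair of the map, with the key in one column and the value in another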

    | CREATE TABLE `employees`(                                            |
    |   `name` string,                                                     |
    |   `salary` float,                                                    |
    |   `subdinates` array<string>,                                        |
    |   `deducation` map<string,float>,                                    |
    |   `address` struct<street:string,city:string,state:string,zip:int>)  |
    | ROW FORMAT SERDE                                                     |
    |   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'               |
    | STORED AS INPUTFORMAT                                                |
    |   'org.apache.hadoop.mapred.TextInputFormat'                         |
    | OUTPUTFORMAT                                                         |
    |   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'       |
    | LOCATION                                                             |
    |   'hdfs://localhost:9000/user/hive/warehouse/gamedw.db/employees'    |
    | TBLPROPERTIES (                                                      |
    |   'creator'='tianyongtao',                                           |
    |   'last_modified_by'='root',                                         |
    |   'last_modified_time'='1521447397',                                 |
    |   'numFiles'='0',                                                    |
    |   'numRows'='0',                                                     |
    |   'rawDataSize'='0',                                                 |
    |   'totalSize'='0',                                                   |
    |   'transient_lastDdlTime'='1521447397')                              |
    +----------------------------------------------------------------------+--+

     

    Handling Array-type fields

    0: jdbc:hive2://192.168.53.122:10000/default> select name,subdinates  from employees;
    +---------------+-------------------------+--+
    |     name      |       subdinates        |
    +---------------+-------------------------+--+
    | tianyongtao   | ["wang","ZHANG","LIU"]  |
    | wangyangming  | ["ma","zhong"]          |
    +---------------+-------------------------+--+
    2 rows selected (0.301 seconds)

    0: jdbc:hive2://192.168.53.122:10000/default> select name,aa  from employees lateral view explode(subdinates) adTable  as aa;
    +---------------+--------+--+
    |     name      |   aa   |
    +---------------+--------+--+
    | tianyongtao   | wang   |
    | tianyongtao   | ZHANG  |
    | tianyongtao   | LIU    |
    | wangyangming  | ma     |
    | wangyangming  | zhong  |
    +---------------+--------+--+
    5 rows selected (0.312 seconds)
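
    The query above can be read as a nested loop: for every source row, explode generates one row per array element, and lateral view joins each generated row back to its source row. A minimal plain-Python sketch of that reading follows; the helper `lateral_view_explode` is hypothetical, and the data mirrors the sample output.

```python
# Plain-Python model of
#   select name, aa from employees lateral view explode(subdinates) adTable as aa
# (illustrative only; the rows mirror the sample output above).

employees = [
    {"name": "tianyongtao", "subdinates": ["wang", "ZHANG", "LIU"]},
    {"name": "wangyangming", "subdinates": ["ma", "zhong"]},
]


def lateral_view_explode(rows, array_col, alias):
    """Join each source row with the rows that explode() generates from it."""
    result = []
    for row in rows:
        for element in row[array_col]:   # explode: one generated row per element
            joined = dict(row)           # the source row's columns stay available
            joined[alias] = element      # the generated column is named by the alias
            result.append(joined)
    return result


exploded = [
    (row["name"], row["aa"])
    for row in lateral_view_explode(employees, "subdinates", "aa")
]
```

    In this model a source row whose array is empty contributes no output rows, because the inner loop never runs.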

    Handling Map-type fields

    0: jdbc:hive2://192.168.53.122:10000/default> select deducation  from employees;
    +---------------------------------+--+
    |           deducation            |
    +---------------------------------+--+
    | {"aaa":10.0,"bb":5.0,"CC":8.0}  |
    | {"aaa":6.0,"bb":12.0}           |
    +---------------------------------+--+
    2 rows selected (0.315 seconds)
    0: jdbc:hive2://192.168.53.122:10000/default> select explode(deducation) as (aa,bb)  from employees;
    +------+-------+--+
    |  aa  |  bb   |
    +------+-------+--+
    | aaa  | 10.0  |
    | bb   | 5.0   |
    | CC   | 8.0   |
    | aaa  | 6.0   |
    | bb   | 12.0  |
    +------+-------+--+
    5 rows selected (0.314 seconds)
    0: jdbc:hive2://192.168.53.122:10000/default> select name,aa,bb  from employees lateral view explode(deducation) mtable as aa,bb;
    +---------------+------+-------+--+
    |     name      |  aa  |  bb   |
    +---------------+------+-------+--+
    | tianyongtao   | aaa  | 10.0  |
    | tianyongtao   | bb   | 5.0   |
    | tianyongtao   | CC   | 8.0   |
    | wangyangming  | aaa  | 6.0   |
    | wangyangming  | bb   | 12.0  |
    +---------------+------+-------+--+
    5 rows selected (0.347 seconds)
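
    The two map queries above differ in which columns are available: a bare explode in the select list yields only the generated key and value columns, while lateral view also keeps the columns of the source row. A rough plain-Python model of both, with data copied from the sample output and a hypothetical helper `explode_map`:

```python
# Plain-Python model of explode(MAP), with and without lateral view
# (illustrative only; the rows mirror the sample output above).

employees = [
    {"name": "tianyongtao", "deducation": {"aaa": 10.0, "bb": 5.0, "CC": 8.0}},
    {"name": "wangyangming", "deducation": {"aaa": 6.0, "bb": 12.0}},
]


def explode_map(mapping):
    """explode(MAP): one (key, value) row per entry of the map."""
    return list(mapping.items())


# select explode(deducation) as (aa, bb): only the generated columns exist.
standalone = [
    pair
    for row in employees
    for pair in explode_map(row["deducation"])
]

# lateral view explode(deducation) mtable as aa, bb: source columns are kept too.
with_name = [
    (row["name"], key, value)
    for row in employees
    for key, value in explode_map(row["deducation"])
]
```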

    0: jdbc:hive2://192.168.53.122:10000/default> select name,aa,bb,cc  from employees lateral view explode(deducation) mtable as aa,bb lateral view explode(subdinates) adTable  as cc;
    +---------------+------+-------+--------+--+
    |     name      |  aa  |  bb   |   cc   |
    +---------------+------+-------+--------+--+
    | tianyongtao   | aaa  | 10.0  | wang   |
    | tianyongtao   | aaa  | 10.0  | ZHANG  |
    | tianyongtao   | aaa  | 10.0  | LIU    |
    | tianyongtao   | bb   | 5.0   | wang   |
    | tianyongtao   | bb   | 5.0   | ZHANG  |
    | tianyongtao   | bb   | 5.0   | LIU    |
    | tianyongtao   | CC   | 8.0   | wang   |
    | tianyongtao   | CC   | 8.0   | ZHANG  |
    | tianyongtao   | CC   | 8.0   | LIU    |
    | wangyangming  | aaa  | 6.0   | ma     |
    | wangyangming  | aaa  | 6.0   | zhong  |
    | wangyangming  | bb   | 12.0  | ma     |
    | wangyangming  | bb   | 12.0  | zhong  |
    +---------------+------+-------+--------+--+
    13 rows selected (0.305 seconds)
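
    With two lateral views, the second is applied to every row produced by the first, so each employee contributes (number of map entries) x (number of array elements) rows. The sketch below models this in plain Python with the sample data, as an illustration of where the 13 rows come from rather than of how Hive executes the query.

```python
# Plain-Python model of two chained lateral views
# (illustrative only; the rows mirror the sample output above).

employees = [
    {"name": "tianyongtao",
     "deducation": {"aaa": 10.0, "bb": 5.0, "CC": 8.0},
     "subdinates": ["wang", "ZHANG", "LIU"]},
    {"name": "wangyangming",
     "deducation": {"aaa": 6.0, "bb": 12.0},
     "subdinates": ["ma", "zhong"]},
]

rows = [
    (emp["name"], key, value, sub)
    for emp in employees
    for key, value in emp["deducation"].items()   # first lateral view (mtable)
    for sub in emp["subdinates"]                  # second lateral view (adTable)
]

# Count the generated rows per employee: 3 x 3 and 2 x 2.
rows_per_name = {}
for name, _key, _value, _sub in rows:
    rows_per_name[name] = rows_per_name.get(name, 0) + 1
```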

    Struct-type fields:

    0: jdbc:hive2://192.168.53.122:10000/default> select name,address.street,address.city,address.state  from employees;
    +---------------+---------+-----------+----------+--+
    |     name      | street  |   city    |  state   |
    +---------------+---------+-----------+----------+--+
    | tianyongtao   | HENAN   | LUOHE     | LINYING  |
    | wangyangming  | hunan   | changsha  | NULL     |
    +---------------+---------+-----------+----------+--+
    2 rows selected (0.309 seconds)

    collect_set(): this function aggregates the values of a field with duplicates removed, producing an Array-type field

    0: jdbc:hive2://192.168.53.122:10000/default> select * from cust;
    +------------------+-----------+----------------+--+
    |  cust.custname   | cust.sex  | cust.nianling  |
    +------------------+-----------+----------------+--+
    | tianyt_touch100  | 1         | 50             |
    | wangwu           | 1         | 85             |
    | zhangsan         | 1         | 20             |
    | liuqin           | 0         | 56             |
    | wangwu           | 0         | 47             |
    | liuyang          | 1         | 32             |
    | hello            | 0         | 100            |
    | mahuateng        | 1         | 1001           |
    | tianyt_touch100  | 1         | 50             |
    | wangwu           | 1         | 85             |
    | zhangsan         | 1         | 20             |
    | liuqin           | 0         | 56             |
    | wangwu           | 0         | 47             |
    | nihao            | 1         | 5              |
    | liuyang          | 1         | 32             |
    | hello            | 0         | 100            |
    | mahuateng        | 1         | 1001           |
    | nihao            | 1         | 5              |
    +------------------+-----------+----------------+--+


    scala> hcon.sql("select sex,collect_set(nianling) from gamedw.cust group by sex").show
    +---+---------------------+
    |sex|collect_set(nianling)|
    +---+---------------------+
    |  1| [85, 5, 20, 50, 3...|
    |  0|        [100, 56, 47]|
    +---+---------------------+
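
    The show output above truncates the array for sex = 1. As a cross-check, the grouping can be modelled in plain Python with the rows of the cust table; this is a sketch that assumes nianling is numeric, and `collect_set_by` is a hypothetical helper. Hive does not guarantee the order of elements inside the collected array, so the model uses unordered sets.

```python
# Plain-Python model of
#   select sex, collect_set(nianling) from gamedw.cust group by sex
# (illustrative only; nianling is assumed to be numeric).

distinct_rows = [
    ("tianyt_touch100", 1, 50), ("wangwu", 1, 85), ("zhangsan", 1, 20),
    ("liuqin", 0, 56), ("wangwu", 0, 47), ("liuyang", 1, 32),
    ("hello", 0, 100), ("mahuateng", 1, 1001), ("nihao", 1, 5),
]
# Each distinct row appears twice in the sample table (18 rows in total).
cust = distinct_rows * 2


def collect_set_by(rows, key_index, value_index):
    """Group rows by one column and collect the distinct values of another."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key_index], set()).add(row[value_index])
    return groups


ages_by_sex = collect_set_by(cust, 1, 2)
```

    Here `ages_by_sex[1]` holds the six distinct values 5, 20, 32, 50, 85 and 1001, and `ages_by_sex[0]` holds 47, 56 and 100, matching the second row of the output.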

    0: jdbc:hive2://192.168.53.122:10000/default> select * from cityinfo;
    +----------------+---------------------------------------------------------------+--+
    | cityinfo.city  |                      cityinfo.districts                       |
    +----------------+---------------------------------------------------------------+--+
    | shenzhen       | longhua,futian,baoan,longgang,dapeng,guangming,nanshan,luohu  |
    | qingdao        | shinan,lichang,jimo,jiaozhou,huangdao,laoshan                 |
    +----------------+---------------------------------------------------------------+--+

    0: jdbc:hive2://192.168.53.122:10000/default> select city,area from cityinfo lateral view explode(split(districts,",")) areatable as area;
    +-----------+------------+--+
    |   city    |    area    |
    +-----------+------------+--+
    | shenzhen  | longhua    |
    | shenzhen  | futian     |
    | shenzhen  | baoan      |
    | shenzhen  | longgang   |
    | shenzhen  | dapeng     |
    | shenzhen  | guangming  |
    | shenzhen  | nanshan    |
    | shenzhen  | luohu      |
    | qingdao   | shinan     |
    | qingdao   | lichang    |
    | qingdao   | jimo       |
    | qingdao   | jiaozhou   |
    | qingdao   | huangdao   |
    | qingdao   | laoshan    |
    +-----------+------------+--+
    14 rows selected (0.479 seconds)
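
    The query above combines two steps: split turns the comma-separated string into an array, and lateral view explode turns that array into rows. A minimal plain-Python model using the two sample rows is sketched below. Note that Hive's split takes a regular expression as its pattern, whereas Python's str.split takes a literal string; for a plain comma the two behave the same.

```python
# Plain-Python model of
#   select city, area from cityinfo
#   lateral view explode(split(districts, ",")) areatable as area
# (illustrative only; the rows mirror the sample output above).

cityinfo = [
    ("shenzhen", "longhua,futian,baoan,longgang,dapeng,guangming,nanshan,luohu"),
    ("qingdao", "shinan,lichang,jimo,jiaozhou,huangdao,laoshan"),
]

pairs = [
    (city, area)
    for city, districts in cityinfo
    for area in districts.split(",")   # split -> array, explode -> one row per element
]
```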

    Given the data below, compute for each month the maximum and the sum of all values up to and including that month:

    scala> hcon.sql("select * from gamedw.visists order by custid,monthid").show
    +------+-------+-----+
    |custid|monthid|times|
    +------+-------+-----+
    |     1| 201801|   25|
    |     1| 201801|   10|
    |     1| 201802|   35|
    |     1| 201802|    7|
    |     1| 201803|   52|
    |     1| 201805|    6|
    |     2| 201801|   32|
    |     2| 201801|    1|
    |     2| 201802|   10|
    |     2| 201802|   18|
    |     2| 201803|   91|
    |     2| 201804|    6|
    |     2| 201804|    4|
    |     2| 201805|   31|
    +------+-------+-----+

    scala> hcon.sql("select custid,b.monthid,sum(times),max(times) from gamedw.visists a inner join (select distinct monthid from gamedw.visists) b on a.monthid<=b.monthid group by custid,b.monthid order by custid,b.monthid").show
    +------+-------+----------+----------+
    |custid|monthid|sum(times)|max(times)|
    +------+-------+----------+----------+
    |     1| 201801|        35|        25|
    |     1| 201802|        77|        35|
    |     1| 201803|       129|        52|
    |     1| 201804|       129|        52|
    |     1| 201805|       135|        52|
    |     2| 201801|        33|        32|
    |     2| 201802|        61|        32|
    |     2| 201803|       152|        91|
    |     2| 201804|       162|        91|
    |     2| 201805|       193|        91|
    +------+-------+----------+----------+
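
    The query builds the running totals with a non-equi self-join: every visit row is paired with each month that is greater than or equal to its own month, and the pairs are then grouped by customer and month. The following plain-Python sketch replays that logic on the sample rows to show why the results come out as they do; it models the query and is not how Hive executes it.

```python
# Plain-Python model of the running sum / running max query
# (illustrative only; the rows mirror gamedw.visists above).

visits = [
    (1, 201801, 25), (1, 201801, 10), (1, 201802, 35), (1, 201802, 7),
    (1, 201803, 52), (1, 201805, 6),
    (2, 201801, 32), (2, 201801, 1), (2, 201802, 10), (2, 201802, 18),
    (2, 201803, 91), (2, 201804, 6), (2, 201804, 4), (2, 201805, 31),
]

# Subquery b: select distinct monthid from gamedw.visists
months = sorted({monthid for _custid, monthid, _times in visits})

# Join condition a.monthid <= b.monthid, then group by (custid, b.monthid).
groups = {}
for custid, a_month, times in visits:
    for b_month in months:
        if a_month <= b_month:
            groups.setdefault((custid, b_month), []).append(times)

# Aggregate each group and order by custid, monthid.
running = [
    (custid, month, sum(times), max(times))
    for (custid, month), times in sorted(groups.items())
]
```

    Because the month list comes from the whole table, customer 1 gets a row for 201804 even though that customer has no visits in that month; the row simply carries forward the totals from 201803.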

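    When joining, put the smaller table on the left: in a reduce-side join Hive buffers the rows of the earlier tables and streams the last one, so the largest table should come last.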

  • Original article: https://www.cnblogs.com/playforever/p/9605229.html