zoukankan      html  css  js  c++  java
  • 【Hive】HiveQL实战之操作符和函数

    Hive中的操作符合函数,和关系型数据库的类似,本篇主要讲解Hive的一些函数。

    一 函数分类

    Hive中的函数可以分为以下几种:
    • 数学函数:主要用于数学运算,例如:Randy()和E();
    • 集合函数:主要用于查找Size、Keys和复杂类型的值,例如:Size(Array<T>);
    • 类型转换函数:主要是Cast和Binary,用于将一种类型转为另一种类型;
    • 日期函数:用于执行与日期相关的操作,例如:Year(string date)和Month(string data);
    • 条件函数:主要使用返回值来检查特定的条件,例如:Coalesce、IF和CASE WHEN;
    • 字符函数:主要用于字符相关的操作,例如:Upper(string A)和Trim(string A);
    • 聚合函数:主要用于聚合操作,例如:Sum()、Count(*);
    • 表生成函数:主要用于将单个输入转换为多行输出,例如:Explode(Map)和Json_tuple(就送String ,k1,k2……);
    • 自定义函数:由Java代码创建,作为Hive的函数的扩展。
    二 函数查看

    在Hive中,可以使用如下方法查看Hive支持的函数,以及函数的使用:

    1 查看Hive中的函数
    0: jdbc:hive2://localhost:10000/hive> show functions;
    2 查看函数的使用
    0: jdbc:hive2://localhost:10000/hive> desc function upper;
    +----------------------------------------------------+
    |                      tab_name                      |
    +----------------------------------------------------+
    | upper(str) - Returns str with all characters changed to uppercase |
    +----------------------------------------------------+
    1 row selected (0.254 seconds)
    0: jdbc:hive2://localhost:10000/hive> desc function extended upper;
    +----------------------------------------------------+
    |                      tab_name                      |
    +----------------------------------------------------+
    | upper(str) - Returns str with all characters changed to uppercase |
    | Synonyms: ucase                                    |
    | Example:                                           |
    |   > SELECT upper('Facebook') FROM src LIMIT 1;     |
    |   'FACEBOOK'                                       |
    | Function class:org.apache.hadoop.hive.ql.udf.generic.GenericUDFUpper |
    | Function type:BUILTIN                              |
    +----------------------------------------------------+
    7 rows selected (0.22 seconds)
    注:后者查看的内容更详细。

    三 函数示例

    1 示例数据
    0: jdbc:hive2://localhost:10000/hive> select *from employee;
    +----------------+-------------------------+----------------------------+------------------------+----------------------------------------+
    | employee.name  |   employee.work_place   |      employee.sex_age      | employee.skills_score  |         employee.depart_title          |
    +----------------+-------------------------+----------------------------+------------------------+----------------------------------------+
    | Michael        | ["Montreal","Toronto"]  | {"sex":"Male","age":30}    | {"DB":80}              | {"Product":["Developer","Lead"]}       |
    | Will           | ["Montreal"]            | {"sex":"Male","age":35}    | {"Perl":85}            | {"Product":["Lead"],"Test":["Lead"]}   |
    | Shelley        | ["New York"]            | {"sex":"Female","age":27}  | {"Python":80}          | {"Test":["Lead"],"COE":["Architect"]}  |
    | Lucy           | ["Vancouver"]           | {"sex":"Female","age":57}  | {"Sales":89,"HR":94}   | {"Sales":["Lead"]}                     |
    +----------------+-------------------------+----------------------------+------------------------+----------------------------------------+
    4 rows selected (0.353 seconds)
    2 复杂数据类型

    1)SIZE

    该函数主要用于计算Map、Array或者嵌套Map/Array的大小,如果大小未知,则返回-1.
    0: jdbc:hive2://localhost:10000/hive> select size(work_place) as array_size,
    . . . . . . . . . . . . . . . . . . > size(skills_score) as map_size,
    . . . . . . . . . . . . . . . . . . > size(depart_title) as complex_size,
    . . . . . . . . . . . . . . . . . . > size(depart_title['Product']) as nest_size
    . . . . . . . . . . . . . . . . . . > from employee;
    +-------------+-----------+---------------+------------+
    | array_size  | map_size  | complex_size  | nest_size  |
    +-------------+-----------+---------------+------------+
    | 2           | 1         | 1             | 2          |
    | 1           | 1         | 2             | 1          |
    | 1           | 1         | 2             | -1         |
    | 1           | 2         | 1             | -1         |
    +-------------+-----------+---------------+------------+
    4 rows selected (0.38 seconds)
    2)ARRAY_CONTAINS / SORT_ARRAY

    该函数用于检查一个数组是否包含某一个值,如果包含,则返回True,否则返回FALSE,而SORT_ARRAY用于对数组进行升序排序。
    0: jdbc:hive2://localhost:10000/hive> select name,array_contains(work_place,'Toronto') as is_Toronto,
    . . . . . . . . . . . . . . . . . . > sort_array(work_place) as sort_array
    . . . . . . . . . . . . . . . . . . > from employee;
    +----------+-------------+-------------------------+
    |   name   | is_toronto  |       sort_array        |
    +----------+-------------+-------------------------+
    | Michael  | true        | ["Montreal","Toronto"]  |
    | Will     | false       | ["Montreal"]            |
    | Shelley  | false       | ["New York"]            |
    | Lucy     | false       | ["Vancouver"]           |
    +----------+-------------+-------------------------+
    4 rows selected (0.547 seconds)
    3 日期函数

    1)From_unixtime(unix_timestamp())

    该函数类似于Oracle中的SYSDATE()函数,用于动态返回当前时间。
    0: jdbc:hive2://localhost:10000/hive> select from_unixtime(unix_timestamp()) as current_time
    . . . . . . . . . . . . . . . . . . > from employee limit 1;
    +----------------------+
    |     current_time     |
    +----------------------+
    | 2018-07-09 11:04:31  |
    +----------------------+
    1 row selected (0.473 seconds)
    2)unix_timestamp

    该函数可用于比较两个日期,或者可以用于Order by后进行排序。
    0: jdbc:hive2://localhost:10000/hive> select from_unixtime(unix_timestamp()),(unix_timestamp()-unix_timestamp('2018-07-09 7:00:00'))/60/60/24 as daydiff from employee limit 1;
    +----------------------+----------------------+
    |         _c0          |       daydiff        |
    +----------------------+----------------------+
    | 2018-07-09 11:12:06  | 0.17506944444444442  |
    +----------------------+----------------------+
    1 row selected (0.905 seconds)
    3)To_date

    该函数用于返回年月日格式的日期。
    0: jdbc:hive2://localhost:10000/hive> select to_date(from_unixtime(unix_timestamp())) as day from employee limit 1;
    +-------------+
    |     day     |
    +-------------+
    | 2018-07-09  |
    +-------------+
    1 row selected (0.485 seconds)
    4 CASE
    0: jdbc:hive2://localhost:10000/hive> select 
    . . . . . . . . . . . . . . . . . . > case when 1 is null then 'True' else 0 end as case_result 
    . . . . . . . . . . . . . . . . . . > from employee limit 1;
    +--------------+
    | case_result  |
    +--------------+
    | 0            |
    +--------------+
    1 row selected (0.37 seconds)
    5 解析器和搜索

    1)数据准备
    0: jdbc:hive2://localhost:10000/hive> insert into employee
    . . . . . . . . . . . . . . . . . . > select 'Steven' as name,array(null) as work_place,
    . . . . . . . . . . . . . . . . . . > named_struct('sex','Male','age',30) as sex_age,
    . . . . . . . . . . . . . . . . . . > map('Python',90) as skills_score,
    . . . . . . . . . . . . . . . . . . > map('R&D',array('Developer')) as depart_title
    . . . . . . . . . . . . . . . . . . > from employee limit 1;
    No rows affected (194.183 seconds)
    0: jdbc:hive2://localhost:10000/hive> select *from employee;
    +----------------+-------------------------+----------------------------+------------------------+----------------------------------------+
    | employee.name  |   employee.work_place   |      employee.sex_age      | employee.skills_score  |         employee.depart_title          |
    +----------------+-------------------------+----------------------------+------------------------+----------------------------------------+
    | Steven         | NULL                    | {"sex":"Male","age":30}    | {"Python":90}          | {"R&D":["Developer"]}                  |
    | Michael        | ["Montreal","Toronto"]  | {"sex":"Male","age":30}    | {"DB":80}              | {"Product":["Developer","Lead"]}       |
    | Will           | ["Montreal"]            | {"sex":"Male","age":35}    | {"Perl":85}            | {"Product":["Lead"],"Test":["Lead"]}   |
    | Shelley        | ["New York"]            | {"sex":"Female","age":27}  | {"Python":80}          | {"Test":["Lead"],"COE":["Architect"]}  |
    | Lucy           | ["Vancouver"]           | {"sex":"Female","age":57}  | {"Sales":89,"HR":94}   | {"Sales":["Lead"]}                     |
    +----------------+-------------------------+----------------------------+------------------------+----------------------------------------+
    5 rows selected (0.786 seconds)
    2)Lateral View

    该函数会忽视Expode返回NULL的行。
    0: jdbc:hive2://localhost:10000/hive> select name,workplace,skills,score
    . . . . . . . . . . . . . . . . . . > from employee
    . . . . . . . . . . . . . . . . . . > lateral view explode(work_place) wp as workplace
    . . . . . . . . . . . . . . . . . . > lateral view explode(skills_score) ss as skills,score;
    +----------+------------+---------+--------+
    |   name   | workplace  | skills  | score  |
    +----------+------------+---------+--------+
    | Michael  | Montreal   | DB      | 80     |
    | Michael  | Toronto    | DB      | 80     |
    | Will     | Montreal   | Perl    | 85     |
    | Shelley  | New York   | Python  | 80     |
    | Lucy     | Vancouver  | Sales   | 89     |
    | Lucy     | Vancouver  | HR      | 94     |
    +----------+------------+---------+--------+
    6 rows selected (0.459 seconds)
    3)Outer Lateral View

    该函数会可以Expode返回NULL的行。
    0: jdbc:hive2://localhost:10000/hive> select name,workplace,skills,score                    
    . . . . . . . . . . . . . . . . . . > from employee                                         
    . . . . . . . . . . . . . . . . . . > lateral view outer explode(work_place) wp as workplace
    . . . . . . . . . . . . . . . . . . > lateral view explode(skills_score) ss as skills,score;
    +----------+------------+---------+--------+
    |   name   | workplace  | skills  | score  |
    +----------+------------+---------+--------+
    | Steven   | NULL       | Python  | 90     |
    | Michael  | Montreal   | DB      | 80     |
    | Michael  | Toronto    | DB      | 80     |
    | Will     | Montreal   | Perl    | 85     |
    | Shelley  | New York   | Python  | 80     |
    | Lucy     | Vancouver  | Sales   | 89     |
    | Lucy     | Vancouver  | HR      | 94     |
    +----------+------------+---------+--------+
    7 rows selected (0.366 seconds)
    4)Split

    该函数用于将字符串拆分为数组。
    0: jdbc:hive2://localhost:10000/hive> select split('/user/hive/warehouse/hive.db/employee','/') from employee limit 1;
    +----------------------------------------------------+
    |                        _c0                         |
    +----------------------------------------------------+
    | ["","user","hive","warehouse","hive.db","employee"] |
    +----------------------------------------------------+
    1 row selected (0.303 seconds)
    5)Reverse

    该函数用于反转字符。
    0: jdbc:hive2://localhost:10000/hive> select reverse(split(reverse('/user/hive/warehouse/hive.db/employee'),'/')[0]) from employee limit 1;
    +-----------+
    |    _c0    |
    +-----------+
    | employee  |
    +-----------+
    1 row selected (0.392 seconds)
    6)Collect_set

    该函数用于合并为一行,但会去掉重复的值。
    0: jdbc:hive2://localhost:10000/hive> select collect_set(work_place[0]) from employee;
    +--------------------------------------+
    |                 _c0                  |
    +--------------------------------------+
    | ["Montreal","New York","Vancouver"]  |
    +--------------------------------------+
    1 row selected (140.198 seconds)
    7)Collect_list

    该函数和Collect_set类似,但不会去重。
    0: jdbc:hive2://localhost:10000/hive> select collect_list(work_place[0]) from employee;
    0: jdbc:hive2://localhost:10000/hive> select collect_list(work_place[0]) from employee;
    WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
    +-------------------------------------------------+
    |                       _c0                       |
    +-------------------------------------------------+
    | ["Montreal","Montreal","New York","Vancouver"]  |
    +-------------------------------------------------+
    1 row selected (133.142 seconds)
    更多内容,可参考Hive函数(Show Functions)。

  • 相关阅读:
    [转]MySql 5.7关键字和保留字-附表
    [转]超链接标签简单的几个样式属性
    layui table 前台数字格式保留两位小数,不足补0(mysql 数据库)
    [转]Object.keys()和for in的排序问题
    [转]对form:input标签中的数字进行格式化
    [转]bootstrap table 动态列数
    [转]bootstrap的table插件动态加载表头
    [转]【MyBatis】Decimal映射到实体类出现科学计数法问题
    [转]MySQL函数大全 及用法示例
    [转]mysql update case when和where之间的注意事项
  • 原文地址:https://www.cnblogs.com/alen-liu-sz/p/12975623.html
Copyright © 2011-2022 走看看