zoukankan html css js c++ java

Hive Tutorial（上）（Hive 入门指导）

用户指导

Hive 指导

Hive指导
概念

Hive是什么
Hive不是什么
获得和开始
数据单元
类型系统
内置操作符和方法
语言性能

用法和例子（在<下>里面）

概念

Hive是什么？

Hive是一个以Apache Hadoop为基础的数据仓储基础设施。Hadoop为数据的存储和运行在商业机器上提供了可扩展和高容错的性能。

Hive的设计目标是使得数据汇总更加简单和针对大容量数据的查询和分析。它提供SWL来使得用户可以更简单地查询、汇总和数据分析。同时，Hive的SQL为用户提供了多种地方来融合他们自己的方法实现自定义分析，例如User Defined Functions (UDFs)。

Hive不是什么？

Hive不是为事务联机处理设计的。它是用于处理传统数据仓储任务。

获得和开始

至于如何配置Hive，HiveServer2和Beeline的细节，请参考GettingStarted指南。

Books about Hive 展示了一些可以帮助更好开始Hive的书籍。

在接下来的部分我们将提供一份关于系统性能的指导。我们开始描述data types，tables和partitions（跟传统关系型数据库相似）的概念和通过举例帮助了解Hive的能力。

数据单元

为了使得粒度合适，Hive数据采用下面展示的组织结构：

Databases:命名空间方法用来避免tables，views，partitions，columns等等的命名冲突。Databases也可以用于加强用户或者一组用户的安全性。
Tables: 拥有相同的schema被看成是同种数据单元。下面是page_views表的例子，每一行包括以下的列（schema）：

timestamp—当网页被浏览时UNIX timestamp一致的INT类型的数据
userid —用来识别浏览该页面的用户的BIGINT类型的数据
page_url —获取网页位置的STRING类型的数据
referer_url—用于获取用户所在当前页面的位置的STRING类型的数据。
IP—用于获取页面请求时的IP地址。

Partitions:每个页面都拥有一个或者多个Key来决定数据如何存储。Partitions——除了存储单元——也允许用户高效地识别满足指定标准的行。例如，STRING类型的 data_partition和STRING 类型的country_partition。每一个独一无二的partition key值定义一个Table的partition。例如，US时间的“2009-12-23”是page_views table的一个partition。因此，如果你只想分析2009-12-23的“US”数据，你可以运行在该table相关的partition上，从而提高分析效率。然而需要说明的是，只是因为有一个partition名字为2009-12-23并不意味着它包含所有或者只是该日期的数据。partition用时间命名只是为了方面。维持partition名字和数据内容之间的映射关系是用户的工作。Partition列是虚拟列，本身不是数据的一部分而是在加载中派生出来的。
Buckets (or Clusters):基于对表中一些列的hash方法得出值来将每个partition中的数据分到不同的Buckets中。例如，page_views表可能通过userid表中不同于其他partition列的列来bucket。这可以用于有效地获取样本。

需要说明是对于表来说partitioned和bucketed不是必需的，但这些抽象化概念允许系统在查询操作中筛选掉大量数据来提高查询速度。

类型系统

Hive支持原始和复杂数据类型，正如下面多描述的。可以在Hive Data Types中查看更多信息。

原始类型

数据类型是跟表中的列相关的，支持下面的原始类型：
Integers

TINYINT—1个字节的整型
SMALLINT—2个字节的整型
INT—4个字节的整型
BIGINT—8个字节的整型

Boolean type

BOOLEAN—TRUE/FALSE

Floating point numbers

FLOAT—单精度
DOUBLE—双精度

Fixed point numbers

DECIMAL—a fixed point value of user defined scale and precision用户定义的大小和精度的固定的点值

String types

STRING—指定字符集的字符串
VARCHAR—指定字符集最大长度的字符串
CHAR—指定字符集和长度的字符串

Date and time types

TIMESTAMP— 纳秒精度的特定时间点
DATE—日期

Binary types

BINARY—字节序列

类型的层次结构如下（父类是所有子类实例的超类型）：

Type

Primitive Type

Number

DOUBLE

FLOAT

BIGINT

SMALLINT

TINYINT

STRING

BOOLEAN

类型层级定义了类型在查询语言中的隐性转换。隐性转换允许子类转换成父类。所以当一个查询表达式需要type1但是数据是type2,type1在层级结构中是type2的父类，那么type2可以转换成type1.需要说明的是类型层级允许STRING转换成DOUBLE。

明确的类型转换可以用下面部分#Built In Functions中的cast操作符来实现。

复杂类型

复杂类型可以用原始类型和其他组合类型来组合：

Structs:类型里面的元素可以用.符号来获得。举个例子，一个列c的类型是STRUCT{a INT;b INT}，那么里面的a可用c.a来访问。
Maps (key-value tuples): 元素可用[‘元素名’]来访问。例如，在一个map M中包含一个一个键值对‘group’->gid,那么gid的值可以用M[‘group’]来获得。
Arrays (indexable lists): 数组中元素必须是同种类型。元素可以通过[index]来获得。举个例子，A数组拥有元素[‘a’,’b’,’c’]，那么A[1]会返回‘b’。

使用原始数据类型和创造复杂类型的架构，任意级别的嵌套类型都可以被创造。例如，对于一个类型，用户可能包含下面的字段：

gender—which is a STRING.
active—which is a BOOLEAN.

内置操作符和方法

下面列出的操作符和方法不一定是最新的（Hive Operators and UDFs里面有更多最新信息）在 Beeline 或者 Hive CLI, 使用这些命令行获得最新文档:

SHOW FUNCTIONS;

DESCRIBE FUNCTION <function_name>;

DESCRIBE FUNCTION EXTENDED <function_name>;

区分大小写

所有的Hive关键词都是区分大小写，包括Hive操作和方法名。

内置操作符

关系操作符—通过与传递进来的值进行比较返回TRUE or FALSE

Relational Operator	Operand types	Description
A = B	all primitive types	TRUE if expression A is equivalent to expression B; otherwise FALSE
A != B	all primitive types	TRUE if expression A is not equivalent to expression B; otherwise FALSE
A < B	all primitive types	TRUE if expression A is less than expression B; otherwise FALSE
A <= B	all primitive types	TRUE if expression A is less than or equal to expression B; otherwise FALSE
A > B	all primitive types	TRUE if expression A is greater than expression B] otherwise FALSE
A >= B	all primitive types	TRUE if expression A is greater than or equal to expression B otherwise FALSE
A IS NULL	all types	TRUE if expression A evaluates to NULL otherwise FALSE
A IS NOT NULL	all types	FALSE if expression A evaluates to NULL otherwise TRUE
A LIKE B	strings	TRUE if string A matches the SQL simple regular expression B, otherwise FALSE. The comparison is done character by character. The _ character in B matches any character in A (similar to . in posix regular expressions), and the % character in B matches an arbitrary number of characters in A (similar to .* in posix regular expressions). For example, `'foobar' LIKE 'foo'` evaluates to FALSE where as `'foobar' LIKE 'foo___'` evaluates to TRUE and so does `'foobar' LIKE 'foo%'`. To escape % use (% matches one % character). If the data contains a semicolon, and you want to search for it, it needs to be escaped, `columnValue LIKE 'a;b'`
A RLIKE B	strings	NULL if A or B is NULL, TRUE if any (possibly empty) substring of A matches the Java regular expression B (see Java regular expressions syntax), otherwise FALSE. For example, 'foobar' rlike 'foo' evaluates to TRUE and so does 'foobar' rlike '^f.*r$'.
A REGEXP B	strings	Same as RLIKE

算术操作符—下面的运算符支持各种常见的算术运算符。所有的返回值都为number类型的。

Arithmetic Operators	Operand types	Description
A + B	all number types	Gives the result of adding A and B. The type of the result is the same as the common parent(in the type hierarchy) of the types of the operands, for example, since every integer is a float. Therefore, float is a containing type of integer so the + operator on a float and an int will result in a float.
A - B	all number types	Gives the result of subtracting B from A. The type of the result is the same as the common parent(in the type hierarchy) of the types of the operands.
A * B	all number types	Gives the result of multiplying A and B. The type of the result is the same as the common parent(in the type hierarchy) of the types of the operands. Note that if the multiplication causing overflow, you will have to cast one of the operators to a type higher in the type hierarchy.
A / B	all number types	Gives the result of dividing B from A. The type of the result is the same as the common parent(in the type hierarchy) of the types of the operands. If the operands are integer types, then the result is the quotient of the division.
A % B	all number types	Gives the reminder resulting from dividing A by B. The type of the result is the same as the common parent(in the type hierarchy) of the types of the operands.
A & B	all number types	Gives the result of bitwise AND of A and B. The type of the result is the same as the common parent(in the type hierarchy) of the types of the operands.
A \| B	all number types	Gives the result of bitwise OR of A and B. The type of the result is the same as the common parent(in the type hierarchy) of the types of the operands.
A ^ B	all number types	Gives the result of bitwise XOR of A and B. The type of the result is the same as the common parent(in the type hierarchy) of the types of the operands.
~A	all number types	Gives the result of bitwise NOT of A. The type of the result is the same as the type of A.

逻辑运算符 — 下面的操作符支持创建逻辑表达式。所有的结果根据操作对象的布尔值来返回true/false。

Logical Operators	Operands types	Description
A AND B	boolean	TRUE if both A and B are TRUE, otherwise FALSE
A && B	boolean	Same as A AND B
A OR B	boolean	TRUE if either A or B or both are TRUE, otherwise FALSE
A \|\| B	boolean	Same as A OR B
NOT A	boolean	TRUE if A is FALSE, otherwise FALSE
!A	boolean	Same as NOT A

复杂类型中的操作符—下面的操作符提供机制来获得复杂类型中的元素，

Operator	Operand types	Description
A[n]	A is an Array and n is an int	returns the nth element in the array A. The first element has index 0, for example, if A is an array comprising of ['foo', 'bar'] then A[0] returns 'foo' and A[1] returns 'bar'
M[key]	M is a Map<K, V> and key has type K	returns the value corresponding to the key in the map for example, if M is a map comprising of {'f' -> 'foo', 'b' -> 'bar', 'all' -> 'foobar'} then M['all'] returns 'foobar'
S.x	S is a struct	returns the x field of S, for example, for struct foobar {int foo, int bar} foobar.foo returns the integer stored in the foo field of the struct.

内置方法

Hive 支持以下方法:
(Function list in source code: FunctionRegistry.java)

Return Type	Function Name (Signature)	Description
BIGINT	round(double a)	returns the rounded BIGINT value of the double
BIGINT	floor(double a)	returns the maximum BIGINT value that is equal or less than the double
BIGINT	ceil(double a)	returns the minimum BIGINT value that is equal or greater than the double
double	rand(), rand(int seed)	returns a random number (that changes from row to row). Specifiying the seed will make sure the generated random number sequence is deterministic.
string	concat(string A, string B,...)	returns the string resulting from concatenating B after A. For example, concat('foo', 'bar') results in 'foobar'. This function accepts arbitrary number of arguments and return the concatenation of all of them.
string	substr(string A, int start)	returns the substring of A starting from start position till the end of string A. For example, substr('foobar', 4) results in 'bar'
string	substr(string A, int start, int length)	returns the substring of A starting from start position with the given length, for example, substr('foobar', 4, 2) results in 'ba'
string	upper(string A)	returns the string resulting from converting all characters of A to upper case, for example, upper('fOoBaR') results in 'FOOBAR'
string	ucase(string A)	Same as upper
string	lower(string A)	returns the string resulting from converting all characters of B to lower case, for example, lower('fOoBaR') results in 'foobar'
string	lcase(string A)	Same as lower
string	trim(string A)	returns the string resulting from trimming spaces from both ends of A, for example, trim(' foobar ') results in 'foobar'
string	ltrim(string A)	returns the string resulting from trimming spaces from the beginning(left hand side) of A. For example, ltrim(' foobar ') results in 'foobar '
string	rtrim(string A)	returns the string resulting from trimming spaces from the end(right hand side) of A. For example, rtrim(' foobar ') results in ' foobar'
string	regexp_replace(string A, string B, string C)	returns the string resulting from replacing all substrings in B that match the Java regular expression syntax(See Java regular expressions syntax) with C. For example, regexp_replace('foobar', 'oo\|ar', ) returns 'fb'
int	size(Map<K.V>)	returns the number of elements in the map type
int	size(Array<T>)	returns the number of elements in the array type
value of <type>	cast(<expr> as <type>)	converts the results of the expression expr to <type>, for example, cast('1' as BIGINT) will convert the string '1' to it integral representation. A null is returned if the conversion does not succeed.
string	from_unixtime(int unixtime)	convert the number of seconds from the UNIX epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the format of "1970-01-01 00:00:00"
string	to_date(string timestamp)	Return the date part of a timestamp string: to_date("1970-01-01 00:00:00") = "1970-01-01"
int	year(string date)	Return the year part of a date or a timestamp string: year("1970-01-01 00:00:00") = 1970, year("1970-01-01") = 1970
int	month(string date)	Return the month part of a date or a timestamp string: month("1970-11-01 00:00:00") = 11, month("1970-11-01") = 11
int	day(string date)	Return the day part of a date or a timestamp string: day("1970-11-01 00:00:00") = 1, day("1970-11-01") = 1
string	get_json_object(string json_string, string path)	Extract json object from a json string based on json path specified, and return json string of the extracted json object. It will return null if the input json string is invalid.

Hive中的统计函数:

Return Type	Aggregation Function Name (Signature)	Description
BIGINT	count(*), count(expr), count(DISTINCT expr[, expr_.])	count(*)—Returns the total number of retrieved rows, including rows containing NULL values; count(expr)—Returns the number of rows for which the supplied expression is non-NULL; count(DISTINCT expr[, expr])—Returns the number of rows for which the supplied expression(s) are unique and non-NULL.
DOUBLE	sum(col), sum(DISTINCT col)	returns the sum of the elements in the group or the sum of the distinct values of the column in the group
DOUBLE	avg(col), avg(DISTINCT col)	returns the average of the elements in the group or the average of the distinct values of the column in the group
DOUBLE	min(col)	returns the minimum value of the column in the group
DOUBLE	max(col)	returns the maximum value of the column in the group

语言能力

Hive's SQL 提供基础 SQL操作. 这些操作是用在表和partition上，这些操作是:

用WHERE来过滤表中的行
用SELECT从表中来选择确定的行
两表联合
Ability to evaluate aggregations on multiple "group by" columns for the data stored in a table.（翻译不出来T T）
将查询的结果存储到另一个表中
将表中内容下载到本地目录
将查询结果存储到hadoop dfs目录
管理tables和partitions（新建，移除和更改）
插入使用自定义 map/reduce作业使用的语言写的自定义脚本

下面是原文

Tutorial

Hive Tutorial

Hive Tutorial
Concepts

What Is Hive
What Hive Is NOT
Getting Started
Data Units
Type System
Built In Operators and Functions
Language Capabilities

Usage and Examples

Concepts

What Is Hive

Hive is a data warehousing infrastructure based on Apache Hadoop. Hadoop provides massive scale out and fault tolerance capabilities for data storage and processing on commodity hardware.

Hive is designed to enable easy data summarization, ad-hoc querying and analysis of large volumes of data. It provides SQL which enables users to do ad-hoc querying, summarization and data analysis easily. At the same time, Hive's SQL gives users multiple places to integrate their own functionality to do custom analysis, such as User Defined Functions (UDFs).

What Hive Is NOT

Hive is not designed for online transaction processing. It is best used for traditional data warehousing tasks.

Getting Started

For details on setting up Hive, HiveServer2, and Beeline, please refer to the GettingStarted guide.

Books about Hive lists some books that may also be helpful for getting started with Hive.

In the following sections we provide a tutorial on the capabilities of the system. We start by describing the concepts of data types, tables, and partitions (which are very similar to what you would find in a traditional relational DBMS) and then illustrate the capabilities of Hive with the help of some examples.

Data Units

In the order of granularity - Hive data is organized into:

Databases: Namespaces function to avoid naming conflicts for tables, views, partitions, columns, and so on. Databases can also be used to enforce security for a user or group of users.
Tables: Homogeneous units of data which have the same schema. An example of a table could be page_views table, where each row could comprise of the following columns (schema):

timestamp—which is of INT type that corresponds to a UNIX timestamp of when the page was viewed.
userid —which is of BIGINT type that identifies the user who viewed the page.
page_url—which is of STRING type that captures the location of the page.
referer_url—which is of STRING that captures the location of the page from where the user arrived at the current page.
IP—which is of STRING type that captures the IP address from where the page request was made.

Partitions: Each Table can have one or more partition Keys which determines how the data is stored. Partitions—apart from being storage units—also allow the user to efficiently identify the rows that satisfy a specified criteria; for example, a date_partition of type STRING and country_partition of type STRING. Each unique value of the partition keys defines a partition of the Table. For example, all "US" data from "2009-12-23" is a partition of the page_views table. Therefore, if you run analysis on only the "US" data for 2009-12-23, you can run that query only on the relevant partition of the table, thereby speeding up the analysis significantly. Note however, that just because a partition is named 2009-12-23 does not mean that it contains all or only data from that date; partitions are named after dates for convenience; it is the user's job to guarantee the relationship between partition name and data content! Partition columns are virtual columns, they are not part of the data itself but are derived on load.
Buckets (or Clusters): Data in each partition may in turn be divided into Buckets based on the value of a hash function of some column of the Table. For example the page_views table may be bucketed by userid, which is one of the columns, other than the partitions columns, of the page_view table. These can be used to efficiently sample the data.

Note that it is not necessary for tables to be partitioned or bucketed, but these abstractions allow the system to prune large quantities of data during query processing, resulting in faster query execution.

Type System

Hive supports primitive and complex data types, as described below. See Hive Data Types for additional information.

Primitive Types

Types are associated with the columns in the tables. The following Primitive types are supported:
Integers

TINYINT—1 byte integer
SMALLINT—2 byte integer
INT—4 byte integer
BIGINT—8 byte integer

Boolean type

BOOLEAN—TRUE/FALSE

Floating point numbers

FLOAT—single precision
DOUBLE—Double precision

Fixed point numbers

DECIMAL—a fixed point value of user defined scale and precision

String types

STRING—sequence of characters in a specified character set
VARCHAR—sequence of characters in a specified character set with a maximum length
CHAR—sequence of characters in a specified character set with a defined length

Date and time types

TIMESTAMP— a specific point in time, up to nanosecond precision
DATE—a date

Binary types

BINARY—a sequence of bytes

The Types are organized in the following hierarchy (where the parent is a super type of all the children instances):

Type

Primitive Type

Number

DOUBLE

FLOAT

BIGINT

SMALLINT

TINYINT

STRING

BOOLEAN

This type hierarchy defines how the types are implicitly converted in the query language. Implicit conversion is allowed for types from child to an ancestor. So when a query expression expects type1 and the data is of type2, type2 is implicitly converted to type1 if type1 is an ancestor of type2 in the type hierarchy. Note that the type hierarchy allows the implicit conversion of STRING to DOUBLE.

Explicit type conversion can be done using the cast operator as shown in the #Built In Functions section below.

Complex Types

Complex Types can be built up from primitive types and other composite types using:

Structs: the elements within the type can be accessed using the DOT (.) notation. For example, for a column c of type STRUCT {a INT; b INT}, the a field is accessed by the expression c.a
Maps (key-value tuples): The elements are accessed using ['element name'] notation. For example in a map M comprising of a mapping from 'group' -> gid the gid value can be accessed using M['group']
Arrays (indexable lists): The elements in the array have to be in the same type. Elements can be accessed using the [n] notation where n is an index (zero-based) into the array. For example, for an array A having the elements ['a', 'b', 'c'], A[1] retruns 'b'.

Using the primitive types and the constructs for creating complex types, types with arbitrary levels of nesting can be created. For example, a type User may comprise of the following fields:

gender—which is a STRING.
active—which is a BOOLEAN.

Built In Operators and Functions

The operators and functions listed below are not necessarily up to date. (Hive Operators and UDFs has more current information.) In Beeline or the Hive CLI, use these commands to show the latest documentation:

SHOW FUNCTIONS;

DESCRIBE FUNCTION <function_name>;

DESCRIBE FUNCTION EXTENDED <function_name>;

Case-insensitive

All Hive keywords are case-insensitive, including the names of Hive operators and functions.

Built In Operators

Relational Operators—The following operators compare the passed operands and generate a TRUE or FALSE value, depending on whether the comparison between the operands holds or not.

Relational Operator	Operand types	Description
A = B	all primitive types	TRUE if expression A is equivalent to expression B; otherwise FALSE
A != B	all primitive types	TRUE if expression A is not equivalent to expression B; otherwise FALSE
A < B	all primitive types	TRUE if expression A is less than expression B; otherwise FALSE
A <= B	all primitive types	TRUE if expression A is less than or equal to expression B; otherwise FALSE
A > B	all primitive types	TRUE if expression A is greater than expression B] otherwise FALSE
A >= B	all primitive types	TRUE if expression A is greater than or equal to expression B otherwise FALSE
A IS NULL	all types	TRUE if expression A evaluates to NULL otherwise FALSE
A IS NOT NULL	all types	FALSE if expression A evaluates to NULL otherwise TRUE
A LIKE B	strings	TRUE if string A matches the SQL simple regular expression B, otherwise FALSE. The comparison is done character by character. The _ character in B matches any character in A (similar to . in posix regular expressions), and the % character in B matches an arbitrary number of characters in A (similar to .* in posix regular expressions). For example, `'foobar' LIKE 'foo'` evaluates to FALSE where as `'foobar' LIKE 'foo___'` evaluates to TRUE and so does `'foobar' LIKE 'foo%'`. To escape % use (% matches one % character). If the data contains a semicolon, and you want to search for it, it needs to be escaped, `columnValue LIKE 'a;b'`
A RLIKE B	strings	NULL if A or B is NULL, TRUE if any (possibly empty) substring of A matches the Java regular expression B (see Java regular expressions syntax), otherwise FALSE. For example, 'foobar' rlike 'foo' evaluates to TRUE and so does 'foobar' rlike '^f.*r$'.
A REGEXP B	strings	Same as RLIKE

Arithmetic Operators—The following operators support various common arithmetic operations on the operands. All of them return number types.

Arithmetic Operators	Operand types	Description
A + B	all number types	Gives the result of adding A and B. The type of the result is the same as the common parent(in the type hierarchy) of the types of the operands, for example, since every integer is a float. Therefore, float is a containing type of integer so the + operator on a float and an int will result in a float.
A - B	all number types	Gives the result of subtracting B from A. The type of the result is the same as the common parent(in the type hierarchy) of the types of the operands.
A * B	all number types	Gives the result of multiplying A and B. The type of the result is the same as the common parent(in the type hierarchy) of the types of the operands. Note that if the multiplication causing overflow, you will have to cast one of the operators to a type higher in the type hierarchy.
A / B	all number types	Gives the result of dividing B from A. The type of the result is the same as the common parent(in the type hierarchy) of the types of the operands. If the operands are integer types, then the result is the quotient of the division.
A % B	all number types	Gives the reminder resulting from dividing A by B. The type of the result is the same as the common parent(in the type hierarchy) of the types of the operands.
A & B	all number types	Gives the result of bitwise AND of A and B. The type of the result is the same as the common parent(in the type hierarchy) of the types of the operands.
A \| B	all number types	Gives the result of bitwise OR of A and B. The type of the result is the same as the common parent(in the type hierarchy) of the types of the operands.
A ^ B	all number types	Gives the result of bitwise XOR of A and B. The type of the result is the same as the common parent(in the type hierarchy) of the types of the operands.
~A	all number types	Gives the result of bitwise NOT of A. The type of the result is the same as the type of A.

Logical Operators — The following operators provide support for creating logical expressions. All of them return boolean TRUE or FALSE depending upon the boolean values of the operands.

Logical Operators	Operands types	Description
A AND B	boolean	TRUE if both A and B are TRUE, otherwise FALSE
A && B	boolean	Same as A AND B
A OR B	boolean	TRUE if either A or B or both are TRUE, otherwise FALSE
A \|\| B	boolean	Same as A OR B
NOT A	boolean	TRUE if A is FALSE, otherwise FALSE
!A	boolean	Same as NOT A

Operators on Complex Types—The following operators provide mechanisms to access elements in Complex Types

Operator	Operand types	Description
A[n]	A is an Array and n is an int	returns the nth element in the array A. The first element has index 0, for example, if A is an array comprising of ['foo', 'bar'] then A[0] returns 'foo' and A[1] returns 'bar'
M[key]	M is a Map<K, V> and key has type K	returns the value corresponding to the key in the map for example, if M is a map comprising of {'f' -> 'foo', 'b' -> 'bar', 'all' -> 'foobar'} then M['all'] returns 'foobar'
S.x	S is a struct	returns the x field of S, for example, for struct foobar {int foo, int bar} foobar.foo returns the integer stored in the foo field of the struct.

Built In Functions

Hive supports the following built in functions:
(Function list in source code: FunctionRegistry.java)

Return Type	Function Name (Signature)	Description
BIGINT	round(double a)	returns the rounded BIGINT value of the double
BIGINT	floor(double a)	returns the maximum BIGINT value that is equal or less than the double
BIGINT	ceil(double a)	returns the minimum BIGINT value that is equal or greater than the double
double	rand(), rand(int seed)	returns a random number (that changes from row to row). Specifiying the seed will make sure the generated random number sequence is deterministic.
string	concat(string A, string B,...)	returns the string resulting from concatenating B after A. For example, concat('foo', 'bar') results in 'foobar'. This function accepts arbitrary number of arguments and return the concatenation of all of them.
string	substr(string A, int start)	returns the substring of A starting from start position till the end of string A. For example, substr('foobar', 4) results in 'bar'
string	substr(string A, int start, int length)	returns the substring of A starting from start position with the given length, for example, substr('foobar', 4, 2) results in 'ba'
string	upper(string A)	returns the string resulting from converting all characters of A to upper case, for example, upper('fOoBaR') results in 'FOOBAR'
string	ucase(string A)	Same as upper
string	lower(string A)	returns the string resulting from converting all characters of B to lower case, for example, lower('fOoBaR') results in 'foobar'
string	lcase(string A)	Same as lower
string	trim(string A)	returns the string resulting from trimming spaces from both ends of A, for example, trim(' foobar ') results in 'foobar'
string	ltrim(string A)	returns the string resulting from trimming spaces from the beginning(left hand side) of A. For example, ltrim(' foobar ') results in 'foobar '
string	rtrim(string A)	returns the string resulting from trimming spaces from the end(right hand side) of A. For example, rtrim(' foobar ') results in ' foobar'
string	regexp_replace(string A, string B, string C)	returns the string resulting from replacing all substrings in B that match the Java regular expression syntax(See Java regular expressions syntax) with C. For example, regexp_replace('foobar', 'oo\|ar', ) returns 'fb'
int	size(Map<K.V>)	returns the number of elements in the map type
int	size(Array<T>)	returns the number of elements in the array type
value of <type>	cast(<expr> as <type>)	converts the results of the expression expr to <type>, for example, cast('1' as BIGINT) will convert the string '1' to it integral representation. A null is returned if the conversion does not succeed.
string	from_unixtime(int unixtime)	convert the number of seconds from the UNIX epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the format of "1970-01-01 00:00:00"
string	to_date(string timestamp)	Return the date part of a timestamp string: to_date("1970-01-01 00:00:00") = "1970-01-01"
int	year(string date)	Return the year part of a date or a timestamp string: year("1970-01-01 00:00:00") = 1970, year("1970-01-01") = 1970
int	month(string date)	Return the month part of a date or a timestamp string: month("1970-11-01 00:00:00") = 11, month("1970-11-01") = 11
int	day(string date)	Return the day part of a date or a timestamp string: day("1970-11-01 00:00:00") = 1, day("1970-11-01") = 1
string	get_json_object(string json_string, string path)	Extract json object from a json string based on json path specified, and return json string of the extracted json object. It will return null if the input json string is invalid.

The following built in aggregate functions are supported in Hive:

Return Type	Aggregation Function Name (Signature)	Description
BIGINT	count(*), count(expr), count(DISTINCT expr[, expr_.])	count(*)—Returns the total number of retrieved rows, including rows containing NULL values; count(expr)—Returns the number of rows for which the supplied expression is non-NULL; count(DISTINCT expr[, expr])—Returns the number of rows for which the supplied expression(s) are unique and non-NULL.
DOUBLE	sum(col), sum(DISTINCT col)	returns the sum of the elements in the group or the sum of the distinct values of the column in the group
DOUBLE	avg(col), avg(DISTINCT col)	returns the average of the elements in the group or the average of the distinct values of the column in the group
DOUBLE	min(col)	returns the minimum value of the column in the group
DOUBLE	max(col)	returns the maximum value of the column in the group

Language Capabilities

Hive's SQL provides the basic SQL operations. These operations work on tables or partitions. These operations are:

Ability to filter rows from a table using a WHERE clause.
Ability to select certain columns from the table using a SELECT clause.
Ability to do equi-joins between two tables.
Ability to evaluate aggregations on multiple "group by" columns for the data stored in a table.
Ability to store the results of a query into another table.
Ability to download the contents of a table to a local (for example,, nfs) directory.
Ability to store the results of a query in a hadoop dfs directory.
Ability to manage tables and partitions (create, drop and alter).
Ability to plug in custom scripts in the language of choice for custom map/reduce jobs.

查看全文

相关阅读:
openlayers5之ol.proj坐标转换
 bootstrap模态框可拖动 --------------------- 本文来自 gisdoer 的CSDN 博客，全文地址请点击：https://blog.csdn.net/gisdoer/article/details/82838312?utm_source=copy
bootstrap模态框可拖动
 leaflet实用插件整理
 面试题经常问的闭包问题
 撩课-Java每天5道面试题第15天
 撩课-Web大前端每天5道面试题-Day4
撩课-Python-每天5道面试题-第4天
 撩课-Java每天5道面试题第14天
 撩课-Web大前端每天5道面试题-Day3

原文地址：https://www.cnblogs.com/simple-focus/p/6170030.html