Subqueries in the FROM Clause
SELECT ... FROM (subquery) name ...
SELECT ... FROM (subquery) AS name ...   (Note: Only valid starting with Hive 0.13.0)
Through Hive 0.12, subqueries are supported only in the FROM clause. The subquery must be given a name because every table in a FROM clause must have a name. Columns in the subquery select list must have unique names, and they are available in the outer query just like the columns of a table. The subquery can also be a query expression with UNION. Hive supports arbitrary levels of subquery nesting.
The optional keyword "AS" can be included before the subquery name in Hive 0.13.0 and later versions (HIVE-6519).
Example with simple subquery:
SELECT col FROM ( SELECT a+b AS col FROM t1 ) t2
Example with subquery containing a UNION ALL:
SELECT t3.col FROM ( SELECT a+b AS col FROM t1 UNION ALL SELECT c+d AS col FROM t2 ) t3
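The nesting and naming rules above can be sketched with a two-level subquery. The table sales and its columns region and amount are hypothetical names chosen for illustration; they do not come from the examples above.

```sql
-- Hypothetical table: sales(region STRING, amount DOUBLE)
SELECT r.region, r.total
FROM (
  -- Middle level: every select-list column has a unique name (region, total)
  SELECT s.region, SUM(s.amount) AS total
  FROM (
    -- Innermost level: must be named (s) like any other table in FROM
    SELECT region, amount
    FROM sales
    WHERE amount > 0
  ) s
  GROUP BY s.region
) AS r            -- AS is accepted only in Hive 0.13.0 and later; omit it on older versions
WHERE r.total > 1000;
```

The outer query refers to r.region and r.total exactly as it would refer to the columns of a regular table.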
Subqueries in the WHERE Clause
As of Hive 0.13, some types of subqueries are supported in the WHERE clause. These include IN and NOT IN subqueries whose result can be treated as a constant. They are called uncorrelated subqueries because the subquery does not reference columns from the parent query:
SELECT * FROM A WHERE A.a IN ( SELECT foo FROM B);
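The NOT IN form follows the same pattern. A minimal sketch, reusing the tables A and B from the example above:

```sql
-- Rows of A whose value of a does not appear in B.foo.
-- The subquery is uncorrelated: it references no column of A.
SELECT *
FROM A
WHERE A.a NOT IN (SELECT foo FROM B);
```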
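The other supported types are EXISTS and NOT EXISTS subqueries. These are correlated: the subquery references a column of the parent query (T1.X in the example below):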
SELECT A FROM T1 WHERE EXISTS ( SELECT B FROM T2 WHERE T1.X = T2.Y)
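As a sketch of the negated form, the same tables T1 and T2 can be used with NOT EXISTS:

```sql
-- Rows of T1 for which no row of T2 satisfies T1.X = T2.Y.
-- T1.X is the correlated reference to the parent query.
SELECT A
FROM T1
WHERE NOT EXISTS (SELECT B FROM T2 WHERE T1.X = T2.Y);
```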
There are a few limitations:
- These subqueries are only supported on the right-hand side of an expression.
- IN/NOT IN subqueries may only select a single column.
- EXISTS/NOT EXISTS must have one or more correlated predicates.
- References to the parent query are only supported in the WHERE clause of the subquery.
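The limitations above can be illustrated with a short sketch. The tables orders and customers and their columns are hypothetical, and the commented-out fragments show forms that the listed limitations rule out; the exact error reported may vary by Hive version.

```sql
-- Hypothetical tables: orders(id, customer_id), customers(id, region)

-- Accepted: the subquery is on the right-hand side of IN and selects one column.
SELECT o.id
FROM orders o
WHERE o.customer_id IN (SELECT c.id FROM customers c WHERE c.region = 'EU');

-- Accepted: the reference to the parent query (o.customer_id) appears
-- in the WHERE clause of the subquery.
SELECT o.id
FROM orders o
WHERE EXISTS (SELECT c.id FROM customers c WHERE c.id = o.customer_id);

-- Ruled out by the limitations listed above:
--   WHERE (SELECT c.id FROM customers c) IN (o.customer_id)      -- subquery on the left-hand side
--   WHERE o.customer_id IN (SELECT c.id, c.region FROM customers c)  -- more than one column selected
--   WHERE EXISTS (SELECT c.id FROM customers c)                  -- no correlated predicate
```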