zoukankan      html  css  js  c++  java
  • Hive初识(四)

    Hive本质上是一个数据仓库,但不存储数据(只存储元数据(metadata),Hive中的元数据包括表的名字,表的列和分区及分区及其属性,表的属性(是否为外部表等),表的数据所在目录等),用户可以借助Hive使用sql对存储在分布式文件系统中的大数据集进行读写

    Hive查询语言(HiveQL)是一种查询语言,Hive处理在Metastore(元数据存储)分析结构化数据。

    SELECT语句用来从表中检索的数据。WHERE子句中的工作原理类似于一个条件。它使用这个条件过滤数据,并返回给出一个有限的结果。

    语法:下面给出的SELECT查询的语法

    SELECT [ALL | DISTINCT] select_expr, select_expr, ... 
    FROM table_reference 
    [WHERE where_condition] 
    [GROUP BY col_list] 
    [HAVING having_condition] 
    [CLUSTER BY col_list | [DISTRIBUTE BY col_list] [SORT BY col_list]] 
    [LIMIT number];

    示例

    举个例子SELECT...WHERE子句。假设employee表有如下Id,Name,Salary,Designation和Dept等字段,生成一个查询检索超过30000薪水的员工详细信息。

    +------+--------------+-------------+-------------------+--------+
    | ID   | Name         | Salary      | Designation       | Dept   |
    +------+--------------+-------------+-------------------+--------+
    |1201  | Gopal        | 45000       | Technical manager | TP     |
    |1202  | Manisha      | 45000       | Proofreader       | PR     |
    |1203  | Masthanvali  | 40000       | Technical writer  | TP     |
    |1204  | Krian        | 40000       | Hr Admin          | HR     |
    |1205  | Kranthi      | 30000       | Op Admin          | Admin  | 
    +------+--------------+-------------+-------------------+--------+

    下面的查询检索使用上述业务情景的员工详细信息:

    SELECT * FROM employee WHERE salary>30000;

    成功查询后,能看到以下回应:

    +------+--------------+-------------+-------------------+--------+
    | ID   | Name         | Salary      | Designation       | Dept   |
    +------+--------------+-------------+-------------------+--------+
    |1201  | Gopal        | 45000       | Technical manager | TP     |
    |1202  | Manisha      | 45000       | Proofreader       | PR     |
    |1203  | Masthanvali  | 40000       | Technical writer  | TP     |
    |1204  | Krian        | 40000       | Hr Admin          | HR     |
    +------+--------------+-------------+-------------------+--------+

    下面介绍使用SELECT语句的ORDER BY子句。

    示例:假设需要生成一个查询用于检索员工的详细信息。

    +------+--------------+-------------+-------------------+--------+
    | ID   | Name         | Salary      | Designation       | Dept   |
    +------+--------------+-------------+-------------------+--------+
    |1201  | Gopal        | 45000       | Technical manager | TP     |
    |1202  | Manisha      | 45000       | Proofreader       | PR     |
    |1203  | Masthanvali  | 40000       | Technical writer  | TP     |
    |1204  | Krian        | 40000       | Hr Admin          | HR     |
    |1205  | Kranthi      | 30000       | Op Admin          | Admin  |
    +------+--------------+-------------+-------------------+--------+

    下面是使用上述业务情景查询检索员工详细信息:

    SELECT * FROM employee ORDER BY DEPT;

    成功查询后能得到以下回应:

    +------+--------------+-------------+-------------------+--------+
    | ID   | Name         | Salary      | Designation       | Dept   |
    +------+--------------+-------------+-------------------+--------+
    |1205  | Kranthi      | 30000       | Op Admin          | Admin  |
    |1204  | Krian        | 40000       | Hr Admin          | HR     |
    |1202  | Manisha      | 45000       | Proofreader       | PR     |
    |1201  | Gopal        | 45000       | Technical manager | TP     |
    |1203  | Masthanvali  | 40000       | Technical writer  | TP     |
    +------+--------------+-------------+-------------------+--------+

    GROUP BY子句用于分类所有记录结果的特定集合列。它被用来查询一组激励。

    如果用来产生一个查询以检索每个部门的员工数量。

    +------+--------------+-------------+-------------------+--------+ 
    | ID   | Name         | Salary      | Designation       | Dept   |
    +------+--------------+-------------+-------------------+--------+ 
    |1201  | Gopal        | 45000       | Technical manager | TP     | 
    |1202  | Manisha      | 45000       | Proofreader       | PR     | 
    |1203  | Masthanvali  | 40000       | Technical writer  | TP     | 
    |1204  | Krian        | 45000       | Proofreader       | PR     | 
    |1205  | Kranthi      | 30000       | Op Admin          | Admin  |
    +------+--------------+-------------+-------------------+--------+ 

    下面使用上述业务情景查询检索员工的详细信息。

    SELECT Dept,count(*) FROM employee GROUP BY DEPT;

    返回结果为:

    +------+--------------+ 
    | Dept | Count(*)     | 
    +------+--------------+ 
    |Admin |    1         | 
    |PR    |    2         | 
    |TP    |    3         | 
    +------+--------------+


    JOIN是子句用于通过使用共同值组合来自两个表特定字段。它是用来从数据库中两个或更多的表组合的记录。它或多或少类似于SQL JOIN。

    示例:

    我们将使用下面两个表,CUSTOMERS表

    +----+----------+-----+-----------+----------+ 
    | ID | NAME     | AGE | ADDRESS   | SALARY   | 
    +----+----------+-----+-----------+----------+ 
    | 1  | Ramesh   | 32  | Ahmedabad | 2000.00  |  
    | 2  | Khilan   | 25  | Delhi     | 1500.00  |  
    | 3  | kaushik  | 23  | Kota      | 2000.00  | 
    | 4  | Chaitali | 25  | Mumbai    | 6500.00  | 
    | 5  | Hardik   | 27  | Bhopal    | 8500.00  | 
    | 6  | Komal    | 22  | MP        | 4500.00  | 
    | 7  | Muffy    | 24  | Indore    | 10000.00 | 
    +----+----------+-----+-----------+----------+

    ORDERS表

    +-----+---------------------+-------------+--------+ 
    |OID  | DATE                | CUSTOMER_ID | AMOUNT | 
    +-----+---------------------+-------------+--------+ 
    | 102 | 2009-10-08 00:00:00 |           3 | 3000   | 
    | 100 | 2009-10-08 00:00:00 |           3 | 1500   | 
    | 101 | 2009-11-20 00:00:00 |           2 | 1560   | 
    | 103 | 2008-05-20 00:00:00 |           4 | 2060   | 
    +-----+---------------------+-------------+--------+

    有不同类型的联接给出如下:

    JOIN

    LEFT OUTER JOIN

    RIGHT OUTER JOIN

    FULL OUTER JOIN


    JOIN子句用于合并和检索来自多个表中的记录。JOIN和SQL OUTER JOIN类似。连接条件是使用主键和表的外键。

    下面的查询执行JOIN的CUSTOMER和ORDERS表。并检索记录。

    hive> SELECT c.ID, c.NAME, c.AGE, o.AMOUNT FROM CUSTOMERS c JOIN ORDERS o ON (c.ID = o.CUSTOMER_ID);

    成功执行查询后,能看到以下回应:

    +----+----------+-----+--------+ 
    | ID | NAME     | AGE | AMOUNT | 
    +----+----------+-----+--------+ 
    | 3  | kaushik  | 23  | 3000   | 
    | 3  | kaushik  | 23  | 1500   | 
    | 2  | Khilan   | 25  | 1560   | 
    | 4  | Chaitali | 25  | 2060   | 
    +----+----------+-----+--------+

    LEFT OUTER JOIN

    HiveQL LEFT OUTER JOIN返回所有行左表,即使是在正确的表中没有匹配。这意味着,如果ON子句匹配的右表零记录,JOIN还是返回结果行,但在右表中的每一行为NULL。

    LEFT JOIN返回左表中的所有的值,加上右表,或JOIN子句没有匹配的情况下返回NULL。

    下面的查询演示了CUSTOMER和ORDERS表之间的LEFT OUTER JOIN用法:

    hive > SELECT c.ID, c.NAME, o.AMOUNT, o.DATE FROME CUSTOMERS c LEFT JOIN ORDERS o ON (c.ID = o.CUSTOMER_ID);

    成功执行查询后,能看到以下回应:

    +----+----------+--------+---------------------+ 
    | ID | NAME     | AMOUNT | DATE                | 
    +----+----------+--------+---------------------+ 
    | 1  | Ramesh   | NULL   | NULL                | 
    | 2  | Khilan   | 1560   | 2009-11-20 00:00:00 | 
    | 3  | kaushik  | 3000   | 2009-10-08 00:00:00 | 
    | 3  | kaushik  | 1500   | 2009-10-08 00:00:00 | 
    | 4  | Chaitali | 2060   | 2008-05-20 00:00:00 | 
    | 5  | Hardik   | NULL   | NULL                | 
    | 6  | Komal    | NULL   | NULL                | 
    | 7  | Muffy    | NULL   | NULL                | 
    +----+----------+--------+---------------------+

    RIGHT OUTER JOIN

    HiveQL RIGHT OUTER JOIN返回右边表的所有行,即使在左表中没有匹配。如果ON子句的左表匹配零记录,JOIN结果返回一行,但在左表中的每一行为NULL。

    RIGHT JOIN返回右表中的所有值,加上左表,或者没有匹配的情况下返回NULL。

    下面的查询演示了CUSTOMERS和ODERS表之间使用RIGHT OUTER JOIN。

    hive > SELECT c.ID, c.NAME, o.AMOUNT, o.DATE FROM CUSTOMERS c RIGHT OUTER JOIN ORDERS o ON (c.ID = o.CUSTOMER_ID);

    成功执行查询后,能看到以下回应:

    +------+----------+--------+---------------------+ 
    | ID   | NAME     | AMOUNT | DATE                | 
    +------+----------+--------+---------------------+ 
    | 3    | kaushik  | 3000   | 2009-10-08 00:00:00 | 
    | 3    | kaushik  | 1500   | 2009-10-08 00:00:00 | 
    | 2    | Khilan   | 1560   | 2009-11-20 00:00:00 | 
    | 4    | Chaitali | 2060   | 2008-05-20 00:00:00 | 
    +------+----------+--------+---------------------+


    FULL OUTER JOIN 

    HiveQL FULL OUTER JOIN结合了左边,并且满足JOIN条件合适外部表的记录。连接表包含两个表的所有记录,或两侧缺少匹配结果那么使用NULL值填补。

    下面的查询演示了CUSTOMERS和ORDERS表之间的FULL OUTER JOIN:

    hive > SELCE c.ID, c.NAME, o.AMOUNT, o.DATE FROM CUSTOMERS c FULL OUTER JOIN ODERS o ON (c.ID = o.CUSTOMER_ID);

    成功执行查询后,能看到以下回应:

    +------+----------+--------+---------------------+ 
    | ID   | NAME     | AMOUNT | DATE                | 
    +------+----------+--------+---------------------+ 
    | 1    | Ramesh   | NULL   | NULL                | 
    | 2    | Khilan   | 1560   | 2009-11-20 00:00:00 | 
    | 3    | kaushik  | 3000   | 2009-10-08 00:00:00 | 
    | 3    | kaushik  | 1500   | 2009-10-08 00:00:00 | 
    | 4    | Chaitali | 2060   | 2008-05-20 00:00:00 | 
    | 5    | Hardik   | NULL   | NULL                | 
    | 6    | Komal    | NULL   | NULL                |
    | 7    | Muffy    | NULL   | NULL                |  
    | 3    | kaushik  | 3000   | 2009-10-08 00:00:00 | 
    | 3    | kaushik  | 1500   | 2009-10-08 00:00:00 | 
    | 2    | Khilan   | 1560   | 2009-11-20 00:00:00 | 
    | 4    | Chaitali | 2060   | 2008-05-20 00:00:00 | 
    +------+----------+--------+---------------------+
  • 相关阅读:
    猫树
    单位根反演
    区间修改区间求和
    最远点 决策单调性
    圆方树
    912. 排序数组
    1309. 解码字母到整数映射
    28. 实现 strStr()
    31. 下一个排列
    22. 括号生成
  • 原文地址:https://www.cnblogs.com/Mayny/p/9374152.html
Copyright © 2011-2022 走看看