  • Hive Basic Exercises (Part 1)

    Below are some basic Hive exercises; more will be added over time.

    A brief overview of how Hive works

    Hive is a tool built on top of Hadoop for managing data stored on HDFS. Under the hood it executes MapReduce programs; it simply offers SQL-like statements that make development easier. The Hive driver translates these SQL-like statements into MapReduce tasks, which is why execution is relatively slow.

    The core of Hive is the driver, which bridges SQL and HDFS by turning SQL into MapReduce jobs. The driver mainly consists of:

    (1) Parser: parses the SQL statement and splits the query into different stages

    (2) Compiler: compiles each stage into individual MapReduce tasks

    (3) Optimizer: optimizes the logical execution plan

    (4) Executor: converts the logical work described by the SQL into physical tasks on HDFS; for Hive this execution engine is MapReduce
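
    A quick way to see this pipeline is to ask Hive for the plan it compiles for a query. This is a minimal sketch (the table and column names are just illustrative; EXPLAIN works on any query):

    -- Show the stages Hive compiles this query into
    explain
    select moviename, count(*) as cnt
    from movie_info
    group by moviename;
    -- The output lists the stage plan (for example a map-reduce stage followed by a fetch stage)
    -- together with the operator tree for each stage.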

    Differences between Hive managed (internal) tables and external tables

    Managed (internal) table: mainly used in the data warehouse (DW) layer and owned exclusively by one user; dropping the table also deletes the underlying data, but this has no effect on anyone else.

    External table: mainly used in the source data (ODS) layer; dropping the table does not delete the underlying data.

    When creating a table, a managed table does not need the external keyword, whereas an external table must be declared with the external keyword, as sketched below.
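
    A minimal sketch of the two DDL forms (table names, columns, and the HDFS path are illustrative):

    -- Managed (internal) table: Hive owns the data; DROP TABLE removes the HDFS files as well
    create table dw_orders(id int, amount double)
    row format delimited fields terminated by ',';

    -- External table: DROP TABLE removes only the metadata; the files under LOCATION remain
    create external table ods_orders(id int, amount double)
    row format delimited fields terminated by ','
    location '/data/ods/orders';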

    Exercise 1: create a table and load data

    战狼2,吴京:吴刚:卢婧姗,2017-08-16
    大话西游,周星驰:吴孟达,1995-09-01
    哪吒,吕艳婷:瀚墨,2019-07-26
    使徒行者2,张家辉:古天乐:吴镇宇,2019-08-07
    鼠胆英雄,岳云鹏:佟丽娅:田雨:袁弘,2019-08-02

    Create the table and load the data. Fields are separated by commas and the actors within each row by colons, so the table is declared with fields terminated by ',' and collection items terminated by ':'.

    # Create the table (the first two attempts fail because of typos: "delimited by" is missing the fields keyword, and "teminated" should be "terminated")
    0: jdbc:hive2://node01:10000> create table movie_info(moviename string,actors array<string>,showtime string) row format delimited by ',' collection items teminated by ':';
    Error: Error while compiling statement: FAILED: ParseException line 1:100 cannot recognize input near 'by' '','' 'collection' in serde properties specification (state=42000,code=40000)
    0: jdbc:hive2://node01:10000> create table movie_info(moviename string,actors array<string>,showtime string) row format delimited fields terminated by ',' collection items teminated by ':';
    Error: Error while compiling statement: FAILED: ParseException line 1:142 mismatched input 'teminated' expecting TERMINATED near 'items' in table row format's column separator (state=42000,code=40000)
    0: jdbc:hive2://node01:10000> create table movie_info(moviename string,actors array<string>,showtime string) row format delimited fields terminated by ',' collection items terminated by ':';
    INFO  : Compiling command(queryId=hadoop_20191115220202_01acb251-d8e2-46c5-bf20-5b354d6d2923): create table movie_info(moviename string,actors array<string>,showtime string) row format delimited fields terminated by ',' collection items terminated by ':'
    INFO  : Semantic Analysis Completed
    INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
    INFO  : Completed compiling command(queryId=hadoop_20191115220202_01acb251-d8e2-46c5-bf20-5b354d6d2923); Time taken: 0.035 seconds
    INFO  : Concurrency mode is disabled, not creating a lock manager
    INFO  : Executing command(queryId=hadoop_20191115220202_01acb251-d8e2-46c5-bf20-5b354d6d2923): create table movie_info(moviename string,actors array<string>,showtime string) row format delimited fields terminated by ',' collection items terminated by ':'
    INFO  : Starting task [Stage-0:DDL] in serial mode
    INFO  : Completed executing command(queryId=hadoop_20191115220202_01acb251-d8e2-46c5-bf20-5b354d6d2923); Time taken: 0.286 seconds
    INFO  : OK
    No rows affected (0.359 seconds)
    # Load the data
    0: jdbc:hive2://node01:10000> load data local inpath '/kkb/install/hivedatas/move_info.txt' into table movie_info;
    INFO  : Compiling command(queryId=hadoop_20191115220505_1c2ea52f-37cf-43d2-85b0-e2b54049aa33): load data local inpath '/kkb/install/hivedatas/move_info.txt' into table movie_info
    INFO  : Semantic Analysis Completed
    INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
    INFO  : Completed compiling command(queryId=hadoop_20191115220505_1c2ea52f-37cf-43d2-85b0-e2b54049aa33); Time taken: 0.061 seconds
    INFO  : Concurrency mode is disabled, not creating a lock manager
    INFO  : Executing command(queryId=hadoop_20191115220505_1c2ea52f-37cf-43d2-85b0-e2b54049aa33): load data local inpath '/kkb/install/hivedatas/move_info.txt' into table movie_info
    INFO  : Starting task [Stage-0:MOVE] in serial mode
    INFO  : Loading data to table db_hive.movie_info from file:/kkb/install/hivedatas/move_info.txt
    INFO  : Starting task [Stage-1:STATS] in serial mode
    INFO  : Table db_hive.movie_info stats: [numFiles=1, totalSize=235]
    INFO  : Completed executing command(queryId=hadoop_20191115220505_1c2ea52f-37cf-43d2-85b0-e2b54049aa33); Time taken: 0.645 seconds
    INFO  : OK
    No rows affected (0.726 seconds)
    # Query the loaded data
    0: jdbc:hive2://node01:10000> select * from movie_info;
    INFO  : Compiling command(queryId=hadoop_20191115220505_0e4cbd35-3f1c-44d8-8a03-2b7dcd7ab03e): select * from movie_info
    INFO  : Semantic Analysis Completed
    INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:movie_info.moviename, type:string, comment:null), FieldSchema(name:movie_info.actors, type:array<string>, comment:null), FieldSchema(name:movie_info.showtime, type:string, comment:null)], properties:null)
    INFO  : Completed compiling command(queryId=hadoop_20191115220505_0e4cbd35-3f1c-44d8-8a03-2b7dcd7ab03e); Time taken: 0.09 seconds
    INFO  : Concurrency mode is disabled, not creating a lock manager
    INFO  : Executing command(queryId=hadoop_20191115220505_0e4cbd35-3f1c-44d8-8a03-2b7dcd7ab03e): select * from movie_info
    INFO  : Completed executing command(queryId=hadoop_20191115220505_0e4cbd35-3f1c-44d8-8a03-2b7dcd7ab03e); Time taken: 0.0 seconds
    INFO  : OK
    +-----------------------+--------------------------+----------------------+--+
    | movie_info.moviename  |    movie_info.actors     | movie_info.showtime  |
    +-----------------------+--------------------------+----------------------+--+
    | 战狼2                   | ["吴京","吴刚","卢婧姗"]        | 2017-08-16           |
    | 大话西游                  | ["周星驰","吴孟达"]            | 1995-09-01           |
    | 哪吒                    | ["吕艳婷","瀚墨"]             | 2019-07-26           |
    | 使徒行者2                 | ["张家辉","古天乐","吴镇宇"]      | 2019-08-07           |
    | 鼠胆英雄                  | ["岳云鹏","佟丽娅","田雨","袁弘"]  | 2019-08-02           |
    +-----------------------+--------------------------+----------------------+--+
    5 rows selected (0.177 seconds)
    

    3.1 Query the second lead actor of each movie. Hive array indexes start at 0, so actors[1] is the second element.

    0: jdbc:hive2://node01:10000> select moviename,actors[1] from movie_info;
    INFO  : Compiling command(queryId=hadoop_20191115220909_702e25ac-bbf9-4fb3-a39e-44156379fcf3): select moviename,actors[1] from movie_info
    INFO  : Semantic Analysis Completed
    INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:moviename, type:string, comment:null), FieldSchema(name:_c1, type:string, comment:null)], properties:null)
    INFO  : Completed compiling command(queryId=hadoop_20191115220909_702e25ac-bbf9-4fb3-a39e-44156379fcf3); Time taken: 0.134 seconds
    INFO  : Concurrency mode is disabled, not creating a lock manager
    INFO  : Executing command(queryId=hadoop_20191115220909_702e25ac-bbf9-4fb3-a39e-44156379fcf3): select moviename,actors[1] from movie_info
    INFO  : Completed executing command(queryId=hadoop_20191115220909_702e25ac-bbf9-4fb3-a39e-44156379fcf3); Time taken: 0.0 seconds
    INFO  : OK
    +------------+------+--+
    | moviename  | _c1  |
    +------------+------+--+
    | 战狼2        | 吴刚   |
    | 大话西游       | 吴孟达  |
    | 哪吒         | 瀚墨   |
    | 使徒行者2      | 古天乐  |
    | 鼠胆英雄       | 佟丽娅  |
    +------------+------+--+
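
    A related sketch: since the arrays are 0-indexed, indexing can be combined with size() to pull the last listed actor of each movie (the last_actor alias is just illustrative):

    -- Last actor in each movie's cast list
    select moviename, actors[size(actors)-1] as last_actor
    from movie_info;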
    

    3.2 Query how many lead actors each movie has

    0: jdbc:hive2://node01:10000> select moviename,size(actors) as actorcount from movie_info;
    INFO  : Compiling command(queryId=hadoop_20191115221010_1b46f1ce-12bf-406d-9cfc-395df7f26816): select moviename,size(actors) as actorcount from movie_info
    INFO  : Semantic Analysis Completed
    INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:moviename, type:string, comment:null), FieldSchema(name:actorcount, type:int, comment:null)], properties:null)
    INFO  : Completed compiling command(queryId=hadoop_20191115221010_1b46f1ce-12bf-406d-9cfc-395df7f26816); Time taken: 0.075 seconds
    INFO  : Concurrency mode is disabled, not creating a lock manager
    INFO  : Executing command(queryId=hadoop_20191115221010_1b46f1ce-12bf-406d-9cfc-395df7f26816): select moviename,size(actors) as actorcount from movie_info
    INFO  : Completed executing command(queryId=hadoop_20191115221010_1b46f1ce-12bf-406d-9cfc-395df7f26816); Time taken: 0.001 seconds
    INFO  : OK
    +------------+-------------+--+
    | moviename  | actorcount  |
    +------------+-------------+--+
    | 战狼2        | 3           |
    | 大话西游       | 2           |
    | 哪吒         | 2           |
    | 使徒行者2      | 3           |
    | 鼠胆英雄       | 4           |
    +------------+-------------+--+
    

    3.3 Movies whose cast includes 古天乐

    # This requires lateral view with explode
    0: jdbc:hive2://node01:10000> Select t.moviename,t.actor from
    . . . . . . . . . . . . . . > (
    . . . . . . . . . . . . . . > select moviename,actor from movie_info lateral view explode(actors)temp as actor
    . . . . . . . . . . . . . . > ) t
    . . . . . . . . . . . . . . > Where t.actor='古天乐';
    INFO  : Compiling command(queryId=hadoop_20191115222525_532eeeb8-06db-4828-9e6d-73594e31877e): Select t.moviename,t.actor from
    (
    select moviename,actor from movie_info lateral view explode(actors)temp as actor
    ) t
    Where t.actor='古天乐'
    INFO  : Semantic Analysis Completed
    INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:t.moviename, type:string, comment:null), FieldSchema(name:t.actor, type:string, comment:null)], properties:null)
    INFO  : Completed compiling command(queryId=hadoop_20191115222525_532eeeb8-06db-4828-9e6d-73594e31877e); Time taken: 0.225 seconds
    INFO  : Concurrency mode is disabled, not creating a lock manager
    INFO  : Executing command(queryId=hadoop_20191115222525_532eeeb8-06db-4828-9e6d-73594e31877e): Select t.moviename,t.actor from
    (
    select moviename,actor from movie_info lateral view explode(actors)temp as actor
    ) t
    Where t.actor='古天乐'
    INFO  : Completed executing command(queryId=hadoop_20191115222525_532eeeb8-06db-4828-9e6d-73594e31877e); Time taken: 0.0 seconds
    INFO  : OK
    +--------------+----------+--+
    | t.moviename  | t.actor  |
    +--------------+----------+--+
    | 使徒行者2        | 古天乐      |
    +--------------+----------+--+
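
    For a simple membership test like this, a shorter equivalent is a sketch using the built-in array_contains function instead of explode with lateral view:

    -- Movies whose actors array contains the given name
    select moviename, actors
    from movie_info
    where array_contains(actors, '古天乐');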
    

    Exercise 2: create a table and load data

    1,张三,18:male:北京
    2,李四,29:female:上海
    3,杨朝来,22:male:深圳 
    4,蒋平,34:male:成都
    5,唐灿华,25:female:哈尔滨
    6,马达,17:male:北京
    7,赵小雪,23:female:杭州
    8,薛文泉,26:male:上海
    9,丁建,29:male:北京

    Create the table and load the data. The third field is a struct of age, gender, and city, whose members are separated by colons.

    # Create the table
    0: jdbc:hive2://node01:10000> create table dept(id int,name string,info struct<age:int,gender:string,city:string>) row format delimited fields terminated by ',' collection items terminated by ':';
    INFO  : Compiling command(queryId=hadoop_20191115223434_1d5c9e49-c4e7-4930-938d-4a5823486099): create table dept(id int,name string,info struct<age:int,gender:string,city:string>) row format delimited fields terminated by ',' collection items terminated by ':'
    INFO  : Semantic Analysis Completed
    INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
    INFO  : Completed compiling command(queryId=hadoop_20191115223434_1d5c9e49-c4e7-4930-938d-4a5823486099); Time taken: 0.008 seconds
    INFO  : Concurrency mode is disabled, not creating a lock manager
    INFO  : Executing command(queryId=hadoop_20191115223434_1d5c9e49-c4e7-4930-938d-4a5823486099): create table dept(id int,name string,info struct<age:int,gender:string,city:string>) row format delimited fields terminated by ',' collection items terminated by ':'
    INFO  : Starting task [Stage-0:DDL] in serial mode
    INFO  : Completed executing command(queryId=hadoop_20191115223434_1d5c9e49-c4e7-4930-938d-4a5823486099); Time taken: 0.086 seconds
    INFO  : OK
    No rows affected (0.131 seconds)
    0: jdbc:hive2://node01:10000> desc dept;
    INFO  : Compiling command(queryId=hadoop_20191115223434_270cf471-d325-40ed-af27-4afff5a6aabf): desc dept
    INFO  : Semantic Analysis Completed
    INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:col_name, type:string, comment:from deserializer), FieldSchema(name:data_type, type:string, comment:from deserializer), FieldSchema(name:comment, type:string, comment:from deserializer)], properties:null)
    INFO  : Completed compiling command(queryId=hadoop_20191115223434_270cf471-d325-40ed-af27-4afff5a6aabf); Time taken: 0.1 seconds
    INFO  : Concurrency mode is disabled, not creating a lock manager
    INFO  : Executing command(queryId=hadoop_20191115223434_270cf471-d325-40ed-af27-4afff5a6aabf): desc dept
    INFO  : Starting task [Stage-0:DDL] in serial mode
    INFO  : Completed executing command(queryId=hadoop_20191115223434_270cf471-d325-40ed-af27-4afff5a6aabf); Time taken: 0.042 seconds
    INFO  : OK
    +-----------+--------------------------------------------+----------+--+
    | col_name  |                 data_type                  | comment  |
    +-----------+--------------------------------------------+----------+--+
    | id        | int                                        |          |
    | name      | string                                     |          |
    | info      | struct<age:int,gender:string,city:string>  |          |
    +-----------+--------------------------------------------+----------+--+
    3 rows selected (0.169 seconds)
    # Load the data
    0: jdbc:hive2://node01:10000> load data local inpath '/kkb/install/hivedatas/dept.txt' overwrite into table dept;
    INFO  : Compiling command(queryId=hadoop_20191115223535_d81bba57-47cc-473b-be36-fe32053922b8): load data local inpath '/kkb/install/hivedatas/dept.txt' overwrite into table dept
    INFO  : Semantic Analysis Completed
    INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
    INFO  : Completed compiling command(queryId=hadoop_20191115223535_d81bba57-47cc-473b-be36-fe32053922b8); Time taken: 0.026 seconds
    INFO  : Concurrency mode is disabled, not creating a lock manager
    INFO  : Executing command(queryId=hadoop_20191115223535_d81bba57-47cc-473b-be36-fe32053922b8): load data local inpath '/kkb/install/hivedatas/dept.txt' overwrite into table dept
    INFO  : Starting task [Stage-0:MOVE] in serial mode
    INFO  : Loading data to table db_hive.dept from file:/kkb/install/hivedatas/dept.txt
    INFO  : Starting task [Stage-1:STATS] in serial mode
    INFO  : Table db_hive.dept stats: [numFiles=1, totalSize=240]
    INFO  : Completed executing command(queryId=hadoop_20191115223535_d81bba57-47cc-473b-be36-fe32053922b8); Time taken: 0.303 seconds
    INFO  : OK
    # Query the data
    0: jdbc:hive2://node01:10000> select * from dept;
    INFO  : Compiling command(queryId=hadoop_20191115223737_b10807f4-8f05-4b3b-aa33-4e0bc8c408f6): select * from dept
    INFO  : Semantic Analysis Completed
    INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:dept.id, type:int, comment:null), FieldSchema(name:dept.name, type:string, comment:null), FieldSchema(name:dept.info, type:struct<age:int,gender:string,city:string>, comment:null)], properties:null)
    INFO  : Completed compiling command(queryId=hadoop_20191115223737_b10807f4-8f05-4b3b-aa33-4e0bc8c408f6); Time taken: 0.054 seconds
    INFO  : Concurrency mode is disabled, not creating a lock manager
    INFO  : Executing command(queryId=hadoop_20191115223737_b10807f4-8f05-4b3b-aa33-4e0bc8c408f6): select * from dept
    INFO  : Completed executing command(queryId=hadoop_20191115223737_b10807f4-8f05-4b3b-aa33-4e0bc8c408f6); Time taken: 0.0 seconds
    INFO  : OK
    +----------+------------+--------------------------------------------+--+
    | dept.id  | dept.name  |                 dept.info                  |
    +----------+------------+--------------------------------------------+--+
    | 1        | 张三         | {"age":18,"gender":"male","city":"北京"}     |
    | 2        | 李四         | {"age":29,"gender":"female","city":"上海"}   |
    | 3        | 杨朝来        | {"age":22,"gender":"male","city":"深圳 "}    |
    | 4        | 蒋平         | {"age":34,"gender":"male","city":"成都"}     |
    | 5        | 唐灿华        | {"age":25,"gender":"female","city":"哈尔滨"}  |
    | 6        | 马达         | {"age":17,"gender":"male","city":"北京"}     |
    | 7        | 赵小雪        | {"age":23,"gender":"female","city":"杭州"}   |
    | 8        | 薛文泉        | {"age":26,"gender":"male","city":"上海"}     |
    | 9        | 丁建         | {"age":29,"gender":"male","city":"北京"}     |
    +----------+------------+--------------------------------------------+--+
    

    4.1 Query each person's id, name, and city of residence

    0: jdbc:hive2://node01:10000> select id,name,info.city from dept;
    INFO  : Compiling command(queryId=hadoop_20191115224040_796610ca-d24a-48ac-a8aa-d1eaee1ddce9): select id,name,info.city from dept
    INFO  : Semantic Analysis Completed
    INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:id, type:int, comment:null), FieldSchema(name:name, type:string, comment:null), FieldSchema(name:city, type:string, comment:null)], properties:null)
    INFO  : Completed compiling command(queryId=hadoop_20191115224040_796610ca-d24a-48ac-a8aa-d1eaee1ddce9); Time taken: 0.081 seconds
    INFO  : Concurrency mode is disabled, not creating a lock manager
    INFO  : Executing command(queryId=hadoop_20191115224040_796610ca-d24a-48ac-a8aa-d1eaee1ddce9): select id,name,info.city from dept
    INFO  : Completed executing command(queryId=hadoop_20191115224040_796610ca-d24a-48ac-a8aa-d1eaee1ddce9); Time taken: 0.001 seconds
    INFO  : OK
    +-----+-------+-------+--+
    | id  | name  | city  |
    +-----+-------+-------+--+
    | 1   | 张三    | 北京    |
    | 2   | 李四    | 上海    |
    | 3   | 杨朝来   | 深圳    |
    | 4   | 蒋平    | 成都    |
    | 5   | 唐灿华   | 哈尔滨   |
    | 6   | 马达    | 北京    |
    | 7   | 赵小雪   | 杭州    |
    | 8   | 薛文泉   | 上海    |
    | 9   | 丁建    | 北京    |
    +-----+-------+-------+--+
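
    Struct fields can also be used in the WHERE clause, not only in the select list. A minimal sketch (the age threshold is arbitrary):

    -- Filter rows on a struct member
    select id, name, info.age, info.city
    from dept
    where info.age >= 25;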
    

    That wraps up these Hive exercises; recording them here for future reference.

  • Original post: https://www.cnblogs.com/youngchaolin/p/11877795.html