  • Basic Hive Operations

    1. After starting the Hive CLI with the hive command, run show databases; and you will see that Hive comes with one database, default.

    [root@hadoop hive]# hive
    
    Logging initialized using configuration in file:/usr/local/hive/conf/hive-log4j2.properties Async: true
    Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    hive> show databases;
    OK
    default    # Hive ships with a built-in database named default
    Time taken: 21.043 seconds, Fetched: 1 row(s)
    hive> 

    Next, log in to MySQL. show databases; lists the databases, one of which is hive. Run use hive; to switch to it and show tables; to list its tables. select * from DBS; then shows the metadata for Hive's default database.

    [root@hadoop ~]# mysql -uroot -proot
    Warning: Using a password on the command line interface can be insecure.
    Welcome to the MySQL monitor.  Commands end with ; or \g.
    Your MySQL connection id is 24
    Server version: 5.6.40-log MySQL Community Server (GPL)
    
    Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.
    
    Oracle is a registered trademark of Oracle Corporation and/or its
    affiliates. Other names may be trademarks of their respective
    owners.
    
    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
    
    mysql> show databases;
    +--------------------+
    | Database           |
    +--------------------+
    | information_schema |
    | hive               |
    | mysql              |
    | performance_schema |
    | test               |
    +--------------------+
    5 rows in set (0.32 sec)
    
    mysql> use hive
    Reading table information for completion of table and column names
    You can turn off this feature to get a quicker startup with -A
    
    Database changed
    mysql> show tables;
    +---------------------------+
    | Tables_in_hive            |
    +---------------------------+
    | AUX_TABLE                 |
    | BUCKETING_COLS            |
    | CDS                       |
    | COLUMNS_V2                |
    | COMPACTION_QUEUE          |
    | COMPLETED_COMPACTIONS     |
    | COMPLETED_TXN_COMPONENTS  |
    | DATABASE_PARAMS           |
    | DBS                       |
    | DB_PRIVS                  |
    | DELEGATION_TOKENS         |
    | FUNCS                     |
    | FUNC_RU                   |
    | GLOBAL_PRIVS              |
    | HIVE_LOCKS                |
    | IDXS                      |
    | INDEX_PARAMS              |
    | KEY_CONSTRAINTS           |
    | MASTER_KEYS               |
    | NEXT_COMPACTION_QUEUE_ID  |
    | NEXT_LOCK_ID              |
    | NEXT_TXN_ID               |
    | NOTIFICATION_LOG          |
    | NOTIFICATION_SEQUENCE     |
    | NUCLEUS_TABLES            |
    | PARTITIONS                |
    | PARTITION_EVENTS          |
    | PARTITION_KEYS            |
    | PARTITION_KEY_VALS        |
    | PARTITION_PARAMS          |
    | PART_COL_PRIVS            |
    | PART_COL_STATS            |
    | PART_PRIVS                |
    | ROLES                     |
    | ROLE_MAP                  |
    | SDS                       |
    | SD_PARAMS                 |
    | SEQUENCE_TABLE            |
    | SERDES                    |
    | SERDE_PARAMS              |
    | SKEWED_COL_NAMES          |
    | SKEWED_COL_VALUE_LOC_MAP  |
    | SKEWED_STRING_LIST        |
    | SKEWED_STRING_LIST_VALUES |
    | SKEWED_VALUES             |
    | SORT_COLS                 |
    | TABLE_PARAMS              |
    | TAB_COL_STATS             |
    | TBLS                      |
    | TBL_COL_PRIVS             |
    | TBL_PRIVS                 |
    | TXNS                      |
    | TXN_COMPONENTS            |
    | TYPES                     |
    | TYPE_FIELDS               |
    | VERSION                   |
    | WRITE_SET                 |
    +---------------------------+
    57 rows in set (0.00 sec)
    
    mysql> select * from DBS; # metadata for Hive's built-in database, default
    +-------+-----------------------+----------------------------------------+---------+------------+------------+
    | DB_ID | DESC                  | DB_LOCATION_URI                        | NAME    | OWNER_NAME | OWNER_TYPE |
    +-------+-----------------------+----------------------------------------+---------+------------+------------+
    |     1 | Default Hive database | hdfs://hadoop:9000/user/hive/warehouse | default | public     | ROLE       |
    +-------+-----------------------+----------------------------------------+---------+------------+------------+
    1 row in set (0.00 sec)
    
    mysql> 
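
    The DBS table is where the metastore records each Hive database. A minimal sketch of a narrower query, assuming the metastore database is named hive as in this setup and using only columns visible in the output above:

    ```sql
    -- Run in the MySQL client against the metastore database.
    USE hive;

    -- One row per Hive database: its id, name, HDFS location, and owner.
    SELECT DB_ID, NAME, DB_LOCATION_URI, OWNER_NAME
    FROM DBS
    ORDER BY DB_ID;
    ```

    Reading the metastore directly like this is handy for exploration. For anything beyond that, Hive's own commands (show databases;, describe database ...;) are the safer interface.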

    2. Create a test database in Hive

    hive> create database testhive; -- create the database
    OK
    Time taken: 3.45 seconds
    
    hive> show databases; -- list the databases
    OK
    default
    testhive
    Time taken: 1.123 seconds, Fetched: 2 row(s)

    Back in MySQL, the metastore now has a row for the test database, including testhive's DB_ID and its storage location on HDFS.

    mysql> select * from DBS;
    +-------+-----------------------+----------------------------------------------------+----------+------------+------------+
    | DB_ID | DESC                  | DB_LOCATION_URI                                    | NAME     | OWNER_NAME | OWNER_TYPE |
    +-------+-----------------------+----------------------------------------------------+----------+------------+------------+
    |     1 | Default Hive database | hdfs://hadoop:9000/user/hive/warehouse             | default  | public     | ROLE       |
    |     6 | NULL                  | hdfs://hadoop:9000/user/hive/warehouse/testhive.db | testhive | root       | USER       |
    +-------+-----------------------+----------------------------------------------------+----------+------------+------------+
    2 rows in set (0.00 sec)

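    Now look at HDFS to see what testhive.db is. It is simply a directory, so creating a database really just creates a directory.

    The HDFS directory I created was /usr/hive/warehouse/, so I don't know why the database ended up under /user/hive/warehouse/. Did something go wrong? Or did I create the wrong directory, and it should have been /user/hive/warehouse/ all along?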

    [root@hadoop ~]# hdfs dfs -ls /user/hive/warehouse
    Found 1 items
    drwxr-xr-x   - root supergroup          0 2018-07-27 15:17 /user/hive/warehouse/testhive.db
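
    On the /usr versus /user question above: Hive places managed databases under the directory named by the hive.metastore.warehouse.dir property, whose default value is /user/hive/warehouse. Creating /usr/hive/warehouse on HDFS does not change that property, so unless it is overridden (for example in hive-site.xml) Hive keeps using /user/hive/warehouse. A small sketch for checking this from the Hive CLI; the database name testhive2 and its location are hypothetical, chosen only for illustration:

    ```sql
    -- SET with a property name and no value prints the current setting.
    SET hive.metastore.warehouse.dir;

    -- Shows the HDFS location recorded for an existing database.
    DESCRIBE DATABASE testhive;

    -- A database can also be given an explicit location at creation time
    -- (testhive2 and this path are hypothetical examples).
    CREATE DATABASE testhive2 LOCATION '/usr/hive/warehouse/testhive2.db';
    ```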

    3. Create a table

    hive> use testhive; -- switch to the database
    OK
    Time taken: 0.131 seconds
    
    hive> create table test(id int); -- create the table
    OK
    Time taken: 3.509 seconds

    Look at the table's metadata in MySQL. The test table belongs to the database whose DB_ID is 6, which is testhive (check with select * from DBS;).

    mysql> select * from TBLS;
    +--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+--------------------+
    | TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME | TBL_TYPE      | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT | IS_REWRITE_ENABLED |
    +--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+--------------------+
    |      1 |  1532677542 |     6 |                0 | root  |         0 |     1 | test     | MANAGED_TABLE | NULL               | NULL               |                    |
    +--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+--------------------+
    1 row in set (0.01 sec)
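
    Instead of matching DB_ID values by eye, the two metastore tables can be joined. This is a sketch that assumes the metastore schema shown above; the LOCATION column of SDS does not appear in the output in this post, so treat it as an assumption about the schema:

    ```sql
    -- Run in the MySQL client against the metastore database.
    USE hive;

    -- For every Hive table: owning database, table name, type, and storage location.
    SELECT d.NAME AS db_name,
           t.TBL_NAME,
           t.TBL_TYPE,
           s.LOCATION            -- assumed column of SDS (storage descriptors)
    FROM TBLS t
    JOIN DBS d ON t.DB_ID = d.DB_ID
    JOIN SDS s ON t.SD_ID = s.SD_ID;
    ```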

    On HDFS, Hive has created a directory for the new table.

    [root@hadoop ~]# hdfs dfs -ls /user/hive/warehouse/testhive.db
    Found 1 items
    drwxr-xr-x   - root supergroup          0 2018-07-27 16:03 /user/hive/warehouse/testhive.db/test

    4. Insert data

    4.1 Insert a row with insert into test values (1); The output shows that Hive runs a MapReduce job for the insert.

    hive> insert into test values (1);
    WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    Query ID = root_20180727155527_5971c7d8-9b5c-4ef3-98f7-63febe38c79a
    Total jobs = 3
    Launching Job 1 out of 3
    Number of reduce tasks is set to 0 since there's no reduce operator
    Starting Job = job_1532671010251_0001, Tracking URL = http://hadoop:8088/proxy/application_1532671010251_0001/
    Kill Command = /usr/local/hadoop/bin/hadoop job  -kill job_1532671010251_0001
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
    2018-07-27 16:02:25,979 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.32 sec
    MapReduce Total cumulative CPU time: 3 seconds 320 msec
    Ended Job = job_1532671010251_0001
    Stage-4 is selected by condition resolver.
    Stage-3 is filtered out by condition resolver.
    Stage-5 is filtered out by condition resolver.
    Moving data to directory hdfs://hadoop:9000/user/hive/warehouse/testhive.db/test/.hive-staging_hive_2018-07-27_15-55-27_353_3121708441542170724-1/-ext-10000
    Loading data to table testhive.test
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 1   Cumulative CPU: 3.32 sec   HDFS Read: 3951 HDFS Write: 71 SUCCESS
    Total MapReduce CPU Time Spent: 3 seconds 320 msec
    OK
    Time taken: 453.982 seconds

    On HDFS, the inserted row has been written to a file named 000000_0.

    [root@hadoop ~]# hdfs dfs -ls /user/hive/warehouse/testhive.db/test
    -rwxr-xr-x   1 root supergroup          2 2018-07-27 16:01 /user/hive/warehouse/testhive.db/test/000000_0
    [root@hadoop ~]# hdfs dfs -cat /user/hive/warehouse/testhive.db/test/000000_0
    1

    4.2 Insert another row with insert into test values (2); Hive again runs a MapReduce job.

    hive>  insert into test values (2); 

    On HDFS, this row has been written to a second file, 000000_0_copy_1.

    [root@hadoop ~]# hdfs dfs -ls /user/hive/warehouse/testhive.db/test
    Found 2 items
    -rwxr-xr-x   1 root supergroup          2 2018-07-27 16:01 /user/hive/warehouse/testhive.db/test/000000_0
    -rwxr-xr-x   1 root supergroup          2 2018-07-27 16:22 /user/hive/warehouse/testhive.db/test/000000_0_copy_1
    [root@hadoop ~]# hdfs dfs -cat /user/hive/warehouse/testhive.db/test/000000_0_copy_1
    2

    4.3 Insert one more row with insert into test values (3); Hive again runs a MapReduce job.

    On HDFS, this row has been written to yet another file, 000000_0_copy_2.

    [root@hadoop ~]# hdfs dfs -ls /user/hive/warehouse/testhive.db/test
    Found 3 items
    -rwxr-xr-x   1 root supergroup          2 2018-07-27 16:01 /user/hive/warehouse/testhive.db/test/000000_0
    -rwxr-xr-x   1 root supergroup          2 2018-07-27 16:22 /user/hive/warehouse/testhive.db/test/000000_0_copy_1
    -rwxr-xr-x   1 root supergroup          2 2018-07-27 16:37 /user/hive/warehouse/testhive.db/test/000000_0_copy_2
    [root@hadoop ~]# hdfs dfs -cat /user/hive/warehouse/testhive.db/test/000000_0_copy_2
    3

    4.4 Query the table in Hive

    hive> select * from test;
    OK
    1
    2
    3
    Time taken: 5.483 seconds, Fetched: 3 row(s)
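
    Each single-row insert above launched its own MapReduce job and left its own small file on HDFS. Hive (0.14 and later) also accepts several rows in one VALUES clause, so the three rows could have been written by one statement and one job. A sketch of that alternative, not what was run in this post:

    ```sql
    USE testhive;

    -- One statement, one job, one output file for all three rows.
    INSERT INTO TABLE test VALUES (1), (2), (3);
    ```

    Fewer, larger files are generally kinder to HDFS, which is one reason bulk loading (next step) is the more common way to get data into Hive.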

    5. Load data from a local file

    First create the file

    [root@hadoop ~]# vi hive.txt  # create the file
    4
    5
    6
    7
    8
    9
    0
    # save and quit

    Then load the data

    hive> load data local inpath '/root/hive.txt' into table testhive.test; -- load the data
    Loading data to table testhive.test
    OK
    Time taken: 6.282 seconds

    In Hive, each line of the file now appears as a row in the table's id column.

    hive> select * from test;
    OK
    1
    2
    3
    4
    5
    6
    7
    8
    9
    0
    Time taken: 0.534 seconds, Fetched: 10 row(s)

    On HDFS, hive.txt has been copied into the test table's directory.

    [root@hadoop ~]# hdfs dfs -ls /user/hive/warehouse/testhive.db/test
    Found 4 items
    -rwxr-xr-x   1 root supergroup          2 2018-07-27 16:01 /user/hive/warehouse/testhive.db/test/000000_0
    -rwxr-xr-x   1 root supergroup          2 2018-07-27 16:22 /user/hive/warehouse/testhive.db/test/000000_0_copy_1
    -rwxr-xr-x   1 root supergroup          2 2018-07-27 16:37 /user/hive/warehouse/testhive.db/test/000000_0_copy_2
    -rwxr-xr-x   1 root supergroup         14 2018-07-27 16:48 /user/hive/warehouse/testhive.db/test/hive.txt
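
    The LOCAL keyword is what made Hive read from the local filesystem and copy the file. Without LOCAL, the path is taken to be on HDFS and the file is moved (not copied) into the table directory. Adding OVERWRITE replaces the table's existing files instead of appending. A brief sketch; the HDFS path /tmp/hive.txt is hypothetical:

    ```sql
    -- Source file already on HDFS: it is moved into the table directory.
    LOAD DATA INPATH '/tmp/hive.txt' INTO TABLE testhive.test;

    -- Local source file, replacing whatever the table currently contains.
    LOAD DATA LOCAL INPATH '/root/hive.txt' OVERWRITE INTO TABLE testhive.test;
    ```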

    6. Hive also supports sorting: select * from test order by id desc; This query runs a MapReduce job as well.

    hive> select * from test order by id desc; 
    WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    Query ID = root_20180730093619_c798eb69-b94f-4678-94cc-5ec56865ed5c
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1532913019648_0001, Tracking URL = http://hadoop:8088/proxy/application_1532913019648_0001/
    Kill Command = /usr/local/hadoop/bin/hadoop job  -kill job_1532913019648_0001
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2018-07-30 09:38:13,904 Stage-1 map = 0%,  reduce = 0%
    2018-07-30 09:39:09,656 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 1.66 sec
    2018-07-30 09:39:14,311 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.72 sec
    2018-07-30 09:39:49,708 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 5.41 sec
    MapReduce Total cumulative CPU time: 5 seconds 930 msec
    Ended Job = job_1532913019648_0001
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 5.93 sec   HDFS Read: 6799 HDFS Write: 227 SUCCESS
    Total MapReduce CPU Time Spent: 5 seconds 930 msec
    OK
    9
    8
    7
    6
    5
    4
    3
    2
    1
    0
    Time taken: 224.27 seconds, Fetched: 10 row(s)
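
    The log above reports a single reducer. ORDER BY guarantees a total order, and Hive achieves that by sending all rows through one reducer. Two related forms are worth knowing; this is an illustrative sketch against the same test table:

    ```sql
    -- Total order, but only the top three rows are returned.
    SELECT id FROM test ORDER BY id DESC LIMIT 3;

    -- SORT BY orders rows within each reducer only; with more than one
    -- reducer the overall output is not globally sorted.
    SELECT id FROM test SORT BY id DESC;

    -- EXPLAIN prints the execution plan without running the query.
    EXPLAIN SELECT id FROM test ORDER BY id DESC;
    ```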

     

    7. Hive also supports desc test;

    hive> desc test;
    OK
    id                      int                                         
    Time taken: 6.194 seconds, Fetched: 1 row(s)
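
    desc shows only the columns. For the details that were looked up in MySQL and HDFS earlier (owner, table type, storage location), Hive has more verbose commands. A short sketch:

    ```sql
    USE testhive;

    -- Columns plus table metadata such as location, owner, and table type.
    DESCRIBE FORMATTED test;

    -- The CREATE TABLE statement that would reproduce the table.
    SHOW CREATE TABLE test;
    ```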

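    Working with Hive is much like working with MySQL. Its drawback is that ordinary tables such as the one above do not support row-level UPDATE and DELETE (those statements work only on ACID transactional tables, available since Hive 0.14). Its advantage is that you never have to write MapReduce code yourself: simple SQL statements are enough to express complex queries.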
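
    For completeness, here is a rough sketch of what row-level changes look like on a transactional table in Hive 2.x. The table test_acid and its note column are hypothetical, and the session settings are the ones commonly required; the metastore side may need further configuration (for example the compactor), so treat this as a starting point rather than a verified recipe:

    ```sql
    -- Session settings commonly needed for ACID operations.
    SET hive.support.concurrency=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;
    SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

    -- In Hive 2.x a transactional table must be bucketed and stored as ORC.
    CREATE TABLE test_acid (id INT, note STRING)
    CLUSTERED BY (id) INTO 2 BUCKETS
    STORED AS ORC
    TBLPROPERTIES ('transactional'='true');

    INSERT INTO TABLE test_acid VALUES (1, 'a'), (2, 'b');

    -- The bucketing column (id) cannot be updated, so change note instead.
    UPDATE test_acid SET note = 'changed' WHERE id = 1;
    DELETE FROM test_acid WHERE id = 2;
    ```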

    There is a lot more to Hive; I'll write up the rest as I come to need it.

  • Original post: https://www.cnblogs.com/zhengna/p/9378282.html