  Hive Tutorial (3): DDL Basics

    DDL, the Hive Data Definition Language;

    put simply, it covers the operations on databases and tables; this post summarizes the basic statements.

    Hive warehouse layout

    By default the Hive warehouse sits on HDFS under /user/hive/warehouse;

    Hive ships with a default database named default;

    however, no default folder is created under /user/hive/warehouse; tables in the default database get their folders directly under /user/hive/warehouse.

    In Hive, a database maps to a path on HDFS (a folder, really), a table also maps to a path on HDFS, and the data maps to files on HDFS.
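
    As a quick sanity check of this layout, you can list the warehouse directory straight from the Hive CLI (a minimal sketch; the mydb database name is hypothetical):

    hive> dfs -ls /user/hive/warehouse;          -- one folder per table in the default database
    hive> dfs -ls /user/hive/warehouse/mydb.db;  -- folders for tables in a database named mydb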

    Managed tables vs external tables

    Managed tables are also called internal tables; by default, Hive creates managed tables;

    data for both managed and external tables is stored on HDFS, since both are Hive tables.

    Differences

    When data is loaded into a managed table, Hive moves it into the path designated by the warehouse, i.e. somewhere on HDFS;

    for an external table, Hive does not move the data; it only records the data's location in the metadata.

    The biggest difference: dropping a managed table deletes both the data and the metadata; dropping an external table deletes only the metadata and leaves the data in place.

    Because of this, managed tables are a poor fit for sharing data and can easily lead to accidental data loss;

    in practice, external tables are generally used.

    Converting between the two

    Check a table's type

    hive> desc formatted student_p;
    Table Type:             MANAGED_TABLE 

    Managed table to external table

    hive> alter table student_p set tblproperties('EXTERNAL'='TRUE');
    Table Type: EXTERNAL_TABLE

    External table to managed table

    hive> alter table student_p set tblproperties('EXTERNAL'='FALSE');

    Note that the property name and value must be uppercase.

    Database

    Create Database

    CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
      [COMMENT database_comment]
      [LOCATION hdfs_path]
      [WITH DBPROPERTIES (property_name=property_value, ...)];

    Example

    hive> create database hive1101 location '/usr/hive_test';
    OK
    Time taken: 0.12 seconds

    Note that the location here is not Hive's default HDFS warehouse path, which shows you can point a database at a non-default location.

    Drop Database

    By default the database must be empty; use CASCADE to drop it together with its tables.

    DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];
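
    A minimal sketch, reusing the database created above:

    hive> drop database if exists hive1101;          -- fails if the database still contains tables
    hive> drop database if exists hive1101 cascade;  -- drops the database together with its tables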

    Alter Database

    Change a database's properties

    ALTER (DATABASE|SCHEMA) database_name SET DBPROPERTIES (property_name=property_value, ...);   -- (Note: SCHEMA added in Hive 0.14.0)
    ALTER (DATABASE|SCHEMA) database_name SET OWNER [USER|ROLE] user_or_role;   -- (Note: Hive 0.13.0 and later; SCHEMA added in Hive 0.14.0)
    ALTER (DATABASE|SCHEMA) database_name SET LOCATION hdfs_path; -- (Note: Hive 2.2.1, 2.4.0 and later)

    Example

    hive> alter database hive1101 set dbproperties ('edit_by'='wjd');
    OK
    Time taken: 0.118 seconds

    Note that the location could not be changed here;

    SET LOCATION is probably only supported on Hive 2.2.1, 2.4.0 and later; mine is 2.3.6, so I did not test it.

    Use Database

    Switch to the target database

    USE database_name;
    USE DEFAULT;

    Show Databases

    List all database names

    show databases;

    Show information about a database

    hive> desc database hive1101;
    OK
    hive1101        hdfs://hadoop10:9000/usr/hive_test    root    USER    
    Time taken: 0.182 seconds, Fetched: 1 row(s)

    This shows only the basic metadata; adding extended after database displays a bit more information, as sketched below.
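
    For example (reusing the database and property set earlier; the extra output is the DBPROPERTIES map):

    hive> desc database extended hive1101;   -- also lists the dbproperties, e.g. edit_by=wjd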

    Table

    Create Table

    CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name    -- (Note: TEMPORARY available in Hive 0.14.0 and later)
      [(col_name data_type [column_constraint_specification] [COMMENT col_comment], ... [constraint_specification])]
      [COMMENT table_comment]
      [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
      [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
      [SKEWED BY (col_name, col_name, ...)                  -- (Note: Available in Hive 0.10.0 and later)]
         ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
         [STORED AS DIRECTORIES]
      [
       [ROW FORMAT row_format] 
       [STORED AS file_format]
         | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  -- (Note: Available in Hive 0.6.0 and later)
      ]
      [LOCATION hdfs_path]

    There are more clauses after this; see the official docs listed in the references below.

    Clause notes

    temporary: creates a temporary table that is visible only to the current session and is dropped when the session ends (Hive 0.14.0 and later)

    external: creates an external table; the path of the actual data must be given with location

    like: copies the table schema, but not the data

    row format: specifies the per-row format; if the source data does not match it, the data can still be loaded into the table, but it will not be parsed correctly (see the test below)

      // delimited fields terminated by '\t'   -- fields separated by a tab

      // delimited fields terminated by ','    -- fields separated by a comma; in my tests only csv files loaded cleanly this way, hand-written files caused errors

      // delimited = split into fields; terminated = ended by the given character

    ROW FORMAT 
    DELIMITED [FIELDS TERMINATED BY char] [COLLECTION ITEMS TERMINATED BY char] 
        [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char] 
       | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]

    stored as: the file format of the data files

      // for plain text files use stored as textfile; for compressed files use stored as sequencefile

      // there are many other formats such as ORC and JSON; see the official docs (a short ORC sketch follows this list)

    partitioned by: partitioned tables; this is important and gets a dedicated post later

    clustered by: bucketed tables, covered later together with partitioned tables
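
    As referenced above, a minimal sketch of picking a non-text storage format (the student_orc table name is hypothetical):

    hive> create table student_orc(id int, name string)
        >  stored as orc;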

    Examples

    hive> create table student(id int,name string) row format delimited fields terminated by '\t';   -- create a table, tab-delimited
    hive> create  table if not exists student1 like student;   -- create a table with the same schema as student
    
    hive> create table if not exists mytable(sid int,sname string)
        >  row format delimited fields terminated by '05' 
        >  stored as textfile;   -- create a managed (internal) table
        
    hive> create external table if not exists pageview(
        >  pageid int,
        >  page_url string comment 'the page url'
        > )
        > row format delimited fields terminated by ','
        > location 'hdfs://192.168.220.144:9000/user/hive/warehouse';   -- create an external table
        
    hive> create table student_p(id int,name string,sexex string,age int,dept string)
        >  partitioned by(part string)
        >  row format delimited fields terminated by ','
        >  stored as textfile;    -- create a partitioned table

    Testing row format

    Write the following data into student, tab-separated:

    1    a
    2    b
    3    c
    4 d,

    Clearly the last row is not tab-separated.

    hive> load data local inpath '/usr/lib/hive2.3.6/1.txt' into table student;
    Loading data to table hive1101.student
    OK
    Time taken: 0.868 seconds
    hive> select * from student;
    OK
    1    a
    2    b
    3    c
    NULL    NULL
    Time taken: 0.17 seconds, Fetched: 4 row(s)

    As you can see, the last row was not loaded correctly (it comes back as NULL, NULL).

    Drop Table

    DROP TABLE [IF EXISTS] table_name [PURGE];     -- (Note: PURGE available in Hive 0.14.0 and later)
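
    A minimal sketch, dropping one of the tables created above:

    hive> drop table if exists student1;   -- add PURGE to skip the trash (Hive 0.14.0 and later)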

    Truncate Table

    Empties a table; note that external tables cannot be truncated.

    TRUNCATE TABLE table_name [PARTITION partition_spec];
     
    partition_spec:
      : (partition_column = partition_col_value, partition_column = partition_col_value, ...)
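
    A minimal sketch, using tables from the examples above (the partition value 'aa' is hypothetical):

    hive> truncate table student;
    hive> truncate table student_p partition (part='aa');   -- empty a single partition only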

    Alter Table

    Modify a table's properties

    Rename Table

    ALTER TABLE table_name RENAME TO new_table_name;
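
    A minimal sketch (the new name is hypothetical):

    hive> alter table student1 rename to student_bak;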

    Alter Table Properties

    ALTER TABLE table_name SET TBLPROPERTIES table_properties;
     
    table_properties:
      : (property_name = property_value, property_name = property_value, ... )

    Alter Table Comment

    ALTER TABLE table_name SET TBLPROPERTIES ('comment' = new_comment);
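
    A minimal sketch of the two statements above (property name and comment text are hypothetical):

    hive> alter table student set tblproperties ('created_by' = 'wjd');
    hive> alter table student set tblproperties ('comment' = 'student basic info');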

    Add SerDe Properties

    ALTER TABLE table_name [PARTITION partition_spec] SET SERDE serde_class_name [WITH SERDEPROPERTIES serde_properties];
     
    ALTER TABLE table_name [PARTITION partition_spec] SET SERDEPROPERTIES serde_properties;
     
    serde_properties:
      : (property_name = property_value, property_name = property_value, ... )
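
    A minimal sketch, assuming the table uses the default LazySimpleSerDe (field.delim is one of its properties):

    hive> alter table student set serdeproperties ('field.delim' = ',');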

    Alter Column

    Change Column Name/Type/Position/Comment

    Change a column's name, type, position, comment, etc.

    ALTER TABLE table_name [PARTITION partition_spec] CHANGE [COLUMN] col_old_name col_new_name column_type
      [COMMENT col_comment] [FIRST|AFTER column_name] [CASCADE|RESTRICT];

    Example

    CREATE TABLE test_change (a int, b int, c int);
     
    // First change column a's name to a1.
    ALTER TABLE test_change CHANGE a a1 INT;
     
    // Next change column a1's name to a2, its data type to string, and put it after column b.
    ALTER TABLE test_change CHANGE a1 a2 STRING AFTER b;
    // The new table's structure is:  b int, a2 string, c int.
      
    // Then change column c's name to c1, and put it as the first column.
    ALTER TABLE test_change CHANGE c c1 INT FIRST;
    // The new table's structure is:  c1 int, b int, a2 string.
      
    // Add a comment to column a1
    ALTER TABLE test_change CHANGE a1 a1 INT COMMENT 'this is column a1';

    Add/Replace Columns

    Add or replace columns

    ALTER TABLE table_name 
      [PARTITION partition_spec]                 -- (Note: Hive 0.14.0 and later)
      ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)
      [CASCADE|RESTRICT]                         -- (Note: Hive 1.1.0 and later)
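
    A minimal sketch against the test_change table from the previous example (the added column d is hypothetical):

    hive> alter table test_change add columns (d int comment 'a new column');
    hive> alter table test_change replace columns (c1 int, b int);   -- replace drops all existing columns and keeps only these (metadata only)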

    Index

    Create Index

    CREATE INDEX index_name
      ON TABLE base_table_name (col_name, ...)
      AS index_type
      [WITH DEFERRED REBUILD]
      [IDXPROPERTIES (property_name=property_value, ...)]
      [IN TABLE index_table_name]
      [
         [ ROW FORMAT ...] STORED AS ...
         | STORED BY ...
      ]
      [LOCATION hdfs_path]
      [TBLPROPERTIES (...)]
      [COMMENT "index comment"];
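
    A minimal sketch of a compact index on the student table (the index name is hypothetical; note that Hive indexes were removed in Hive 3.0):

    hive> create index idx_student_id on table student (id)
        >  as 'COMPACT' with deferred rebuild;
    hive> alter index idx_student_id on student rebuild;   -- build the deferred index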

    Drop Index

    DROP INDEX [IF EXISTS] index_name ON table_name;

    Alter Index

    ALTER INDEX index_name ON table_name [PARTITION partition_spec] REBUILD;

    Show

    Show Databases

    SHOW (DATABASES|SCHEMAS) [LIKE 'identifier_with_wildcards'];

    Show Tables

    SHOW TABLES [IN database_name] ['identifier_with_wildcards'];

    Show Table Properties

    SHOW TBLPROPERTIES tblname;
    SHOW TBLPROPERTIES tblname("foo");

    Show Create Table

    SHOW CREATE TABLE ([db_name.]table_name|view_name);

    Show Indexes

    SHOW [FORMATTED] (INDEX|INDEXES) ON table_with_index [(FROM|IN) db_name];

    Show Columns

    SHOW COLUMNS (FROM|IN) table_name [(FROM|IN) db_name];

    Examples

    -- SHOW COLUMNS
    CREATE DATABASE test_db;
    USE test_db;
    CREATE TABLE foo(col1 INT, col2 INT, col3 INT, cola INT, colb INT, colc INT, a INT, b INT, c INT);
      
    -- SHOW COLUMNS basic syntax
    SHOW COLUMNS FROM foo;                            -- show all column in foo
    SHOW COLUMNS FROM foo "*";                        -- show all column in foo
    SHOW COLUMNS IN foo "col*";                       -- show columns in foo starting with "col"                 OUTPUT col1,col2,col3,cola,colb,colc
    SHOW COLUMNS FROM foo '*c';                       -- show columns in foo ending with "c"                     OUTPUT c,colc
    SHOW COLUMNS FROM foo LIKE "col1|cola";           -- show columns in foo either col1 or cola                 OUTPUT col1,cola
    SHOW COLUMNS FROM foo FROM test_db LIKE 'col*';   -- show columns in foo starting with "col"                 OUTPUT col1,col2,col3,cola,colb,colc
    SHOW COLUMNS IN foo IN test_db LIKE 'col*';       -- show columns in foo starting with "col" (FROM/IN same)  OUTPUT col1,col2,col3,cola,colb,colc
      
    -- Non existing column pattern resulting in no match
    SHOW COLUMNS IN foo "nomatch*";
    SHOW COLUMNS IN foo "col+";                       -- + wildcard not supported
    SHOW COLUMNS IN foo "nomatch";

    Loading data

    This is not DDL but DML; it will be covered later.

    References:

    https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL  (official docs)

    https://ask.hellobi.com/blog/wujiadong/9483

    https://blog.csdn.net/xiaozelulu/article/details/81585867
