zoukankan      html  css  js  c++  java
  • Hive 教程(七)-DML基础

    DML,Hive Data Manipulation Language,数据操作语言;

    通俗理解就是数据库里与数据的操作,如增删改查,统计汇总等;

    Loading files into tables

    把文件数据写入 table,load 操作不对数据做任何转换

    LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
    LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)] [INPUTFORMAT 'inputformat' SERDE 'serde'] (3.0 or later)

    比较好理解,这里只解释可选项:

    local:本地文件,如果上传本地文件,需注明 local,默认是 hdfs;

    overwrite:覆盖之前的数据,默认是 追加;

    partition:分区表加载数据,这个参数指定 load 到哪个分区;

    示例

    load data local inpath '/usr/lib/hive2.3.6/2.csv' into table student_p partition(part=888);

    Inserting data into Hive Tables from queries

    把子查询结果写入 table

    Standard syntax:
    INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;
    INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;

    注意这里是标准语法,还有 扩展语法,参见官网

    示例

    insert into table student_p partition(part=986) select id, name, sexex, age, dept from student_p;

    这里我把一个分区的数据加载到另一个分区,并没有 select *,因为 select * 的字段多一个 part/分区字段

    多插入模式

    把多个查询结果写入 table

    from student_p
    insert into table student_p partition(part=984) select id, name, sexex, age, dept from student_p where part=888
    insert into table student_p partition(part=983) select id, name, sexex, age, dept from student_p where part=888;

    把子查询的 form 提取出来了;

    把不同表查询出来的结果写入 table,没试过,自己可以试试

    分区 DML 注意事项

    1. load 和 insert 都会自动创建分区

    2. 分区表 写入数据都会启动 mr

    Writing data into the filesystem from queries

    把子查询的结果写入文件系统

    Standard syntax:
    INSERT OVERWRITE [LOCAL] DIRECTORY directory1
      [ROW FORMAT row_format] [STORED AS file_format] (Note: Only available starting with Hive 0.11.0)
      SELECT ... FROM ...
     
    Hive extension (multiple inserts):
    FROM from_statement
    INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1
    [INSERT OVERWRITE [LOCAL] DIRECTORY directory2 select_statement2] ...
     
      
    row_format
      : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
            [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
            [NULL DEFINED AS char] (Note: Only available starting with Hive 0.13)

    1. 只能是 overwrite

    2. 只能存入路径,文件名自动生成

    3. 存储的文件格式如果是 txt,写成  stored as textfile

    示例

    insert overwrite local directory '/usr/lib/hive2.3.6/3.txt' row format delimited fields terminated by '	' stored as orc select * from student_p;

    Inserting values into tables from SQL

    像普通数据库一样写入数据

    Standard Syntax:
    INSERT INTO TABLE tablename [PARTITION (partcol1[=val1], partcol2[=val2] ...)] VALUES values_row [, values_row ...]
      
    Where values_row is:
    ( value [, value ...] )
    where a value is either null or any valid SQL literal

    官网例子 - 分桶

    // 分桶表,在创建表时,可以指定每个桶的大小,2 BUCKETS,代表如果 hadoop 默认 block 为 64M 的话,每个桶大小为 128M
    CREATE TABLE students (name VARCHAR(64), age INT, gpa DECIMAL(3, 2))
      CLUSTERED BY (age) INTO 2 BUCKETS STORED AS ORC;
     
    INSERT INTO TABLE students
      VALUES ('fred flintstone', 35, 1.28), ('barney rubble', 32, 2.32);
     
    // 分区表+分桶表,注意,桶的大小是 block 的 偶数倍,即使指定 3 BUCKETS,会自动转换成 4 BUCKETS 
    CREATE TABLE pageviews (userid VARCHAR(64), link STRING, came_from STRING)
      PARTITIONED BY (datestamp STRING) CLUSTERED BY (userid) INTO 256 BUCKETS STORED AS ORC;
     
    INSERT INTO TABLE pageviews PARTITION (datestamp = '2014-09-23')
      VALUES ('jsmith', 'mail.com', 'sports.com'), ('jdoe', 'mail.com', null);
     
    INSERT INTO TABLE pageviews PARTITION (datestamp)
      VALUES ('tjohnson', 'sports.com', 'finance.com', '2014-09-23'), ('tlee', 'finance.com', null, '2014-09-21');
      
    INSERT INTO TABLE pageviews
      VALUES ('tjohnson', 'sports.com', 'finance.com', '2014-09-23'), ('tlee', 'finance.com', null, '2014-09-21');

    Update

    Updates can only be performed on tables that support ACID. See Hive Transactions for details.

    UPDATE tablename SET column = value [, column = value ...] [WHERE expression]

    不好用

    Delete

    Deletes can only be performed on tables that support ACID. See Hive Transactions for details.

    DELETE FROM tablename [WHERE expression]

    Merge

    Merge can only be performed on tables that support ACID. See Hive Transactions for details.

    Standard Syntax:
    MERGE INTO <target table> AS T USING <source expression/table> AS S
    ON <boolean expression1>
    WHEN MATCHED [AND <boolean expression2>] THEN UPDATE SET <set clause list>
    WHEN MATCHED [AND <boolean expression3>] THEN DELETE
    WHEN NOT MATCHED [AND <boolean expression4>] THEN INSERT VALUES<value list>

    其他

    As Select

    在建表的同时加载子查询的结果

    create table student3 as select id, name form student2;

    location

    在建表时通过 location 指定被加载的数据

    create table student4(id int, name string) location '/usr/hive/student4';

    具体怎么回事可以参考我的 DDL 介绍,不难理解

    import

    不常用,自己百度下吧

    参考资料:

    https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML  官网

  • 相关阅读:
    JQuery插件,轻量级表单模型验证(续 二)
    关于DateTimeOffset的爱恨情仇
    JQuery插件,轻量级表单模型验证(续 一)
    asp.net 远程模型验证
    JQuery插件,轻量级表单模型验证
    Ef Core增加Sql方法
    初识依赖注入
    Core下简易WebApi
    sql语句实现行转列练习
    sql语句基本查询操作
  • 原文地址:https://www.cnblogs.com/yanshw/p/11790539.html
Copyright © 2011-2022 走看看