zoukankan      html  css  js  c++  java
  • HIVE 简单总结

    hive

    1 table

    查看 表
    show tables;
    查看表结构
    desc table_name;

    2 database 默认 default

    创建database
    create database_name;
    查看 database
    show databases;
    使用 database
    use database_name;
    删除 database
    drop database_name;


    3 托管表

    使用场景,数据的所有处理都由HIVE完成

    创建表托管表
    create table POI(id STRING,city STRING,name STRING,lon DOUBLE,lat DOUBLE,address STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ';
    载入数据(将本地数据移动到 hdfs:/user/hive/warehourse,注意关键字 LOCAL)
    LOAD DATA LOCAL INPATH '/tmp/hive.data' OVERWRITE INTO TABLE poi;
    查看数据
    select * from poi limit 10;
    select count(id) from poi;
    删除表
    drop table POI

    4 外部表
    使用场景,如果要用HIVE和其他工具来处理同一个数据集,应该使用外部表

    创建外部表 (创建 hdfs:/user/hive/external_talbe)
    create external table external_table (dummy STRING) LOCATION '/user/hive/external_talbe' ;
    载入数据 (将数据拷贝到 hdfs:/user/hive/external_talbe,)
    LOAD DATA LOCAL inpath '/tmp/dummy.txt' into table external_table;
    删除表(只会删除元数据,hdfs:/user/hive/external_talbe中数据无变化)
    drop table external

    5 分区
    对表进行划分,分区在创建表时由PARTITIONED BY 子句定义。

    创建表
    create table logs(ts BIGINT ,line STRING) PARTITIONED BY (dt STRING,country STRING);
    载入数据
    LOAD DATA LOCAL INPATH '/tmp/partition.data' INTO TABLE logs PARTITION (dt='2010-01-02',country='CN');
    查看数据
    select * from logs where dt='2010-01-01';

    文件存储结构
    /user/hive/warehouse/logs
    /user/hive/warehouse/logs/dt=2010-01-02
    /user/hive/warehouse/logs/dt=2010-01-02/country=CN
    /user/hive/warehouse/logs/dt=2010-01-02/country=CN/partition.data

    6 分桶
    获取更高的查询处理效率,map端连接
    取样更高效

    创建表
    create table bucketed_poi(id STRING,city STRING,name STRING,lon DOUBLE,lat DOUBLE,address STRING) clustered by (id) sorted by(id asc ) into 4 buckets;
    导入数据据
    insert overwrite table bucketed_poi select * from poi;
    取样
    hive> select * from bucketed_poi tablesample (bucket 1 out of 4 on id);
    hive> select * from bucketed_poi tablesample (bucket 1 out of 4 on rand());//未分桶

    文件存储结构

    /user/hive/warehouse/bucketed_poi
    /user/hive/warehouse/bucketed_poi/000000_0
    /user/hive/warehouse/bucketed_poi/000001_0
    /user/hive/warehouse/bucketed_poi/000002_0
    /user/hive/warehouse/bucketed_poi/000003_0

    7 多表插入

    FROM SOURCE
    INSERT OVERWRITE TABLE TARGET1
    SELECT C1,COUNT(C1), GROUPBY C1
    INSERT OVERWRITE TABLE TARGET2
    SELECT C2,COUNT(C2), GROUPBY C1
    INSERT OVERWRITE TABLE TARGET3
    SELECT C3,COUNT(C3), GROUPBY C3

    8 表修改
    重命名表,数据存放目录会改变
    ALTER TABLE SOURCE RENAME TO TARGET;
    添加列
    ALTER TABLE SOURCE ALL COLUMNS (C8 STRING);

    9 排序

    全排序,解决只有一个reduce
    FROM SOURCE
    SELECT year,temperature
    DISTRIBUTE BY year
    SORT BY year ASC,temperature DESC;

    10 连接

    内连接,HIVE 只支持等值连接、只允许在FROM中出现一个表
    SELECT a.* ,b.* FROM a join b on (a.bid = b.aid);
    外连接 左外连接,右外连接,全外连接
    SELECT a.* ,b.* FROM a LEFT|RIGTH|FULL OUTER JOIN b ON (a.bid = b.aid);
    半连接
    SELECT * FROM A LEFT SEMI JOIN B ON (A.id = b.id);
    类是于 IN
    SELECT * FROM A WHERE A.ID IN (SELECT id FROM B);

    11 子查询
    只支持子查询的出现在SELECT 语句的FROM 子句中

  • 相关阅读:
    Leetcode 191.位1的个数 By Python
    反向传播的推导
    Leetcode 268.缺失数字 By Python
    Leetcode 326.3的幂 By Python
    Leetcode 28.实现strStr() By Python
    Leetcode 7.反转整数 By Python
    Leetcode 125.验证回文串 By Python
    Leetcode 1.两数之和 By Python
    Hdoj 1008.Elevator 题解
    TZOJ 车辆拥挤相互往里走
  • 原文地址:https://www.cnblogs.com/zhangeamon/p/5780662.html
Copyright © 2011-2022 走看看