zoukankan      html  css  js  c++  java
  • Hive-02 DDL| DML

    类型转换

    可以使用CAST操作显示进行数据类型转换
    例如CAST('1' AS INT)将把字符串'1' 转换成整数1;如果强制类型转换失败,如执行CAST('X' AS INT),表达式返回空值 NULL
    0: jdbc:hive2://hadoop101:10000> select '1'+2, cast('1'as int) + 2; +------+------+--+ | _c0 | _c1 | +------+------+--+ | 3.0 | 3 |

    对于Hive的String类型相当于数据库的varchar类型,该类型是一个可变的字符串,不过它不能声明其中最多能存储多少个字符,理论上它可以存储2GB的字符数。

    集合数据类型

    Hive有三种复杂数据类型ARRAY、MAP 和 STRUCT。ARRAY和MAP与Java中的Array和Map类似,而STRUCT与C语言中的Struct类似,它封装了一个命名字段集合,复杂数据类型允许任意层次的嵌套。

    [kris@hadoop101 datas]$ vim test.txt
    songsong,bingbing_lili,xiao song:18_xiaoxiao song:19,hui long guan_beijing
    yangyang,caicai_susu,xiao yang:18_xiaoxiao yang:16, chao yang_beijing
    
    hive (default)> create table test(
                  > name string,
                  > friends array<string>,
                  > children map<string, int>,
                  > address struct<street:string, city:string>)
                  > row format delimited fields terminated by ','
                  > collection items terminated by '_'
                  > map keys terminated by ':'
                  > lines terminated by '
    ';
    OK
    Time taken: 0.249 seconds
    hive (default)> load data local inpath '/opt/module/datas/test.txt/' into table test;
    Loading data to table default.test
    Table default.test stats: [numFiles=1, totalSize=145]
    OK
    Time taken: 1.365 seconds
    0: jdbc:hive2://hadoop101:10000> select * from test;
    0: jdbc:hive2://hadoop101:10000> select friends[1], children['xiao song'], address.city from test where name="songsong";
    +-------+------+----------+--+
    |  _c0  | _c1  |   city   |
    +-------+------+----------+--+
    | lili  | 18   | beijing  |
    +-------+------+----------+--+
    1 row selected (0.321 seconds)

    DDL数据定义

    创建数据库

    创建一个数据库,数据库在HDFS上的默认存储路径是/user/hive/warehouse/*.db。

    修改

    用户可以使用ALTER DATABASE命令为某个数据库的DBPROPERTIES设置键-值对属性值,来描述这个数据库的属性信息。数据库的其他元数据信息都是不可更改的,包括数据库名和数据库所在的目录位置。

    ① 创建数据库
    0
    : jdbc:hive2://hadoop101:10000> create database if not exists db_hive; 避免要创建的数据库已经存在错误,增加if not exists判断。(标准写法) No rows affected (0.032 seconds) 0: jdbc:hive2://hadoop101:10000> create database if not exists db_hive2 location '/db_hive2.db'; 指定数据库在HDFS上存放的位置

    ② 修改数据库
    hive (db_hive)> alter database db_hive set dbproperties('createtime'='20190215');
    OK
    Time taken: 0.031 seconds
    ③ 查看数据库| 切换数据库 use xx; hive (db_hive)
    > desc database extended db_hive; 显示数据库详细信息; 也可以去掉extended即显示数据库信息; OK db_name comment location owner_name owner_type parameters db_hive hdfs://hadoop101:9000/user/hive/warehouse/db_hive.db kris USER {createtime=20190215} Time taken: 0.016 seconds, Fetched: 1 row(s) ④ 删除数据库 hive (db_hive)> drop database db_hive2; hive (db_hive)> drop database if exists db_hive2; hive (db_hive)> drop database db_hive cascade; ##若数据库不为空,则强制删除用cascade;

    创建表

    hive (default)> create table if not exists student2(
                  > id int, name string)
                  > row format delimited fields terminated by '	'
                  > stored as textfile
                  > location '/user/hive/warehouse/student2';
    OK

    管理表| 内部表

    管理表,有时也被称为内部表。因为这种表,Hive会(或多或少地)控制着数据的生命周期。Hive默认情况下会将这些表的数据存储在由配置项hive.metastore.warehouse.dir(例如,/user/hive/warehouse)所定义的目录的子目录下。   当我们删除一个管理表时,Hive也会删除这个表中数据。管理表不适合和其他工具共享数据。

    外部表,Hive并非认为其完全拥有这份数据。删除该表并不会删除掉这份数据,不过描述表的元数据信息会被删除掉。

    使用场景:每天将收集到的网站日志定期流入HDFS文本文件。在外部表(原始日志表)的基础上做大量的统计分析,用到的中间表、结果表使用内部表存储,数据通过SELECT+INSERT进入内部表。

    内部表数据可进可出 元数据+hdfs
    外部表元数据---HDFS,只包含元数据; 不会删hdfs数据

    ① 普通创建表
    hive (default)> create table if not exists student3 as select id, name from student; hive (default)> create table if not exists student4 like student; //根据已经存在的表机构创建表 hive (default)> desc formatted student2; #查询表的类型;查看格式化数据 OK col_name data_type comment ② 外部表 hive (default)> dfs -mkdir /student; hive (default)> dfs -put /opt/module/datas/student.txt /student; hive (default)> create external table stu_external( //创建外部表 id int, name string) row format delimited fields terminated by ' ' location '/student';
    0: jdbc:hive2://hadoop101:10000> select * from stu_external; 0: jdbc:hive2://hadoop101:10000> desc formatted stu_external; Table Type: | EXTERNAL_TABLE 0: jdbc:hive2://hadoop101:10000> drop table stu_external; 外部表删除后,hdfs中的数据还在,但是metadata中stu_external的元数据已被删除
    ③ 内部表和外部表的互相转换
    desc formatted student2; Table Type: | MANAGED_TABLE 0: jdbc:hive2://hadoop101:10000> alter table student2 set tblproperties('EXTERNAL'='TRUE'); Table Type: | EXTERNAL_TABLE 0: jdbc:hive2://hadoop101:10000> alter table student2 set tblproperties('EXTERNAL'='FALSE');

    分区表

    分区表实际上就是对应一个HDFS文件系统上的独立的文件夹,该文件夹下是该分区所有的数据文件。Hive中的分区就是分目录,把一个大的数据集根据业务需要分割成小的数据集。在查询时通过WHERE子句中的表达式选择查询所需要的指定的分区,这样的查询效率会提高很多。

    ① 创建分区表
    hive (default)> create table dept_partition( > deptno int, dname string, loc string) > partitioned by (month string) > row format delimited fields terminated by ' '; OK
      加载数据 hive (
    default)> load data local inpath '/opt/module/datas/dept.txt' into table default.dept_partition partition(month='201709'); Loading data to table default.dept_partition partition (month=201709) Partition default.dept_partition{month=201709} stats: [numFiles=1, numRows=0, totalSize=71, rawDataSize=0] OK load data local inpath '/opt/module/datas/dept.txt' into table default.dept_partition partition(month='201708'); load data local inpath '/opt/module/datas/dept.txt' into table default.dept_partition partition(month='201707'); ② 单分区查询 0: jdbc:hive2://hadoop101:10000> select * from dept_partition where month='201708'; +------------------------+-----------------------+---------------------+-----------------------+--+ | dept_partition.deptno | dept_partition.dname | dept_partition.loc | dept_partition.month | +------------------------+-----------------------+---------------------+-----------------------+--+ | 10 | ACCOUNTING | 1700 | 201708 | | 20 | RESEARCH | 1800 | 201708 | | 30 | SALES | 1900 | 201708 | | 40 | OPERATIONS | 1700 | 201708 | +------------------------+-----------------------+---------------------+-----------------------+--   多分区联合查询 0: jdbc:hive2://hadoop101:10000> select * from dept_partition where month='201707' 0: jdbc:hive2://hadoop101:10000> union 0: jdbc:hive2://hadoop101:10000> select * from dept_partition where month='201708' 0: jdbc:hive2://hadoop101:10000> union 0: jdbc:hive2://hadoop101:10000> select * from dept_partition where month='201709';

    ③ 增加分区| 增加单个、增加多个分区
    0: jdbc:hive2://hadoop101:10000> alter table dept_partition add partition(month='201705') partition(month='201704');

    ④ 删除分区| 单个、删多个用,连接
    0: jdbc:hive2://hadoop101:10000> alter table dept_partition drop partition(month='201705'), partition(month='201706');

    ⑤ 查看分区有多少分区
    0: jdbc:hive2://hadoop101:10000> show partitions dept_partition; +---------------+--+ | partition | +---------------+--+ | month=201707 | | month=201708 | | month=201709 | +---------------+--+

    ⑥ 查看分区表结构 0: jdbc:hive2://hadoop101:10000> desc formatted dept_partition;

    ⑦ 创建二级分区

      hive (default)> create table dept_partition2(
        deptno int, dname string, loc string)
        partitioned by (month string, day string)
        row format delimited fields terminated by ' ';

      加载数据到二级分区
    0: jdbc:hive2://hadoop101:10000> load data local inpath '/opt/module/datas/dept.txt' into table default.dept_partition2 partition(month='201709', day='13');
    0: jdbc:hive2://hadoop101:10000> select * from dept_partition2 where month='201709' and day='13'; 查看分区数据

      把数据直接上传到分区目录上,让分区表和数据产生关联的三种方式

    方式一:上传数据后修复
    0
    : jdbc:hive2://hadoop101:10000> dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=201709/day=12; 0: jdbc:hive2://hadoop101:10000> dfs -put /opt/module/datas/dept.txt /user/hive/warehouse/dept_partition2/month=201709/day=12; 0: jdbc:hive2://hadoop101:10000> msck repair table dept_partition2; //修复下才能查到数据 No rows affected (0.15 seconds) 0: jdbc:hive2://hadoop101:10000> select * from dept_partition2 where month='201709' and day='12'; alter table dept_partition2 drop partition(month='201709', day='11'); 删除 方式二:上传数据后添加分区 0: jdbc:hive2://hadoop101:10000> dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=201709/day=11; 不能加引号 0: jdbc:hive2://hadoop101:10000> dfs -put /opt/module/datas/dept.txt /user/hive/warehouse/dept_partition2/month=201709/day=11; 0: jdbc:hive2://hadoop101:10000> alter table dept_partition2 add partition(month='201709', day='11'); 0: jdbc:hive2://hadoop101:10000> select * from dept_partition2 where month='201709' and day='11'; 方式三:创建文件夹后load数据到分区 0: jdbc:hive2://hadoop101:10000> dfs -mkdir -p /user/hive/warehouse/dept_partition2/month='201709'/day='10'; 0: jdbc:hive2://hadoop101:10000> load data local inpath '/opt/module/datas/dept.txt' into table dept_partition2 partition(month='201709',day='10'); 0: jdbc:hive2://hadoop101:10000> select * from dept_partition2 where month='201709' and day='10';

    修改表

    重命名表
     jdbc:hive2://hadoop101:10000> alter table teacher rename to new_teacher;
    添加列
    0: jdbc:hive2://hadoop101:10000> alter table dept_partition add columns(deptdesc string);
    更新列
    0: jdbc:hive2://hadoop101:10000> alter table dept_partition change column deptdesc desc int; No rows affected (0.112 seconds) 0: jdbc:hive2://hadoop101:10000> desc dept_partition;
    替换列
    0: jdbc:hive2://hadoop101:10000> alter table dept_partition replace columns(deptid int, name string, loc string);
    删除表
    0: jdbc:hive2://hadoop101:10000> drop table new_teacher;

    DML数据操作

    数据导入

    向表中装载数据(Load)

    ① 向表中装载数据: 
      从本地到hive
    0
    : jdbc:hive2://hadoop101:10000> create table student(id int, name string) row format delimited fields terminated by ' '; 0: jdbc:hive2://hadoop101:10000> load data local inpath '/opt/module/datas/student.txt' into table default.student; 加载本地文件到hive   从HDFS到hive 0: jdbc:hive2://hadoop101:10000> dfs -mkdir -p /user/kris/hive; 0: jdbc:hive2://hadoop101:10000> dfs -put /opt/module/datas/student.txt /user/kris/hive; 0: jdbc:hive2://hadoop101:10000> load data inpath '/user/kris/hive/student.txt' into table default.student; //移动hdfs上的文件;加载HDFS上的数据 0: jdbc:hive2://hadoop101:10000> load data inpath '/user/kris/hive/student.txt' overwrite into table default.student; 加载数据覆盖表中已有的数据

    ② 通过查询语句向表中插入数据Insert
    create table student(id int, name string) partitioned by (month string) row format delimited fields terminated by ' '; 创建一张分区表 0: jdbc:hive2://hadoop101:10000> insert into table student partition(month='201902') values (1, "kris"), (2, "egon"); 插入数据 -rwxrwxr-x kris supergroup 14 B 2019/2/15 下午7:16:26 3 128 MB 000000_0   根据单张表查询结果来插入insert into是追加数据的方式插入表或分区,原有数据不会被删除;

                  insert overwrite是会覆盖表或分区中已有数据;
    0: jdbc:hive2://hadoop101:10000> insert overwrite table student partition(month="201905") select id,name from student where month='201902'; 在原本基础上追加 0: jdbc:hive2://hadoop101:10000> select * from student; +-------------+---------------+----------------+--+ | student.id | student.name | student.month | +-------------+---------------+----------------+--+ | 1 | kris | 201902 | | 2 | egon | 201902 | | 1 | kris | 201905 | | 2 | egon | 201905 | +-------------+---------------+----------------+--+   多表查询结果插入 hive (default)> from student insert overwrite table student partition(month="201904") > select id, name where month="201905" > insert overwrite table student partition(month="201903") > select id, name where month="201905"; 0: jdbc:hive2://hadoop101:10000> select * from student; +-------------+---------------+----------------+--+ | student.id | student.name | student.month | +-------------+---------------+----------------+--+ | 1 | kris | 201902 | | 2 | egon | 201902 | | 1 | kris | 201903 | | 2 | egon | 201903 | | 1 | kris | 201904 | | 2 | egon | 201904 | | 1 | kris | 201905 | | 2 | egon | 201905 | +-------------+---------------+----------------+-
    ③ 查询语句中创建并加载数据 AS Select
    create table if not exists student3 as select id, name from student;
    create table if not exists student4 like student; 
    ④ 创建表时通过Location指定加载数据路径
    0: jdbc:hive2://hadoop101:10000> create external table if not exists stu(id int, name string) row format delimited fields terminated by '	' location '/student';
    
    ⑤ Import数据到指定Hive表中;要先使用export导出后,才能将数据import导入

      export table student to '/hive_data/student';
      import table student from '/hive_data/student';

    create table student22(
    id int, name string)
    partitioned by (month string)
    row format delimited fields terminated by '	';
    
    import table student22 partition(month='201904') from  //student22必须要有分区才能导入成功
     '/user/hive/warehouse/export/student';

    数据导出(Impala都不支持)

    ① Insert导出
      将输出文件导出到本地/opt/module/datas/export/student中; 0: jdbc:hive2://hadoop101:10000> insert overwrite local directory '/opt/module/datas/export/student' select * from student;   结果格式化导出到本地
    hive (
    default)> insert overwrite local directory '/opt/module/datas/export/student1' > ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' select * from student; 结果导出到HDFS;只能用overwrite,不能用into hive (default)> insert overwrite directory '/user/kris/student2' > row format delimited fields terminated by ' ' > select * from student; ② Hadoop命令导出到本地 hive (default)> dfs -get /user/hive/warehouse/student/month=201902/000000_0 /opt/module/datas/export/student3.txt; [kris@hadoop101 export]$ cat student3.txt 1 kris 2 egon [kris@hadoop101 export]$ pwd /opt/module/datas/export ③ Shell命令导出到本地 [kris@hadoop101 hive]$ bin/hive -e 'select * from default.student;' > /opt/module/datas/export/student4.txt ④ Export导出到HDFS上 hive (default)> export table default.student to '/user/hive/warehouse/export/student';

    ⑤ Sqoop导出(导入)
      https://www.cnblogs.com/shengyang17/p/10512510.html

    Hive表导出成csv文件

    hive -e "
    set hive.cli.print.header=true; 
    select * from student where sex = 'male';
    " | sed 's/[	]/,/g'  > /opt/module/student.csv

    清除表中数据(Truncate)

    注意:Truncate只能删除管理表,不能删除外部表中数据

      hive (default)> truncate table student;

  • 相关阅读:
    Ruby
    WebGL的第二个小程序
    wegGL的第一个小程序
    Node.js介绍
    接口隔离原则(Interface Sepreation Principle)
    参数
    字段/属性
    接口和抽象类
    javascript中的事件
    线性回归算法-4.多元线性回归算法
  • 原文地址:https://www.cnblogs.com/shengyang17/p/10386327.html
Copyright © 2011-2022 走看看