1) Loading into a regular table
-- Load a local text file (the fields must match the Hive table's column order and delimiter)
load data local inpath '/home/hadoop/orders.csv' overwrite into table orders;
> If the source data is on HDFS, use:
load data inpath 'hdfs://master:9000/user/orders' overwrite into table orders;
2) Loading into a partitioned table
load data local inpath '/home/hadoop/test.txt' overwrite into table test partition (dt='2017-09-09');
> The partition clause places this batch of data into the 2017-09-09 partition.
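For the load above to succeed, the test table must already be declared with a dt partition column. A minimal sketch of such a DDL, assuming illustrative column names (the columns here are hypothetical, not from the original):

```sql
-- Hypothetical DDL for the partitioned table test; the data columns are assumptions
create table test (
  user_id   int,
  user_name string
)
partitioned by (dt string)   -- dt is a partition column, not stored in the data files
row format delimited fields terminated by ','
stored as textfile;
```

Note that the partition column dt does not appear in the column list; Hive stores it in the directory path (e.g. .../dt=2017-09-09/) rather than in the files.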
3) Loading into a bucketed table
-- First create a regular staging table
create table orders_tmp (
  user_id     int,
  user_name   string,
  create_time string
)
row format delimited fields terminated by ','
stored as textfile;
-- Load the data into the staging table
load data local inpath '/home/hadoop/lead.txt' overwrite into table orders_tmp;
-- Insert into the bucketed table
set hive.enforce.bucketing = true;
insert overwrite table orders select * from orders_tmp;
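The insert above only produces bucketed files if orders itself was created with a clustered by clause. A minimal sketch of such a DDL, reusing the staging table's columns (the bucketing column and bucket count are assumptions, not from the original):

```sql
-- Hypothetical bucketed DDL for orders; bucket column and count are assumptions
create table orders (
  user_id     int,
  user_name   string,
  create_time string
)
clustered by (user_id) into 4 buckets   -- rows are hashed on user_id into 4 files
row format delimited fields terminated by ','
stored as textfile;
```

With hive.enforce.bucketing=true, Hive sets the reducer count to the bucket count automatically, so each bucket lands in its own output file.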
4) Exporting data
-- Exporting writes the data in a Hive table out to a local file:
insert overwrite local directory '/home/hadoop/orders.bak2017-12-28'
select * from orders;
[Drop the local keyword to export to HDFS instead.]
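By default the exported files use Hive's internal ^A (Ctrl-A) field separator, which is awkward to read back. Since Hive 0.11, a directory export can specify its own delimiter; a sketch:

```sql
-- Export with an explicit field delimiter (Hive 0.11+)
insert overwrite local directory '/home/hadoop/orders.bak2017-12-28'
row format delimited fields terminated by ','
select * from orders;
```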
5) Inserting data
-- insert ... select; the part in {} is optional
insert overwrite table order_tmp {partition (dt='2017-09-09')} select * from orders;
-- Multiple inserts from a single scan
from orders
insert overwrite table log1 select company_id, original where company_id = '10086'
insert overwrite table log2 select company_id, original where company_id = '10000';
[Every Hive query scans the whole dataset once. When the results need to go into multiple tables, this syntax writes to all of them in a single pass, which improves efficiency.]
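The partition in the insert above is static (its value is spelled out in the statement). Hive can also derive partition values from the query itself; a sketch of a dynamic partition insert, assuming a hypothetical source table orders_src whose last selected column supplies dt:

```sql
-- Dynamic partition insert; orders_src and its columns are assumptions
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;  -- allow all partitions to be dynamic
insert overwrite table order_tmp partition (dt)
select user_id, user_name, create_time, dt
from orders_src;
```

The dynamic partition column must come last in the select list; Hive creates one partition per distinct dt value it encounters.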
6) Copying a table
-- Copying a table creates a new table from the source table's structure and data. During the copy, rows can be filtered and columns can be dropped:
create table order
row format delimited fields terminated by ' '
stored as textfile
as
select leader_id, order_id, '2017-09-09' as bakdate
from orders
where create_time < '2017-09-09';
[Backs up the rows of orders dated before 2017-09-09 into order, selecting leader_id and order_id and adding a bakdate column.]
7) Cloning a table
-- Clones only the source table's metadata; none of the source table's data is copied:
create table orders like order;
8) Backing up a table
-- Back up the orders_log data to HDFS as /user/hive/action_log.export; a backup exports both the table's metadata and its data:
export table orders_log partition (dt='2017-09-09') to '/user/hive/action_log.export';
9) Restoring a table
import table orders_log from '/user/hive/action_log.export';