Hive: Tables
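Hive has two types of tables: managed (internal) tables and external tables.

A managed table, also called an internal table, has table type MANAGED_TABLE. By default its data is stored under /user/hive/warehouse, although a different path can be set with LOCATION. Dropping the table deletes both the data and the metadata.

An external table has table type EXTERNAL_TABLE. The directory that holds its data can be specified with LOCATION when the table is created. Dropping the table deletes only the metadata; the data stays in place.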

Example: creating an external table
create external table if not exists default.emp_ext(
  empno int,
  ename string,
  job string,
  mgr int,
  hiredate string,
  sal double,
  comm double,
  deptno int
)
row format delimited fields terminated by ' '
location '/opt/input/emp';
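A partitioned table corresponds to separate directories on HDFS: each partition is its own directory, and that directory holds all of the partition's data files. In Hive, partitioning simply means splitting by directory, so that a large data set is divided into smaller ones according to business needs.

A query can select only the partitions it needs with a condition in the WHERE clause, so it reads far less data and runs much faster.
create external table if not exists default.emp_partition(
  empno int,
  ename string,
  job string,
  mgr int,
  hiredate string,
  sal double,
  comm double,
  deptno int
)
partitioned by (month string)
row format delimited fields terminated by ' ';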
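To make the link between the partition column and the WHERE clause concrete, here is a minimal shell sketch that only prints HiveQL for one month of default.emp_partition. The function names and the local file path in the usage note are assumptions for illustration, not part of the original post.

```shell
#!/usr/bin/env bash
# Sketch only: print HiveQL for loading and querying one month of
# default.emp_partition. Function names are illustrative assumptions.

# gen_month_load FILE MONTH: load a local file into the given month partition.
gen_month_load() {
  local file="$1" month="$2"
  printf "load data local inpath '%s' into table default.emp_partition partition (month='%s');\n" "$file" "$month"
}

# gen_month_query MONTH: filter on the partition column so that only
# that partition's directory needs to be read.
gen_month_query() {
  local month="$1"
  printf "select * from default.emp_partition where month='%s';\n" "$month"
}
```

For example, `gen_month_load /opt/datas/emp.txt 201709` (a hypothetical file) prints a load statement targeting the month=201709 partition, and `gen_month_query 201709` prints a query restricted to that partition.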
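Notes on partitioned tables:

If a partition directory is created directly on HDFS, the metastore has no record of it, so the data is not visible to queries until the partition is registered.

Repair the table so that the metastore picks up the new directories: msck repair table table_name;

Alternatively, add the partition explicitly. These Hive CLI commands can be wrapped in a shell script:
dfs -mkdir -p /user/hive/warehouse/dept_part/day=20171025;
dfs -put /opt/weblog/log.log /user/hive/warehouse/dept_part/day=20171025;
alter table dept_part add partition(day='20171025');

List a table's partitions: show partitions dept_part;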
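The shell script mentioned above could look roughly like the following sketch. It only prints the Hive CLI commands for one day's partition and executes nothing itself. The WAREHOUSE variable and the function name are assumptions; adjust the warehouse path to your installation.

```shell
#!/usr/bin/env bash
# Sketch only: emit the Hive CLI commands that create and register one
# day's partition. WAREHOUSE and gen_add_partition are illustrative names.
WAREHOUSE="${WAREHOUSE:-/user/hive/warehouse}"

# gen_add_partition TABLE DAY SRC_FILE: print the commands, run nothing.
gen_add_partition() {
  local table="$1" day="$2" src="$3"
  local dir="${WAREHOUSE}/${table}/day=${day}"
  printf 'dfs -mkdir -p %s;\n' "$dir"
  printf 'dfs -put %s %s;\n' "$src" "$dir"
  # "if not exists" keeps the statement safe to re-run for the same day.
  printf "alter table %s add if not exists partition (day='%s');\n" "$table" "$day"
}
```

One possible way to use it is to write the output to a file, for example `gen_add_partition dept_part 20171025 /opt/weblog/log.log > add_part.hql`, and then run that file with `bin/hive -f add_part.hql`.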

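Loading data into Hive tables

load data [local] inpath 'filepath' [overwrite] into table tablename [partition (partcol1=val1, ...)];

With local, filepath refers to the local file system; without it, filepath refers to HDFS.

With overwrite, the existing contents of the table are replaced; without it, the new data is appended.

When loading into a partitioned table, the target partition is given with partition (partcol1=val1, ...).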

1. Load a local file into a Hive table
load data local inpath '/root/emp.txt' into table default.emp;
2. Load an HDFS file into a Hive table
load data inpath '/root/emp.txt' into table default.emp;
3. Load data and overwrite what is already in the table
load data inpath '/root/emp.txt' overwrite into table default.emp;
4. Create a table and load it with insert
create table default.emp_ci like emp;
insert into table default.emp_ci select * from default.emp;
5. Specify the data directory with location when creating the table
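The optional parts of the load statement (local, overwrite, partition) can be seen side by side in a small statement builder. This is a sketch under assumptions: build_load and its argument order are made up for illustration, and it only prints the statement.

```shell
#!/usr/bin/env bash
# Sketch only: assemble a LOAD DATA statement from its optional parts.
# build_load PATH TABLE [local|""] [overwrite|""] [PARTITION_SPEC|""]
build_load() {
  local path="$1" table="$2" is_local="$3" overwrite="$4" part="$5"
  local stmt="load data"
  # "local" reads from the local file system instead of HDFS.
  if [ "$is_local" = "local" ]; then stmt="$stmt local"; fi
  stmt="$stmt inpath '$path'"
  # "overwrite" replaces existing data instead of appending.
  if [ "$overwrite" = "overwrite" ]; then stmt="$stmt overwrite"; fi
  stmt="$stmt into table $table"
  # A partition spec is needed only for partitioned tables.
  if [ -n "$part" ]; then stmt="$stmt partition ($part)"; fi
  printf '%s;\n' "$stmt"
}
```

For instance, `build_load /root/emp.txt default.emp local "" ""` prints the statement from example 1, and passing `"day='20171025'"` as the last argument adds a partition clause.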

Exporting data from Hive tables
insert overwrite local directory '/opt/datas/hive/hive_exp_emp'
row format delimited fields terminated by ' '
select * from default.emp;

Query results can also be redirected to a file from the shell:
bin/hive -e "select * from default.emp;" > /opt/datas/hive/exp_res.txt
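The shell-redirect export can be wrapped in a small helper, sketched below. HIVE_BIN and export_query are assumed names; the helper simply runs the query with the Hive CLI's -e option and writes standard output to the given file.

```shell
#!/usr/bin/env bash
# Sketch only: run a query through the Hive CLI and save the rows to a file.
# HIVE_BIN is an assumption; point it at your hive executable.
HIVE_BIN="${HIVE_BIN:-bin/hive}"

# export_query QUERY OUTFILE: write the query's standard output to OUTFILE.
export_query() {
  local query="$1" outfile="$2"
  "$HIVE_BIN" -e "$query" > "$outfile"
}
```

A call such as `export_query "select * from default.emp;" /opt/datas/hive/exp_res.txt` would then reproduce the redirect shown in the post.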
Multi-insert in Hive
Suppose rows from t_4 need to be filtered by different conditions and written to two other tables:
insert overwrite table t_4_st_lt_200 partition(day='1')
select ip, url, staylong from t_4 where staylong < 200;
insert overwrite table t_4_st_gt_200 partition(day='1')
select ip, url, staylong from t_4 where staylong > 200;
The drawback of this approach is that the two statements run as two separate jobs: each starts its own MapReduce pass, and the same source table is read twice.
The multi-insert syntax avoids this and is more efficient, because the source table is read only once:
from t_4
insert overwrite table t_4_st_lt_200 partition(day='2')
select ip, url, staylong where staylong < 200
insert overwrite table t_4_st_gt_200 partition(day='2')
select ip, url, staylong where staylong > 200;
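If the multi-insert has to run for a different day each time, the statement can be generated by a short shell function. The sketch below only prints the statement; gen_multi_insert is an assumed name, and the tables and columns are the ones used in the post.

```shell
#!/usr/bin/env bash
# Sketch only: print the multi-insert statement for one target partition.
# gen_multi_insert DAY: t_4 is read once and feeds both inserts.
gen_multi_insert() {
  local day="$1"
  printf 'from t_4\n'
  printf "insert overwrite table t_4_st_lt_200 partition(day='%s')\n" "$day"
  printf '  select ip, url, staylong where staylong < 200\n'
  printf "insert overwrite table t_4_st_gt_200 partition(day='%s')\n" "$day"
  printf '  select ip, url, staylong where staylong > 200;\n'
}
```

One way to use it is `gen_multi_insert 2 > multi_insert.hql` followed by `bin/hive -f multi_insert.hql`.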
Original post: https://www.cnblogs.com/RzCong/p/7732590.html