Hive Partitions and Bucketed Tables

1. Partitions

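A select query in Hive normally scans the entire table, which makes queries slow. Partitions address this: when a query filters on the partition column, Hive scans only the partitions it needs instead of the whole table.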

A table can have one or more partitions. Each partition is stored as its own subdirectory under the table's directory.
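To make the idea concrete, here is a minimal Python sketch of a table partitioned by age. It is a toy model, not Hive's actual query planner: the dictionary TABLE and the function scan are hypothetical, and each dictionary key stands for a partition subdirectory named in Hive's column=value style. Filtering on the partition column lets the scan skip whole directories, while an unfiltered query has to read all of them.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy model of a table partitioned by age: each key is a
# partition directory named "column=value", each value holds the lines
# stored in the data files under that directory.
TABLE: Dict[str, List[str]] = {
    "age=18": ["1,tom", "2,amy"],
    "age=20": ["3,bob"],
    "age=25": ["4,eve", "5,joe"],
}


def scan(table: Dict[str, List[str]],
         wanted_age: Optional[int] = None) -> Tuple[List[str], List[str]]:
    """Return (directories opened, lines read).

    With a filter on the partition column, directories whose value does not
    match are skipped without being opened (partition pruning). Without a
    filter, every directory is read (full table scan).
    """
    opened: List[str] = []
    lines: List[str] = []
    for directory, data in table.items():
        _, value = directory.split("=", 1)
        if wanted_age is not None and int(value) != wanted_age:
            continue  # pruned: this directory is never read
        opened.append(directory)
        lines.extend(data)
    return opened, lines


# Roughly: select * from sample_table where age = 20;
print(scan(TABLE, 20))  # (['age=20'], ['3,bob'])
# Roughly: select * from sample_table;  (opens all three directories)
print(scan(TABLE))
```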

1.1 A partitioned table can be created with a single partition column or with two partition columns:

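Single partition column:

create table sample_table (id int, value string) partitioned by (age int) row format delimited fields terminated by ',' stored as textfile;

The table has three columns, id, value and age, and is partitioned by age. Note that age is declared only in the partitioned by clause, not in the column list: its value is not stored in the data files but encoded in the partition directory name (for example age=20).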

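Two partition columns (an alternative definition of the same table):

create table sample_table (id int, value string) partitioned by (age int, sex string) row format delimited fields terminated by ',' stored as textfile;

The table has four columns, id, value, age and sex, and is partitioned by age and then by sex, so each sex directory is nested inside an age directory (for example age=20/sex=F).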
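The following minimal Python sketch illustrates why the two-level table still exposes four columns even though its data files contain only id and value. The helper read_row is hypothetical: it rebuilds a row from one comma-delimited line plus the partition path the file lives under. It keeps every value as a string and ignores the escaping Hive applies to special characters in partition directory names; Hive itself converts each value to the declared column type.

```python
from typing import Dict


def read_row(line: str, partition_dir: str) -> Dict[str, str]:
    """Hypothetical helper: rebuild one row of the two-level sample_table
    from a data-file line and the relative directory the file sits in."""
    # Data columns (id, value) come from the comma-delimited line itself.
    id_, value = line.split(",")
    row = {"id": id_, "value": value}
    # Partition columns (age, sex) come from the path, e.g. "age=20/sex=F".
    # Values stay strings here; no type conversion or unescaping is done.
    for segment in partition_dir.split("/"):
        column, column_value = segment.split("=", 1)
        row[column] = column_value
    return row


print(read_row("1,tom", "age=20/sex=F"))
# {'id': '1', 'value': 'tom', 'age': '20', 'sex': 'F'}
```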

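[Notes: set hive.cli.print.current.db=true; makes the Hive CLI prompt show the current database.

row format delimited: each record is stored as a line of delimited text; by default, records are separated by newlines.

fields terminated by ',': the columns within a record are separated by commas.

stored as textfile: the data is stored as plain text files.]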
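As a rough illustration of what row format delimited fields terminated by ',' means for sample_table (id int, value string), the sketch below parses text laid out like the data files: one record per line, one field per comma. The function parse_textfile is hypothetical. It also models three behaviours as an approximation of Hive's default text SerDe: missing trailing fields read as NULL (None), extra fields beyond the schema are ignored, and text that is not a valid int reads as NULL for the int column. Treat those edge-case details as assumptions rather than a precise specification.

```python
from typing import List, Optional, Tuple

Row = Tuple[Optional[int], Optional[str]]  # (id int, value string)


def parse_textfile(text: str, sep: str = ",") -> List[Row]:
    """Hypothetical parser approximating how delimited text maps onto
    the columns (id int, value string)."""
    rows: List[Row] = []
    for line in text.splitlines():              # row format delimited: one record per line
        fields: List[Optional[str]] = list(line.split(sep))  # fields terminated by ','
        fields += [None] * (2 - len(fields))    # missing trailing columns -> NULL (None)
        raw_id, value = fields[0], fields[1]    # extra fields beyond the schema are ignored
        try:
            id_ = int(raw_id)
        except (TypeError, ValueError):
            id_ = None                          # not a valid int -> NULL (None)
        rows.append((id_, value))
    return rows


print(parse_textfile("1,tom\n2\nx,amy\n3,bob,extra\n"))
# [(1, 'tom'), (2, None), (None, 'amy'), (3, 'bob')]
```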

1.2 Loading data:

load data local inpath '<path>' overwrite into table <table_name> partition (<partition_column>='<value>');

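[Note: overwrite means the existing data is deleted before the new data is loaded; when a partition is specified, only that partition's data is replaced. Without overwrite, the new files are added to the existing data. local means the path is on the local file system; without local, the path refers to HDFS.]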
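The following toy Python model sketches the difference between overwrite into and plain into when a partition is given. The function load_data is hypothetical and is not a Hive API; the table is modeled as a dictionary from partition directory names to the data files they contain. overwrite replaces only the files of the named partition and leaves other partitions untouched, while into without overwrite appends.

```python
from typing import Dict, List


def load_data(table: Dict[str, List[str]], partition: str,
              new_files: List[str], overwrite: bool) -> None:
    """Hypothetical model of
    load data local inpath ... [overwrite] into table t partition (...);
    `table` maps partition directories (e.g. "age=20") to their data files.
    """
    if overwrite or partition not in table:
        # overwrite: existing files of THIS partition only are discarded;
        # a partition that does not exist yet is created empty first.
        table[partition] = []
    table[partition].extend(new_files)  # without overwrite, files are appended


t = {"age=18": ["a.txt"], "age=20": ["b.txt"]}
load_data(t, "age=20", ["c.txt"], overwrite=True)
print(t)  # {'age=18': ['a.txt'], 'age=20': ['c.txt']}
load_data(t, "age=20", ["d.txt"], overwrite=False)
print(t["age=20"])  # ['c.txt', 'd.txt']
```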

2. Buckets

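Bucketing splits a large table into several smaller "tables" (buckets, each stored as its own file) by hashing a column. When the tables in a join are bucketed on the join column, the join can be performed on the map side (a map-side join), bucket by bucket, which is how bucketing helps with joins between a large table and a smaller one. Sorting the data within each bucket by a column further improves query efficiency.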
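A minimal Python sketch of the bucket-by-bucket join described above, under stated assumptions: both tables are bucketed on the join key (position 0 of each tuple) into the same number of buckets, and an int key hashes to its own value, as in Hive's legacy bucketing scheme (newer Hive versions may use a different hash function). The names bucketize and bucket_join are hypothetical. Because a given key always lands in the same bucket number in both tables, each pair of buckets can be joined independently with a small in-memory hash table.

```python
from typing import Dict, List

NUM_BUCKETS = 4  # assumption: both tables use the same bucket count


def bucket_of(key: int) -> int:
    # Assumption: legacy Hive scheme where an int key hashes to itself;
    # the sign bit is masked off before taking the modulus.
    return (key & 0x7FFFFFFF) % NUM_BUCKETS


def bucketize(rows: List[tuple]) -> List[List[tuple]]:
    """Split rows into NUM_BUCKETS lists by the key in position 0."""
    buckets: List[List[tuple]] = [[] for _ in range(NUM_BUCKETS)]
    for row in rows:
        buckets[bucket_of(row[0])].append(row)
    return buckets


def bucket_join(left: List[List[tuple]], right: List[List[tuple]]) -> List[tuple]:
    """Inner join on position 0, one bucket pair at a time: bucket i of the
    left side only ever needs bucket i of the right side in memory."""
    out: List[tuple] = []
    for left_bucket, right_bucket in zip(left, right):
        index: Dict[int, List[tuple]] = {}
        for r in right_bucket:  # small per-bucket hash table
            index.setdefault(r[0], []).append(r)
        for l in left_bucket:
            for r in index.get(l[0], []):
                out.append(l + r[1:])
    return out


users = [(1, "tom"), (2, "amy"), (3, "bob")]
orders = [(1, "book"), (3, "pen"), (3, "cup"), (4, "ink")]
print(bucket_join(bucketize(users), bucketize(orders)))
# [(1, 'tom', 'book'), (3, 'bob', 'pen'), (3, 'bob', 'cup')]
```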

2.1 Creating a bucketed table:

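create table bucketed_table (id int, name string) clustered by (id) sorted by (name) into 4 buckets row format delimited fields terminated by ' ' stored as textfile;

Rows are assigned to one of 4 buckets according to a hash of id, the rows inside each bucket are sorted by name, and fields in the data files are separated by spaces.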
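The sketch below shows, in Python, how rows of the table above could be distributed into its 4 buckets: clustered by (id) decides the bucket, and sorted by (name) orders the rows inside each bucket. The function names are hypothetical, and the hash is an assumption: it follows Hive's legacy scheme for int columns (the value itself, with the sign bit masked off, modulo the bucket count); newer Hive versions may hash differently.

```python
from typing import Dict, List, Tuple

NUM_BUCKETS = 4  # matches "into 4 buckets" in the statement above

Row = Tuple[int, str]  # (id, name)


def bucket_of(key: int, num_buckets: int = NUM_BUCKETS) -> int:
    # Assumption: legacy Hive scheme, an int hashes to its own value,
    # with the sign bit masked off before the modulus.
    return (key & 0x7FFFFFFF) % num_buckets


def write_buckets(rows: List[Row]) -> Dict[int, List[Row]]:
    """Group rows into buckets by id, then sort each bucket by name."""
    buckets: Dict[int, List[Row]] = {b: [] for b in range(NUM_BUCKETS)}
    for row in rows:
        buckets[bucket_of(row[0])].append(row)   # clustered by (id)
    for rows_in_bucket in buckets.values():
        rows_in_bucket.sort(key=lambda r: r[1])  # sorted by (name)
    return buckets


sample_rows = [(1, "tom"), (5, "amy"), (2, "bob"), (8, "eve"), (3, "joe")]
print(write_buckets(sample_rows))
# {0: [(8, 'eve')], 1: [(5, 'amy'), (1, 'tom')], 2: [(2, 'bob')], 3: [(3, 'joe')]}
```

In Hive each bucket corresponds to one file in the table directory, so the four lists above stand in for four files.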

2.2 Setting the configuration property:

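set hive.enforce.bucketing = true;

With this setting, Hive creates the number of buckets declared in the table definition when data is inserted. In Hive 2.0 and later this property was removed and bucketing is always enforced.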

2.3 Inserting data:

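insert into table bucketed_table select * from source_table;

Here source_table is an ordinary table with the same columns (id, name); the insert ... select makes Hive distribute its rows into the buckets.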

Original article: https://www.cnblogs.com/chenyaling/p/5575386.html