Hive: distribute by and sort by

Data

B 10 store_B_4
A 12 store_A_1
A 14 store_A_2
B 15 store_B_1
B 19 store_B_2
B 30 store_B_3

Create the table and load the data

create table if not exists store(
sid string,
amount string,
name string
)
row format delimited fields terminated by ' '
lines terminated by '\n'
stored as textfile
;
load data local inpath '/opt/wangyuqi/store.txt' into table store;

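In Hive, distribute by followed by a column controls how the map output is distributed: rows with the same value in that column are sent to the same reducer. sort by then sorts the rows within each reducer.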

select * from store distribute by sid sort by amount desc;
Result:
A    14    store_A_2
A    12    store_A_1
B    30    store_B_3
B    19    store_B_2
B    15    store_B_1
B    10    store_B_4
Time taken: 224.482 seconds
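
To make the effect of distribute by visible, the query result can be written to a local directory with more than one reducer, since each reducer then writes its own output file. The following is only a sketch: the reducer count of 2 and the directory /tmp/store_by_sid are assumptions for illustration, and the setting is the one used with the MapReduce execution engine.

-- assumption: force 2 reducers so that the rows are split across reducers
set mapreduce.job.reduces=2;
-- assumption: /tmp/store_by_sid is a hypothetical scratch directory; its contents are overwritten
insert overwrite local directory '/tmp/store_by_sid'
select * from store distribute by sid sort by amount desc;

All rows with the same sid should end up in the same output file, sorted by amount in descending order within that file. Note that amount is declared as string in the table definition, so sort by amount compares text. This matches numeric order here only because every amount has two digits. One possible way to sort numerically is to cast the column first:

-- cast amount to int in a subquery so that sort by compares numbers, not strings
select * from (
  select sid, cast(amount as int) as amount, name from store
) t
distribute by sid sort by amount desc;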

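cluster by usage: it is equivalent to combining distribute by and sort by on the same column. It sorts in ascending order only; asc or desc cannot be specified.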

select * from store cluster by sid;
Result:
A    14    store_A_2
A    12    store_A_1
B    30    store_B_3
B    19    store_B_2
B    15    store_B_1
B    10    store_B_4
Time taken: 126.178 seconds, Fetched: 6 row(s)
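
Since cluster by sid is shorthand for distributing and sorting on the same column, the query above can also be written with the two clauses spelled out. The second statement is a sketch of how a descending sort could be obtained, which cluster by itself does not allow.

-- same distribution and ordering as: select * from store cluster by sid;
select * from store distribute by sid sort by sid;
-- descending order on the same column needs the explicit form
select * from store distribute by sid sort by sid desc;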

Original article: https://www.cnblogs.com/youchi/p/13551421.html