  • hive--distribute by and sort by

    Data

    B 10 store_B_4
    A 12 store_A_1
    A 14 store_A_2
    B 15 store_B_1
    B 19 store_B_2
    B 30 store_B_3

    Create the table and load the data

    create table if not exists store(
    sid string,
    amount string,
    name string
    )
    row format delimited fields terminated by ' '
    lines terminated by '\n'
    stored as textfile
    ;
    load data local inpath '/opt/wangyuqi/store.txt' into table store;

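    In Hive, distribute by followed by a column controls how the map output is distributed: rows with the same value in that column are sent to the same reducer. sort by then sorts the rows inside each reducer, so the ordering is per reducer, not global.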

    select * from store distribute by sid sort by amount desc;
    result:
    A    14    store_A_2
    A    12    store_A_1
    B    30    store_B_3
    B    19    store_B_2
    B    15    store_B_1
    B    10    store_B_4
    Time taken: 224.482 seconds
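
    The grouping by sid in the result above only becomes visible when the job runs with more than one reducer; with a single reducer all rows end up in one place and sort by orders the whole set. The following is a minimal sketch, assuming a MapReduce execution engine and that two reducers are wanted (the value 2 and the alias amount_int are assumptions, not taken from the original post). Because amount is declared as string, the sketch also casts it so that the ordering is numeric rather than lexicographic:

    ```sql
    -- Assumption: request two reducers so that different sid values can land on different reducers.
    set mapreduce.job.reduces=2;

    -- Rows with the same sid go to the same reducer;
    -- each reducer sorts its own rows by the numeric amount, descending.
    select sid, cast(amount as int) as amount_int, name
    from store
    distribute by sid
    sort by amount_int desc;
    ```

    Which reducer a given sid is sent to depends on the hash of its value, so two different sid values may still share a reducer.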

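    cluster by usage: it is equivalent to combining distribute by and sort by on the same column, and it only supports ascending order. Rows are sorted by sid only, so the order of amount within each sid in the result below is not guaranteed.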

    select * from store cluster by sid;
    result:
    A    14    store_A_2
    A    12    store_A_1
    B    30    store_B_3
    B    19    store_B_2
    B    15    store_B_1
    B    10    store_B_4
    Time taken: 126.178 seconds, Fetched: 6 row(s)
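
    Since cluster by is shorthand for distributing and sorting on the same column, the query above can also be spelled out in full. A short sketch of the equivalent form, together with a descending variant that has to be written with distribute by and sort by because cluster by accepts no asc/desc keyword:

    ```sql
    -- Same distribution and per-reducer ordering as: select * from store cluster by sid;
    select * from store distribute by sid sort by sid;

    -- Descending order on the same column, which cluster by cannot express.
    select * from store distribute by sid sort by sid desc;
    ```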
  • Original article: https://www.cnblogs.com/youchi/p/13551421.html