  • Joins in Hive

    Creating the tables

    0: jdbc:hive2://localhost:10000> create database myjoin;
    No rows affected (3.78 seconds)
    0: jdbc:hive2://localhost:10000> use myjoin;
    No rows affected (0.419 seconds)
    0: jdbc:hive2://localhost:10000> create table a(id int,name string) row format delimited fields terminated by ',';
    No rows affected (2.08 seconds)
    0: jdbc:hive2://localhost:10000> create table b(id int,name string) row format delimited fields terminated by ',';
    0: jdbc:hive2://localhost:10000> select * from a
    0: jdbc:hive2://localhost:10000> ;
    +-------+---------+--+
    | a.id  | a.name  |
    +-------+---------+--+
    | 1     | qq      |
    | 2     | ww      |
    | 3     | ee      |
    | 4     | rr      |
    | 5     | tt      |
    | 6     | yy      |
    | 7     | aa      |
    | 8     | ss      |
    | 11    | zz      |
    +-------+---------+--+
    9 rows selected (1.881 seconds)
    0: jdbc:hive2://localhost:10000> select * from b;
    +-------+---------+--+
    | b.id  | b.name  |
    +-------+---------+--+
    | 1     | qq      |
    | 2     | 22      |
    | 3     | dd      |
    | 4     | rr      |
    | 6     | fgf     |
    | 7     | as      |
    | 9     | 23      |
    | 12    | ww      |
    | 34    | 3       |
    | 23    | 34      |
    | 12    | 45      |
    | 26    | 4r      |
    +-------+---------+--+
    12 rows selected (0.147 seconds)
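
    The session above does not show how the rows got into a and b. One possible way, sketched here under the assumption that the data sits in comma-delimited text files (the paths /tmp/a.txt and /tmp/b.txt are hypothetical, not from the original post):

    ```sql
    -- Hypothetical files; each line looks like "1,qq", matching
    -- "fields terminated by ','" in the create table statements.
    -- With beeline, "local" refers to the filesystem of the HiveServer2 host.
    load data local inpath '/tmp/a.txt' into table a;
    load data local inpath '/tmp/b.txt' into table b;
    ```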
    Result of inner join (the same as a plain join): only rows whose id appears in both tables are returned
    0: jdbc:hive2://localhost:10000> select a.*,b.* from a inner join b on a.id = b.id;
    INFO  : Execution completed successfully
    INFO  : MapredLocal task succeeded
    INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
    INFO  : number of splits:1
    INFO  : Submitting tokens for job: job_1496277833427_0007
    INFO  : The url to track the job: http://mini2:8088/proxy/application_1496277833427_0007/
    INFO  : Starting Job = job_1496277833427_0007, Tracking URL = http://mini2:8088/proxy/application_1496277833427_0007/
    INFO  : Kill Command = /home/hadoop/xxxxxx/hadoop265/bin/hadoop job  -kill job_1496277833427_0007
    INFO  : Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
    INFO  : 2017-06-01 16:32:03,138 Stage-3 map = 0%,  reduce = 0%
    INFO  : 2017-06-01 16:32:26,221 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 5.05 sec
    INFO  : MapReduce Total cumulative CPU time: 5 seconds 50 msec
    INFO  : Ended Job = job_1496277833427_0007
    +-------+---------+-------+---------+--+
    | a.id  | a.name  | b.id  | b.name  |
    +-------+---------+-------+---------+--+
    | 1     | qq      | 1     | qq      |
    | 2     | ww      | 2     | 22      |
    | 3     | ee      | 3     | dd      |
    | 4     | rr      | 4     | rr      |
    | 6     | yy      | 6     | fgf     |
    | 7     | aa      | 7     | as      |
    +-------+---------+-------+---------+--+

    full outer join: rows from both tables are all returned; where the on condition finds no match, the columns of the other table are shown as NULL

    0: jdbc:hive2://localhost:10000> select a.*,b.* from a full outer join b on a.id = b.id;
    INFO  : Number of reduce tasks not specified. Estimated from input data size: 1
    INFO  : In order to change the average load for a reducer (in bytes):
    INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
    INFO  : In order to limit the maximum number of reducers:
    INFO  :   set hive.exec.reducers.max=<number>
    INFO  : In order to set a constant number of reducers:
    INFO  :   set mapreduce.job.reduces=<number>
    INFO  : number of splits:2
    INFO  : Submitting tokens for job: job_1496277833427_0008
    INFO  : The url to track the job: http://mini2:8088/proxy/application_1496277833427_0008/
    INFO  : Starting Job = job_1496277833427_0008, Tracking URL = http://mini2:8088/proxy/application_1496277833427_0008/
    INFO  : Kill Command = /home/hadoop/xxxxxx/hadoop265/bin/hadoop job  -kill job_1496277833427_0008
    INFO  : Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
    INFO  : 2017-06-01 16:34:05,413 Stage-1 map = 0%,  reduce = 0%
    INFO  : 2017-06-01 16:35:05,889 Stage-1 map = 0%,  reduce = 0%
    INFO  : 2017-06-01 16:37:35,521 Stage-1 map = 0%,  reduce = 0%
    INFO  : 2017-06-01 16:38:46,061 Stage-1 map = 67%,  reduce = 0%, Cumulative CPU 6.52 sec
    INFO  : 2017-06-01 16:38:49,443 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 9.17 sec
    INFO  : 2017-06-01 16:39:25,252 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 12.65 sec
    INFO  : MapReduce Total cumulative CPU time: 12 seconds 650 msec
    INFO  : Ended Job = job_1496277833427_0008
    +-------+---------+-------+---------+--+
    | a.id  | a.name  | b.id  | b.name  |
    +-------+---------+-------+---------+--+
    | 1     | qq      | 1     | qq      |
    | 2     | ww      | 2     | 22      |
    | 3     | ee      | 3     | dd      |
    | 4     | rr      | 4     | rr      |
    | 5     | tt      | NULL  | NULL    |
    | 6     | yy      | 6     | fgf     |
    | 7     | aa      | 7     | as      |
    | 8     | ss      | NULL  | NULL    |
    | NULL  | NULL    | 9     | 23      |
    | 11    | zz      | NULL  | NULL    |
    | NULL  | NULL    | 12    | 45      |
    | NULL  | NULL    | 12    | ww      |
    | NULL  | NULL    | 23    | 34      |
    | NULL  | NULL    | 26    | 4r      |
    | NULL  | NULL    | 34    | 3       |
    +-------+---------+-------+---------+--+
    15 rows selected (371.304 seconds)
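
    The NULL padding can also be used to pick out just the rows that found no partner. A minimal sketch, assuming id is never NULL in a or b themselves (true for the sample data); on that data it should keep the 9 unmatched rows out of the 15 above:

    ```sql
    -- Keep only rows where one side of the full outer join is missing:
    -- b.id is null -> the row exists only in a (ids 5, 8, 11)
    -- a.id is null -> the row exists only in b (ids 9, 12, 12, 23, 26, 34)
    select a.*, b.*
    from a full outer join b on a.id = b.id
    where a.id is null or b.id is null;
    ```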

    select a.* from a left semi join b on a.id = b.id; -- b.* cannot appear in the select list before from, otherwise the statement fails (Error while compiling statement: FAILED: SemanticException [Error 10009]: Line 1:11 Invalid table alias 'b' (state=42000,code=10009))

    left semi join is used in place of EXISTS / IN subqueries. The result contains only the left-table (a) half of the inner join result:

    +-------+---------+--+
    | a.id  | a.name  |
    +-------+---------+--+
    | 1     | qq      |
    | 2     | ww      |
    | 3     | ee      |
    | 4     | rr      |
    | 6     | yy      |
    | 7     | aa      |
    +-------+---------+--+
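
    For comparison, the subquery forms that the semi join stands in for could look like the following. This is a sketch, assuming a Hive version that accepts IN / EXISTS subqueries in the where clause (0.13 or later); on the sample tables both should return the same six rows of a as the semi join.

    ```sql
    -- IN form: keep rows of a whose id occurs somewhere in b
    select a.* from a where a.id in (select b.id from b);

    -- EXISTS form: the same filter written as a correlated subquery
    select a.* from a where exists (select b.id from b where b.id = a.id);
    ```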

    Hive has no right semi join.
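
    The effect of a right semi join can still be had by swapping the two tables, so that the table whose rows are wanted sits on the left. A minimal sketch on the same sample tables; it should return the six rows of b whose id also occurs in a (ids 1, 2, 3, 4, 6, 7):

    ```sql
    -- b is now the left table, so only b's columns may be selected
    select b.* from b left semi join a on b.id = a.id;
    ```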

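    left semi join is an efficient implementation of EXISTS / IN, and is more efficient than inner join: for each row of a it only has to establish that a match exists in b, and b's columns are never returned.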

  • Original article: https://www.cnblogs.com/rocky-AGE-24/p/6929636.html