zoukankan      html  css  js  c++  java
  • Pig 和 Hive 的表连接 Join

    Pig:

    1.  内连接:

     join_table = join table1 by col1,  join table2 by col2


    2. 外连接:

    左外连接:join_table = join table1 by col1 left  outer,  join table2 by col2     其中outer 可省略

    右外连接:join_table = join table1 by col1 right  outer,  join table2 by col2   其中outer 可省略

    全外连接:join_table = join table1 by col1 full  outer,  join table2 by col2


    3. 自连接
    不能直接进行: join table1 by col1,  join table1 by col1
    必须将数据加载两次,在进行内连接

    4. 分片-复制连接 replicated join

     join_table = join big_table  by col1 ,  join small_table by col2   using 'replicated'

    这样small_table 会加载到内存中复制到各个节点,再与big_table 连接。

    replicated join 只支持内连接和左外连接


    5. 倾斜数据连接 skew join

    join_table = join table1 by col1,  join table2 by col2 using 'skewed'

    先对table2的键值进行抽样,判断键值的分布情况,然后给各个Reducer分配键值范围,可能同一个键值被分配到好几个reducer中 


    6. 排好顺序的数据连接 merge join

    join_table = join table1 by col1,  join table2 by col2 using 'merge'


    7. 笛卡尔积
    cross table1, table2


    Hive:

    1. 内连接,可连接多个, 注意小表在前,大表在后

    select * from  table1 a  join table2  b on a.col1=b.col2


    2. 外连接

    左:select * from  table1 a  left outer join table2  b on a.col1=b.col2

    右:select * from  table1 a  right outer join table2  b on a.col1=b.col2

    全:select * from  table1 a  full outer join table2  b on a.col1=b.col2


    3. 左半开连接

    select  a.col1  , a.col2 , a.col3  from  table1 a  left semi join table2  b on a.col1=b.col2

    实现与内连接相同功能,但是比内连接高效。


    4.笛卡尔积

    select * from  table1  join table2   


    5.map-side join

    设置了 hive.auto.convert.join=true,   hive.mapjoin.smalltable.filesize=2500,  hive.optimize.bucketmapJoin=true 等参数会自动开启





  • 相关阅读:
    jchdl
    jchdl
    jchdl
    jchdl
    jchdl
    jchdl
    jchdl
    UVa 437 (变形的LIS) The Tower of Babylon
    UVa 1025 (动态规划) A Spy in the Metro
    UVa 10129 (并查集 + 欧拉路径) Play on Words
  • 原文地址:https://www.cnblogs.com/leeeee/p/7276133.html
Copyright © 2011-2022 走看看