zoukankan      html  css  js  c++  java
  • Pig join用法举例

    jnd = join a by f1, b by f2;

     
    join操作默认的是内连接,只有两边都匹配才会保留
     
    需要用null补位的那边需要知道它的模式:
    如果是左外连接,需要知道右边的数据集的模式,不匹配的字段用null补位
    如果是右外连接,需要知道左边的数据集的模式,不匹配的字段用null补位
    如果是全外连接,需要知道两边的数据集的模式,不匹配的字段用null补位
     
    触发reduce阶段
     
    基本用法
    a = load 'input1';
    b = load 'input2';
    jnd = join a by $0, b by $1;
    

       

    多字段连接
    a = load 'input1' as (username, age, city);
    b = load 'input2' as (orderid, user, city);
    jnd = join a by (username, city), b by (user, city);
    

       

    :: join后的字段引用
    a = load 'input1' as (username, age, address);
    b = load 'input2' as (orderid, user, money;
    jnd = join a by username, b by user;
    result = foreach jnd generate a::username, a::age, address, b::orderid;
    

       

    多数据集连接
    a = load 'input1' as (username, age);
    b = load 'input2' as (orderid, user);
    c = load 'input3' as (user, acount);
    jnd = join a by username, b by user, c by user;
    

       

    外连接 仅限两个数据集
    a = load 'input1' as (username, age);
    b = load 'input2' as (orderid, user);
    jnd = join a by username left outer, b by user;
    jnd = join a by username right, b by user;
    jnd = join a by username full, b by user;
    

      

    自连接 需要加载自身数据集两次,使用不同的别名
    a = load 'data' as (node, parentid, name);
    b = load 'data' as (node, parentid, name);
    jnd = join a by node, b by parentid;
    

      

     
     
  • 相关阅读:
    什么是垃圾回收
    Oracle GoldenGate学习之Goldengate介绍
    JVM虚拟机选项:Xms Xmx PermSize MaxPermSize区别
    查看linux系统版本命令
    Case when 的用法,简单Case函数
    case when then else end
    ORACLE视图添加备注
    修改 tomcat 内存
    Linux 内存机制详解宝典
    oracle正则表达式regexp_like的用法详解
  • 原文地址:https://www.cnblogs.com/lishouguang/p/4559602.html
Copyright © 2011-2022 走看看