zoukankan      html  css  js  c++  java
  • Pig join用法举例

    jnd = join a by f1, b by f2;

     
    join操作默认的是内连接,只有两边都匹配才会保留
     
    需要用null补位的那边需要知道它的模式:
    如果是左外连接,需要知道右边的数据集的模式,不匹配的字段用null补位
    如果是右外连接,需要知道左边的数据集的模式,不匹配的字段用null补位
    如果是全外连接,需要知道两边的数据集的模式,不匹配的字段用null补位
     
    触发reduce阶段
     
    基本用法
    a = load 'input1';
    b = load 'input2';
    jnd = join a by $0, b by $1;
    

       

    多字段连接
    a = load 'input1' as (username, age, city);
    b = load 'input2' as (orderid, user, city);
    jnd = join a by (username, city), b by (user, city);
    

       

    :: join后的字段引用
    a = load 'input1' as (username, age, address);
    b = load 'input2' as (orderid, user, money;
    jnd = join a by username, b by user;
    result = foreach jnd generate a::username, a::age, address, b::orderid;
    

       

    多数据集连接
    a = load 'input1' as (username, age);
    b = load 'input2' as (orderid, user);
    c = load 'input3' as (user, acount);
    jnd = join a by username, b by user, c by user;
    

       

    外连接 仅限两个数据集
    a = load 'input1' as (username, age);
    b = load 'input2' as (orderid, user);
    jnd = join a by username left outer, b by user;
    jnd = join a by username right, b by user;
    jnd = join a by username full, b by user;
    

      

    自连接 需要加载自身数据集两次,使用不同的别名
    a = load 'data' as (node, parentid, name);
    b = load 'data' as (node, parentid, name);
    jnd = join a by node, b by parentid;
    

      

     
     
  • 相关阅读:
    梅花雨控件使用时注意的...
    利用XML实现通用WEB报表打印(实现篇)
    Improve performance using ADO.NET 2.0 batch update feature
    hook
    owc11生成饼状图
    PHP数组合并:[“+”运算符]、[array_merge]、[array_merge_recursive]区别
    PHP中使用函数array_merge()合并数组
    WCF 第四章 绑定
    WCF 第四章 绑定 跨机器通信
    WCF 第六章 序列化与编码 系列文章
  • 原文地址:https://www.cnblogs.com/lishouguang/p/4559602.html
Copyright © 2011-2022 走看看