zoukankan      html  css  js  c++  java
  • Pig join用法举例

    jnd = join a by f1, b by f2;

     
    join操作默认的是内连接,只有两边都匹配才会保留
     
    需要用null补位的那边需要知道它的模式:
    如果是左外连接,需要知道右边的数据集的模式,不匹配的字段用null补位
    如果是右外连接,需要知道左边的数据集的模式,不匹配的字段用null补位
    如果是全外连接,需要知道两边的数据集的模式,不匹配的字段用null补位
     
    触发reduce阶段
     
    基本用法
    a = load 'input1';
    b = load 'input2';
    jnd = join a by $0, b by $1;
    

       

    多字段连接
    a = load 'input1' as (username, age, city);
    b = load 'input2' as (orderid, user, city);
    jnd = join a by (username, city), b by (user, city);
    

       

    :: join后的字段引用
    a = load 'input1' as (username, age, address);
    b = load 'input2' as (orderid, user, money;
    jnd = join a by username, b by user;
    result = foreach jnd generate a::username, a::age, address, b::orderid;
    

       

    多数据集连接
    a = load 'input1' as (username, age);
    b = load 'input2' as (orderid, user);
    c = load 'input3' as (user, acount);
    jnd = join a by username, b by user, c by user;
    

       

    外连接 仅限两个数据集
    a = load 'input1' as (username, age);
    b = load 'input2' as (orderid, user);
    jnd = join a by username left outer, b by user;
    jnd = join a by username right, b by user;
    jnd = join a by username full, b by user;
    

      

    自连接 需要加载自身数据集两次,使用不同的别名
    a = load 'data' as (node, parentid, name);
    b = load 'data' as (node, parentid, name);
    jnd = join a by node, b by parentid;
    

      

     
     
  • 相关阅读:
    ROW_NUMBER()用法(转)
    winform MD5值生成
    VC使用“添加方法向导”添加调度映射方法“
    MyGeneration配置说明
    dataGridView取消自动生成列
    PHP魔术常量(magic constant)
    Eclipse添加DTD文件实现xml的自动提示功能
    Google:5个常见的SEO错误和6个SEO好主意
    PHP检查PEAR是否工作
    手把手教你在ubuntu上安装LAMP
  • 原文地址:https://www.cnblogs.com/lishouguang/p/4559602.html
Copyright © 2011-2022 走看看