  • Pig join usage examples

    jnd = join a by f1, b by f2;

     
    join performs an inner join by default: a record is kept only when its key matches on both sides.
     
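    For an outer join, Pig must know the schema of the side that gets padded with nulls:
    Left outer join: the schema of the right dataset must be known; its fields are filled with null when there is no match.
    Right outer join: the schema of the left dataset must be known; its fields are filled with null when there is no match.
    Full outer join: the schemas of both datasets must be known; unmatched fields on either side are filled with null.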
     
    The default join implementation triggers a reduce phase.
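
    Because the join runs in the reduce phase, the number of reduce tasks can be hinted with Pig's parallel clause. A minimal sketch, assuming placeholder inputs 'input1' and 'input2' and hypothetical extra fields x and y; the value 10 is arbitrary:

    ```pig
    -- x and y are hypothetical fields added only to give each input a schema
    a = load 'input1' as (f1, x);
    b = load 'input2' as (y, f2);
    -- parallel asks for 10 reduce tasks for this join
    jnd = join a by f1, b by f2 parallel 10;
    ```

    The clause only affects operators that have a reduce phase, and it has no effect when running in local mode.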
     
    Basic usage
    a = load 'input1';
    b = load 'input2';
    jnd = join a by $0, b by $1;
    

       

    Joining on multiple fields
    a = load 'input1' as (username, age, city);
    b = load 'input2' as (orderid, user, city);
    jnd = join a by (username, city), b by (user, city);
    

       

    Referencing fields after a join with ::
    a = load 'input1' as (username, age, address);
    b = load 'input2' as (orderid, user, money);
    jnd = join a by username, b by user;
    result = foreach jnd generate a::username, a::age, address, b::orderid;
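
    To see which names the joined relation actually carries, the describe operator prints its schema. A small sketch using the same hypothetical inputs as above:

    ```pig
    a = load 'input1' as (username, age, address);
    b = load 'input2' as (orderid, user, money);
    jnd = join a by username, b by user;
    -- print the schema of jnd; every field is listed with its alias:: prefix
    describe jnd;
    -- names that occur in only one input may be referenced without the prefix
    result = foreach jnd generate username, age, address, orderid, money;
    ```

    The prefix becomes mandatory only when the same field name exists in more than one input.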
    

       

    Joining more than two datasets
    a = load 'input1' as (username, age);
    b = load 'input2' as (orderid, user);
    c = load 'input3' as (user, acount);
    jnd = join a by username, b by user, c by user;
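
    After a multi-way join like this one, the field user exists in both b and c, so a bare reference to it would be ambiguous. A sketch of projecting the result, reusing the same inputs:

    ```pig
    a = load 'input1' as (username, age);
    b = load 'input2' as (orderid, user);
    c = load 'input3' as (user, acount);
    jnd = join a by username, b by user, c by user;
    -- user must be qualified because it appears in both b and c
    result = foreach jnd generate a::username, a::age, b::orderid, b::user, c::acount;
    ```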
    

       

    Outer joins (two datasets only)
    a = load 'input1' as (username, age);
    b = load 'input2' as (orderid, user);
    -- the three statements below are alternatives; the keyword outer is optional
    jnd = join a by username left outer, b by user;
    jnd = join a by username right, b by user;
    jnd = join a by username full, b by user;
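
    The null padding described earlier can be used to pick out records that found no partner. A minimal sketch, assuming the same inputs; the aliases no_orders and result are illustrative:

    ```pig
    a = load 'input1' as (username, age);
    b = load 'input2' as (orderid, user);
    jnd = join a by username left outer, b by user;
    -- rows of a without a match in b carry null in every field that comes from b
    no_orders = filter jnd by b::user is null;
    result = foreach no_orders generate a::username, a::age;
    ```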
    

      

    Self join: load the same dataset twice, under different aliases
    a = load 'data' as (node, parentid, name);
    b = load 'data' as (node, parentid, name);
    jnd = join a by node, b by parentid;
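
    Assuming parentid holds the node value of a record's parent (an interpretation of the field names, not something the data itself confirms), the self join pairs each parent with its children. A sketch of reading the pair out; parent_name and child_name are illustrative names:

    ```pig
    a = load 'data' as (node, parentid, name);
    b = load 'data' as (node, parentid, name);
    jnd = join a by node, b by parentid;
    -- a supplies the parent record, b supplies the record whose parentid equals a's node
    result = foreach jnd generate a::name as parent_name, b::name as child_name;
    ```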
    

      

     
     
  • Original article: https://www.cnblogs.com/lishouguang/p/4559602.html