zoukankan      html  css  js  c++  java
  • pig flatten

    今天通过不断的尝试,终于知道这个flatten的用法了。其实吧,有时候关键是要test,才能充分理解解说。不过,同事给说的有点问题,误导了我。整的我一直没明白怎么回事。

    这是官方的解释:

    The FLATTEN operator looks like a UDF syntactically, but it is actually an operator that changes the structure of tuples and bags in a way that a UDF cannot. Flatten un-nests tuples as well as bags. The idea is the same, but the operation and result is different for each type of structure.

    For tuples, flatten substitutes the fields of a tuple in place of the tuple. For example, consider a relation that has a tuple of the form (a, (b, c)). The expression GENERATE $0, flatten($1), will cause that tuple to become (a, b, c).

    For bags, the situation becomes more complicated. When we un-nest a bag, we create new tuples. If we have a relation that is made up of tuples of the form ({(b,c),(d,e)}) and we apply GENERATE flatten($0), we end up with two tuples (b,c) and (d,e). When we remove a level of nesting in a bag, sometimes we cause a cross product to happen. For example, consider a relation that has a tuple of the form (a, {(b,c), (d,e)}), commonly produced by the GROUP operator. If we apply the expression GENERATE $0, flatten($1) to this tuple, we will create new tuples: (a, b, c) and (a, d, e).

    我试验下来也是这样的,我今天把第一种和第二种情况都尝试了,实验证明,即使是第二种,其实一次flatten就够了,就得到schema了。这样的数据,

    Joe {(Joe,18,3.8)}
    Bill {(Bill,20,3.9)}
    John {(John,18,4.0)}
    Mary {(Mary,19,3.8),(Mary,19,5.0)}

    a = load 'result' as (f1:chararray,B: bag {T: tuple(t1:chararray, t2:int, t3:float)});

    b = foreach a GENERATE FLATTEN(B) as (t1:chararray,t2:int,t3:float);

    这个是可以一次性flatten的。但是更高的复杂度我每测试,应该是需要两次这种操作的吧。真是真是对bag, tuple也长了见识了。明天看看能否把数据传输到UDF中操作。


    总结一句话,在不确定时要首先看官方文档,然后就先拿小数据测试一下,看看每一步得到的是什么结构describe,同时store后看看是什么结果,是否和自己想的一样。整体来说还是很清晰的。

  • 相关阅读:
    零碎
    Python学习 day03 (续day02)
    Python学习 day02
    Python学习 Day1
    线性表——顺序表
    纠删码简介
    小数转化为分数
    C语言多线程操作
    转载:RAMCloud
    转载:全球级分布式数据库Google Spanner
  • 原文地址:https://www.cnblogs.com/jamesf/p/4751605.html
Copyright © 2011-2022 走看看