  • Pig trial: group, foreach

    A = load '/user/cloudera/lab/mydata' using PigStorage() as (a,b,c);

    Writing A=load (with no spaces around the "=") produces: [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered " "A=load "" at line 1, column 1.

    (1,2,3)

    (4,2,1)

    (8,3,4)

    (4,3,3)

    (7,2,5)

    (8,4,3)

    B = group A by a;

    (1,{(1,2,3)})

    (4,{(4,3,3),(4,2,1)})

    (7,{(7,2,5)})

    (8,{(8,4,3),(8,3,4)})

    C = foreach B { D = distinct A.b; generate flatten(group), COUNT(D); };

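    Typing a full-width (Chinese) parenthesis "（" instead of the ASCII "(" causes the error: Unexpected character '.

    The first field of B has a fixed name, group, because it is produced by the group operation.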

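    In the statement above, D = distinct A.b; the name A refers to the second field of B, which keeps the name of the relation B was generated from. For each group it holds the following tuples: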

    (1,2,3)

    (4,3,3), (4,2,1)

    (7,2,5)

    (8,4,3), (8,3,4)

    So for each group, D is:

    2

    3,2

    2

    4,3

    >> generate flatten(group), COUNT(D);

    (1,1)

    (4,2)

    (7,1)

    (8,2)
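
    The steps above can be collected into one script. This is a minimal sketch, assuming the same input path as above and a tab-delimited file (the default for PigStorage() with no argument). The output field names a_key and distinct_b are hypothetical labels added for readability; they are not in the original statement.

```pig
-- Load three untyped fields; PigStorage() with no argument splits on tabs.
A = load '/user/cloudera/lab/mydata' using PigStorage() as (a,b,c);

-- One output tuple per value of a: (group, bag of matching A tuples).
B = group A by a;

-- Nested foreach: D is computed separately inside each group.
C = foreach B {
    D = distinct A.b;                                  -- distinct b values in this group
    generate group as a_key, COUNT(D) as distinct_b;   -- hypothetical field names
};

dump C;
```

    Because the group key here is a single field, generating group directly should give the same tuples as flatten(group) in the original statement.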

    =========================

    GROUP creates a nested set of output tuples while JOIN creates a flat set of output tuples

    The first field is named "group" (do not confuse this with the GROUP operator) and is the same type as the group key.  

    The second field takes the name of the original relation and is type bag.

    # So "group" is the name of the key field, and the original alias (A here) is the name of the nested bag.
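
    The naming rules and the GROUP versus JOIN contrast described above can be checked with describe. This is a minimal sketch under the same input assumptions; the aliases A2, E and J are hypothetical names introduced only for illustration. Joining a relation with itself requires loading the data under a second alias.

```pig
A  = load '/user/cloudera/lab/mydata' using PigStorage() as (a,b,c);
A2 = load '/user/cloudera/lab/mydata' using PigStorage() as (a,b,c);

-- GROUP: nested output, one tuple per key.
B = group A by a;
describe B;   -- schema has two fields: group, and a bag named A

-- Refer to the key as group and to the bag as A.
E = foreach B generate group, COUNT(A) as n, A.c;

-- JOIN: flat output, one tuple per matching pair.
-- Fields are disambiguated with the alias prefix, e.g. A::a and A2::a.
J = join A by a, A2 by a;
describe J;
```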

  • Original article: https://www.cnblogs.com/bob-dong/p/14248211.html