  • Pig trial: GROUP and FOREACH

    A = load '/user/cloudera/lab/mydata' using PigStorage() as (a,b,c);

    Note the spaces around "=". Writing A=load instead fails with:  [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered " "A=load "" at line 1, column 1.

    (1,2,3)

    (4,2,1)

    (8,3,4)

    (4,3,3)

    (7,2,5)

    (8,4,3)

    B = group A by a;

    (1,{(1,2,3)})

    (4,{(4,3,3),(4,2,1)})

    (7,{(7,2,5)})

    (8,{(8,4,3),(8,3,4)})
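    The grouping step can be mimicked outside Pig to check the result by hand. The following is a minimal plain-Python sketch, not Pig itself; the function name group_by_first_field is made up for illustration, and the tuples are the six rows of A listed above.

```python
from collections import defaultdict

# The six tuples of relation A, fields (a, b, c), as listed above.
A = [(1, 2, 3), (4, 2, 1), (8, 3, 4), (4, 3, 3), (7, 2, 5), (8, 4, 3)]


def group_by_first_field(relation):
    """Emulate `B = group A by a;`: one (key, bag) tuple per distinct key."""
    bags = defaultdict(list)
    for row in relation:
        bags[row[0]].append(row)
    # Pig does not guarantee the order of tuples inside a bag;
    # sorting by key here is only to make the printout easy to compare.
    return [(key, bags[key]) for key in sorted(bags)]


B = group_by_first_field(A)
for key, bag in B:
    print(key, bag)
```

    Each printed line pairs a key with the list of full tuples that share it, which corresponds to the (group, bag) shape of B. The order of tuples inside a bag may differ from what Pig prints.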

    C = foreach B { D = distinct A.b; generate flatten(group), COUNT(D); };

    Typing a Chinese full-width parenthesis "(" in place of the ASCII "(" fails with:  Unexpected character '.

    The first field of B has a fixed name, group, because it is produced by the GROUP operation.

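    In the statement above, D = distinct A.b; here A refers to the second field of B, which keeps the name of the relation that B was built from. For each group it holds the following tuples: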

    (1,2,3)

    (4,3,3), (4,2,1)

    (7,2,5)

    (8,4,3), (8,3,4)

    So D, for each group in turn, is:

    2

    3,2

    2

    4,3

    >> generate flatten(group), COUNT(D);

    (1,1)

    (4,2)

    (7,1)

    (8,2)
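    The counts above can be cross-checked with a small plain-Python emulation of the nested foreach. This is only a sketch of the semantics under the assumption that there are no null values in b; the function name count_distinct_b is hypothetical.

```python
from collections import defaultdict

# The six tuples of relation A, fields (a, b, c), as listed above.
A = [(1, 2, 3), (4, 2, 1), (8, 3, 4), (4, 3, 3), (7, 2, 5), (8, 4, 3)]


def count_distinct_b(relation):
    """Emulate: group A by a, then per group D = distinct A.b and COUNT(D)."""
    distinct_b = defaultdict(set)
    for a, b, _c in relation:
        # A set keeps each b value once per key, like `distinct A.b`.
        distinct_b[a].add(b)
    # One flat (key, count) tuple per group, like `generate flatten(group), COUNT(D)`.
    return [(a, len(values)) for a, values in sorted(distinct_b.items())]


C = count_distinct_b(A)
print(C)  # [(1, 1), (4, 2), (7, 1), (8, 2)]
```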

    =========================

    GROUP creates a nested set of output tuples, while JOIN creates a flat set of output tuples.

    The first field is named "group" (do not confuse this with the GROUP operator) and is the same type as the group key.  

    The second field takes the name of the original relation and is type bag.

    # So "group" is the name of the key field, and the original alias (A here) is the name of the nested bag.
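    The nested versus flat distinction can be illustrated with a rough plain-Python sketch. It uses three of the rows of A from above; the second relation X and its labels are hypothetical, invented only so there is something to join against.

```python
from collections import defaultdict

# Three of the tuples of A from above, fields (a, b, c).
A = [(1, 2, 3), (4, 2, 1), (4, 3, 3)]
# Hypothetical second relation, fields (k, label); not part of the original post.
X = [(1, "one"), (4, "four")]

# GROUP: nested. One output record per key, holding a bag of whole tuples.
bags = defaultdict(list)
for row in A:
    bags[row[0]].append(row)
grouped = [{"group": key, "A": bag} for key, bag in sorted(bags.items())]

# JOIN: flat. One output tuple per matching pair, with the fields of both
# sides concatenated (the key appears once from each side).
joined = [row + other for row in A for other in X if row[0] == other[0]]

print(grouped)
print(joined)
```

    The grouped result has two records, each with a field named group and a bag named after the source alias A. The joined result has three flat tuples and no nesting.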

  • Original article: https://www.cnblogs.com/bob-dong/p/14248211.html