  • Introduction to Pig and a Pig version of hello world

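    A couple of days ago I used Pig for an ETL job. I only skimmed it at the time and never studied it systematically, but Pig seems worth learning properly, so I have gone back to reading Programming Pig.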

    Below are my notes on Chapter 1:

    What is Pig?

    Pig provides an engine for executing data flows in parallel on Hadoop. It includes a language, Pig Latin, for expressing these data flows. Pig Latin includes operators for many of the traditional data operations (join, sort, filter, etc.), as well as the ability for users to develop their own functions for reading, processing, and writing data.

    Pig runs on Hadoop. It makes use of both the Hadoop Distributed File System, HDFS, and Hadoop’s processing system, MapReduce.
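
    As a rough illustration of the join, sort, and filter operators mentioned above, here is a minimal Pig Latin sketch. The input files `users.txt` and `clicks.txt` and their fields are hypothetical and do not come from the book; both are assumed to be tab-delimited, which is what the default loader expects.

    ```pig
    -- Hypothetical inputs (assumptions, not from the original notes):
    --   users.txt  : name <TAB> age
    --   clicks.txt : name <TAB> url
    users  = load 'users.txt'  as (name:chararray, age:int);
    clicks = load 'clicks.txt' as (name:chararray, url:chararray);

    -- filter: keep only users aged 18 or older
    adults = filter users by age >= 18;

    -- join: match each remaining user with that user's clicks
    jnd = join adults by name, clicks by name;

    -- sort: order the joined records by age, oldest first
    srtd = order jnd by adults::age desc;

    dump srtd;
    ```

    Each statement names an intermediate relation, so the script reads as a data flow rather than as one nested query.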

    The names follow the pig theme: Pig Latin is the language, Grunt is the shell, and Piggybank is a CPAN-like shared repository.

    What is Pig used for?

    ETL data pipelines

    research on raw (unstructured) data

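    Pig Philosophy

    Pigs eat anything (Pig can work on data with or without a schema);

    Pigs live anywhere (Pig was first implemented on Hadoop but is not meant to be tied to it);

    Pigs fly (Pig should process data quickly);

    Pigs are domestic animals (it is easy to write UDFs).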

    Pig version of hello world:

    data (data.txt):

    hello world, hello pig

    hello hadooop, hello hdfs

    I love programming

    I love this world

    I love programming with pig

     

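    Pig script:

    -- the sample lines contain no tabs, so each whole line lands in the first field
    txt = load 'data.txt' as (line:chararray);

    -- split each line into words, one word per record
    words = foreach txt generate flatten(TOKENIZE(line)) as word;

    -- group identical words and print the schema of the grouped relation
    grpd = group words by word;
    describe grpd;

    -- count the occurrences of each word and print the result
    cntd = foreach grpd generate group, COUNT(words);
    dump cntd;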
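
    The word count above prints its result in no guaranteed order. A small variation, sketched below, sorts the words by frequency and writes them to disk instead of dumping them to the console. The output directory name `wordcount_out` and the aliases `srtd` and `cnt` are my own hypothetical choices, not something from the book.

    ```pig
    -- Same word count as above, with an explicit sort and a stored result.
    txt   = load 'data.txt' as (line:chararray);
    words = foreach txt generate flatten(TOKENIZE(line)) as word;
    grpd  = group words by word;

    -- name the output fields so they can be referenced in later statements
    cntd  = foreach grpd generate group as word, COUNT(words) as cnt;

    -- most frequent words first
    srtd  = order cntd by cnt desc;

    -- 'wordcount_out' is a hypothetical output directory; it must not already exist
    store srtd into 'wordcount_out';
    ```

    Assuming the script is saved as `wordcount.pig` (again a hypothetical name), it can be tried without a cluster by running `pig -x local wordcount.pig`, which uses the local file system instead of HDFS.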

  • Original article: https://www.cnblogs.com/huaxiaoyao/p/3465368.html