  • [Repost] Embedding Pig in Java Programs

    Reposted from: https://wiki.apache.org/pig/EmbeddedPig

    Embedding Pig In Java Programs

    Sometimes you want more control than a standalone Pig script can give you. In that case, you can embed Pig Latin in a Java program, much as SQL is embedded in programs through JDBC.

    To embed Pig, carry out the following steps:

    • Make sure pig.jar is on your classpath.

    • Create an instance of PigServer. See the PigServer Javadoc for details.

    • Issue Pig Latin statements through that PigServer by calling PigServer.registerQuery().

    • To retrieve results, call either PigServer.openIterator() or PigServer.store().

    • If you have user-defined functions (UDFs), register the jar that contains them by calling PigServer.registerJar().

     

    Example

    Suppose you need to count the number of occurrences of each word in a document. Suppose also that you have an eval function (a UDF), Tokenize, that parses a line of text and returns all the words in that line, and that it is packaged in /mylocation/tokenize.jar.

    The Pig Latin script for this computation looks like this:

     

    register /mylocation/tokenize.jar
    A = load 'mytext' using TextLoader();
    B = foreach A generate flatten(tokenize($0));
    C = group B by $1;
    D = foreach C generate flatten(group), COUNT(B.$0);
    store D into 'myoutput';
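To make clear what each statement in the script computes, here is a small in-memory model in plain Java that does not use Pig at all. It is only an illustration under stated assumptions: the tokenize() method is a hypothetical stand-in for the Tokenize UDF (it simply splits on whitespace), and the grouping key is taken to be the word itself. Comments map each step to the relations A, B, C and D.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountModel {
    // Hypothetical stand-in for the Tokenize UDF: split a line on whitespace.
    static List<String> tokenize(String line) {
        List<String> words = new ArrayList<>();
        for (String w : line.trim().split("\\s+")) {
            if (!w.isEmpty()) {          // a blank line yields [""], skip it
                words.add(w);
            }
        }
        return words;
    }

    // In-memory equivalent of the A -> B -> C -> D dataflow.
    public static Map<String, Integer> countWords(List<String> lines) {
        // C: word -> one entry per flattened record of B carrying that word
        Map<String, List<String>> groups = new TreeMap<>();
        for (String line : lines) {                 // A: one record per line
            for (String word : tokenize(line)) {    // B: flatten(tokenize($0))
                groups.computeIfAbsent(word, k -> new ArrayList<>()).add(word);
            }
        }
        // D: (group, COUNT(...)) for every group
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : groups.entrySet()) {
            counts.put(e.getKey(), e.getValue().size());
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("the quick fox", "the lazy dog");
        System.out.println(countWords(lines)); // {dog=1, fox=1, lazy=1, quick=1, the=2}
    }
}
```

Pig performs the same grouping and counting, but distributes it across the cluster instead of holding everything in one map.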

    The same computation can be performed by embedding Pig in a Java program:

     

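    import java.io.IOException;
    import org.apache.pig.PigServer;

    public class WordCount {
       public static void main(String[] args) {
          try {
             // Run on the Hadoop cluster; pass "local" instead to use the local file system.
             // The constructor can throw ExecException, a subclass of IOException.
             PigServer pigServer = new PigServer("mapreduce");
             // Make the Tokenize UDF available to the queries below.
             pigServer.registerJar("/mylocation/tokenize.jar");
             runMyQuery(pigServer, "myinput.txt");
          }
          catch (IOException e) {
             e.printStackTrace();
          }
       }

       public static void runMyQuery(PigServer pigServer, String inputFile) throws IOException {
          // Each registerQuery() call only parses the statement and adds it to the plan.
          pigServer.registerQuery("A = load '" + inputFile + "' using TextLoader();");
          pigServer.registerQuery("B = foreach A generate flatten(tokenize($0));");
          pigServer.registerQuery("C = group B by $1;");
          pigServer.registerQuery("D = foreach C generate flatten(group), COUNT(B.$0);");

          // store() triggers execution and writes relation D to 'myoutput'.
          pigServer.store("D", "myoutput");
       }
    }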

    Notes:

    • The jar that contains your functions must be registered.
    • The four calls to pigServer.registerQuery() only cause the statements to be parsed and queued. The query is not actually executed until pigServer.store() is called.

    • The input data referred to in the load statement must be on HDFS in the specified location.
    • The final result is written to myoutput in your current working directory on HDFS. (By default this is your home directory on HDFS.)
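The deferred execution described in the notes can be illustrated with a toy model in plain Java. This is a hypothetical sketch, not Pig's code or API: registerStep() plays the role of registerQuery() by only recording a transformation, and store() is the single point where the recorded steps actually run. The runs counter shows that registering steps does no work by itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

/**
 * Hypothetical toy model of deferred execution (NOT Pig's implementation).
 * registerStep() records a transformation; store() executes all of them.
 */
public class LazyPipeline {
    private final List<UnaryOperator<List<String>>> steps = new ArrayList<>();
    private int runs = 0; // how many times the queued plan has been executed

    // Analogous to registerQuery(): queue the step, compute nothing.
    public void registerStep(UnaryOperator<List<String>> step) {
        steps.add(step);
    }

    // Analogous to store(): run every queued step, in order, over the input.
    public List<String> store(List<String> input) {
        runs++;
        List<String> data = new ArrayList<>(input);
        for (UnaryOperator<List<String>> step : steps) {
            data = step.apply(data);
        }
        return data;
    }

    public int runs() {
        return runs;
    }

    public static void main(String[] args) {
        LazyPipeline plan = new LazyPipeline();
        // Step 1: drop blank lines.
        plan.registerStep(lines -> {
            List<String> out = new ArrayList<>();
            for (String line : lines) {
                if (!line.trim().isEmpty()) {
                    out.add(line);
                }
            }
            return out;
        });
        // Step 2: upper-case the remaining lines.
        plan.registerStep(lines -> {
            List<String> out = new ArrayList<>();
            for (String line : lines) {
                out.add(line.toUpperCase());
            }
            return out;
        });
        System.out.println("runs before store: " + plan.runs()); // 0
        List<String> result = plan.store(Arrays.asList("pig", "", "latin"));
        System.out.println("result: " + result + ", runs after store: " + plan.runs());
        // result: [PIG, LATIN], runs after store: 1
    }
}
```

Deferring work this way lets the whole chain of statements be seen at once before anything runs, which is the general motivation for building a plan first and executing it on demand.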

    To run your program, first compile it with the following command:

     

    javac -cp <path>pig.jar WordCount.java

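    If the compilation succeeds, run the program. Because -cp replaces the default classpath, also add the current directory (.) so that WordCount.class can be found (on Windows, use ; instead of : as the separator):

    java -cp <path>pig.jar:. WordCount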
  • Original article: https://www.cnblogs.com/YangtzeYu/p/6277259.html