  • Compiling, Packaging, and Running a MapReduce Program from the Command Line

    Article URL: http://www.cnblogs.com/myresearch/p/mapreduce-compile-jar-run.html. Please cite the source when reposting.

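    Instructions for compiling WordCount.java on older releases such as 0.20 are easy to find. They typically look like this (the path here is from a 1.0.1 installation):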

     javac -classpath /usr/local/hadoop/hadoop-1.0.1/hadoop-core-1.0.1.jar WordCount.java

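    Newer 2.x releases no longer ship a hadoop-core*.jar, so compiling and packaging your own MapReduce program works differently than on older versions.

    This article uses the WordCount example on Hadoop 2.6 to show how to compile your own MapReduce program on 2.x.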

    Dependency jars in Hadoop 2.x

    In Hadoop 2.x the classes are no longer bundled into a single hadoop-core*.jar. They are split across several jars, and the WordCount example needs the following three:

    • $HADOOP_HOME/share/hadoop/common/hadoop-common-2.6.0.jar

    • $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.6.0.jar

    • $HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar

    Compiling and packaging a Hadoop MapReduce program

    Add the jars listed above to the classpath. This assumes HADOOP_HOME points to the Hadoop installation, which is /home/hadoop/opt/hadoop-2.6.0 in this article:

    hadoop@ubuntu:~$ export CLASSPATH="$HADOOP_HOME/share/hadoop/common/hadoop-common-2.6.0.jar:$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.6.0.jar:$HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar:$CLASSPATH"
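
    The jar list can also be assembled by a small helper instead of being typed by hand. The sketch below is one way to do that. The function name `wordcount_classpath` and its two parameters are illustrative and not part of Hadoop. Where the `hadoop` launcher is on the PATH, `hadoop classpath` prints a complete classpath and may be a simpler alternative.

```shell
# Hypothetical helper (not part of Hadoop): print the three jars that the
# WordCount example needs, joined with ':'.
#   $1 = Hadoop installation directory
#   $2 = Hadoop version, e.g. 2.6.0
wordcount_classpath() {
  wc_home=$1
  wc_version=$2
  printf '%s:%s:%s\n' \
    "$wc_home/share/hadoop/common/hadoop-common-$wc_version.jar" \
    "$wc_home/share/hadoop/mapreduce/hadoop-mapreduce-client-core-$wc_version.jar" \
    "$wc_home/share/hadoop/common/lib/commons-cli-1.2.jar"
}

# Usage (the path is the one used in this article; adjust it for your install):
#   export CLASSPATH="$(wordcount_classpath /home/hadoop/opt/hadoop-2.6.0 2.6.0):$CLASSPATH"
```

    The commons-cli version is hard-coded in the helper because it does not follow the Hadoop version number.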

    Now WordCount.java can be compiled (this uses the WordCount.java from the 2.6.0 source tree).

    The file is located in /hadoop-2.6.0-src/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples:

    javac WordCount.java

    The compiler prints a warning, which can be ignored. After compilation, several .class files have been generated:

    /home/hadoop/opt/hadoop-2.6.0/share/hadoop/common/hadoop-common-2.6.0.jar(org/apache/hadoop/fs/Path.class): warning: Cannot find annotation method 'value()' in type 'LimitedPrivate': class file for org.apache.hadoop.classification.InterfaceAudience not found
    1 warning
    hadoop@ubuntu:~/opt/code$ ls
    WordCount.class WordCount.java WordCount$MapClass.class WordCount$Reduce.class

    Next, package the .class files into a jar so that Hadoop can run them:

    hadoop@ubuntu:~/opt/code$ jar -cvf WordCount.jar ./WordCount*.class
    added manifest
    adding: WordCount.class(in = 3363) (out= 1687)(deflated 49%)
    adding: WordCount$MapClass.class(in = 1978) (out= 800)(deflated 59%)
    adding: WordCount$Reduce.class(in = 1641) (out= 645)(deflated 60%)
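
    A Java class that declares a package has to sit in a matching directory inside the jar, so it can help to compile with `javac -d` and package the resulting tree. The following is a minimal sketch, assuming the source file declares a package (the listing at the end of this article declares `org.apache.hadoop.mapred`). `class_entry_path` and `build_wordcount_jar` are hypothetical helper names, and the `classes` directory is an arbitrary choice.

```shell
# Map a fully qualified class name to the entry path it must have in the jar,
# e.g. org.apache.hadoop.mapred.WordCount -> org/apache/hadoop/mapred/WordCount.class
class_entry_path() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Hypothetical build helper: compile into ./classes so that javac creates the
# package directories, then package that tree starting from its root.
# Assumes CLASSPATH already contains the Hadoop jars and WordCount.java is in
# the current directory.
build_wordcount_jar() {
  mkdir -p classes &&
  javac -d classes WordCount.java &&
  jar -cvf WordCount.jar -C classes .
}
```

    With a jar built this way, the class would be passed to `hadoop jar` by its fully qualified name. `jar -tf WordCount.jar` lists the entries, which can be compared against the path printed by `class_entry_path`.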

    Create the input directory and two sample files. The commands below create them on the local filesystem:

    hadoop@ubuntu:~/opt/code$ mkdir input
    hadoop@ubuntu:~/opt/code$ echo "Hello Hadoop Goodbye Hadoop" > ./input/file1
    hadoop@ubuntu:~/opt/code$ echo "Hello World Bye World" > ./input/file2
    hadoop@ubuntu:~/opt/code$ ls ./input
    file1 file2

    Run the WordCount program:

    hadoop@ubuntu:~$ cd ~/opt/code

    hadoop@ubuntu:~/opt/code$ ~/opt/hadoop-2.6.0/bin/hadoop jar WordCount.jar org.apache.hadoop.examples.WordCount input output

    When the job finishes, check the output:

    hadoop@ubuntu:~/opt/code$ ls ./output
    part-r-00000  _SUCCESS
    hadoop@ubuntu:~/opt/code$ cat ./output/part-r-00000

    Bye 1
    Goodbye 1
    Hadoop 2
    Hello 2
    World 2
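
    For an input this small, the expected counts can be cross-checked without Hadoop. The sketch below uses standard Unix tools only. `local_wordcount` is a hypothetical helper, and it assumes words are separated by whitespace, which is roughly what the default `StringTokenizer` in the listing does.

```shell
# Hypothetical helper: count whitespace-separated words in the given files and
# print "word<TAB>count" lines sorted by word in byte order.
local_wordcount() {
  cat "$@" |
    tr -s '[:space:]' '\n' |   # put each word on its own line
    sed '/^$/d' |              # drop empty lines
    LC_ALL=C sort |            # group identical words together
    uniq -c |                  # prefix each distinct word with its count
    awk '{ print $2 "\t" $1 }' # reorder to "word<TAB>count"
}

# Usage:
#   local_wordcount ./input/file1 ./input/file2
```

    For the two sample files this should list the same five words and counts as the job output above.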

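    PS: The source code of WordCount.java is shown below. This listing uses the older org.apache.hadoop.mapred API and declares the package org.apache.hadoop.mapred, which differs from the org.apache.hadoop.examples name used in the run command above. The class name passed to hadoop jar has to match the package of the source you actually compile.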

    package org.apache.hadoop.mapred;
    
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;
    import java.util.StringTokenizer;
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;
    
    /**
     * This is an example Hadoop Map/Reduce application.
     * It reads the text input files, breaks each line into words
     * and counts them. The output is a locally sorted list of words and the 
     * count of how often they occurred.
     *
     * To run: bin/hadoop jar build/hadoop-examples.jar wordcount
     *            [-m <i>maps</i>] [-r <i>reduces</i>] <i>in-dir</i> <i>out-dir</i> 
     */
    public class WordCount extends Configured implements Tool {
      
      /**
       * Counts the words in each line.
       * For each line of input, break the line into words and emit them as
       * (<b>word</b>, <b>1</b>).
       */
      public static class MapClass extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {
        
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();
        
        public void map(LongWritable key, Text value, 
                        OutputCollector<Text, IntWritable> output, 
                        Reporter reporter) throws IOException {
          String line = value.toString();
          StringTokenizer itr = new StringTokenizer(line);
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            output.collect(word, one);
          }
        }
      }
      
      /**
       * A reducer class that just emits the sum of the input values.
       */
      public static class Reduce extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
        
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output, 
                           Reporter reporter) throws IOException {
          int sum = 0;
          while (values.hasNext()) {
            sum += values.next().get();
          }
          output.collect(key, new IntWritable(sum));
        }
      }
      
      static int printUsage() {
        System.out.println("wordcount [-m <maps>] [-r <reduces>] <input> <output>");
        ToolRunner.printGenericCommandUsage(System.out);
        return -1;
      }
      
      /**
       * The main driver for word count map/reduce program.
       * Invoke this method to submit the map/reduce job.
       * @throws IOException When there is communication problems with the 
       *                     job tracker.
       */
      public int run(String[] args) throws Exception {
        JobConf conf = new JobConf(getConf(), WordCount.class);
        conf.setJobName("wordcount");
     
        // the keys are words (strings)
        conf.setOutputKeyClass(Text.class);
        // the values are counts (ints)
        conf.setOutputValueClass(IntWritable.class);
        
        conf.setMapperClass(MapClass.class);        
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);
        
        List<String> other_args = new ArrayList<String>();
        for(int i=0; i < args.length; ++i) {
          try {
            if ("-m".equals(args[i])) {
              conf.setNumMapTasks(Integer.parseInt(args[++i]));
            } else if ("-r".equals(args[i])) {
              conf.setNumReduceTasks(Integer.parseInt(args[++i]));
            } else {
              other_args.add(args[i]);
            }
          } catch (NumberFormatException except) {
            System.out.println("ERROR: Integer expected instead of " + args[i]);
            return printUsage();
          } catch (ArrayIndexOutOfBoundsException except) {
            System.out.println("ERROR: Required parameter missing from " +
                               args[i-1]);
            return printUsage();
          }
        }
        // Make sure there are exactly 2 parameters left.
        if (other_args.size() != 2) {
          System.out.println("ERROR: Wrong number of parameters: " +
                             other_args.size() + " instead of 2.");
          return printUsage();
        }
        FileInputFormat.setInputPaths(conf, other_args.get(0));
        FileOutputFormat.setOutputPath(conf, new Path(other_args.get(1)));
            
        JobClient.runJob(conf);
        return 0;
      }
      
      
      public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new WordCount(), args);
        System.exit(res);
      }
    
    }

