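  • Running a Hadoop Program from the Terminal on Ubuntu

    This continues the previous post, "Installing Hadoop 2.6.0 on Ubuntu Kylin".

    After the previous post, Hadoop is set up in pseudo-distributed mode.

    The next step is to run a MapReduce program, using WordCount as the example:

    1. Create the implementation class: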

    cd /usr/local/hadoop
    mkdir workspace
    cd workspace
    gedit WordCount.java

    Paste in the following code:

    import java.io.IOException;
    import java.util.StringTokenizer;
     
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
     
    public class WordCount {
     
      public static class TokenizerMapper
           extends Mapper<Object, Text, Text, IntWritable>{
     
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();
     
        public void map(Object key, Text value, Context context
                        ) throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
          }
        }
      }
     
      public static class IntSumReducer
           extends Reducer<Text,IntWritable,Text,IntWritable> {
        private IntWritable result = new IntWritable();
     
        public void reduce(Text key, Iterable<IntWritable> values,
                           Context context
                           ) throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }
     
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

    A detailed walkthrough of the code is left for the next post.
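    Until then, the counting logic can be tried without a cluster. The following is a minimal plain-Java sketch, not the Hadoop job itself: the class name WordCountSketch and the sample lines are made up for illustration. It tokenizes with StringTokenizer as TokenizerMapper does and sums the ones per word as IntSumReducer does.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java sketch (no Hadoop dependency) of what WordCount computes.
// WordCountSketch is a hypothetical name used only for this illustration.
final class WordCountSketch {

    // Splits each line on whitespace (the "map" step) and
    // adds 1 per occurrence of each token (the "reduce" step).
    static Map<String, Integer> count(Iterable<String> lines) {
        Map<String, Integer> counts = new TreeMap<>(); // TreeMap keeps the words sorted for printing
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count(Arrays.asList("hello hadoop", "hello world"));
        // Print as word<TAB>count, one pair per line
        counts.forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

    In the real job the same summing also runs as a combiner on each mapper's output, which is safe here because addition is associative and commutative.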

    2. Compile

    (1) Set JAVA_HOME (adjust the path to match your own JDK installation)

      export JAVA_HOME=/usr/lib/jvm/java-8u5-sun
    

      To check the current value of JAVA_HOME:

      echo $JAVA_HOME
    

    (2) Add the JDK's bin directory to PATH

    export PATH=$JAVA_HOME/bin:$PATH
    

    (3) Set HADOOP_CLASSPATH to the JDK's tools.jar

    export HADOOP_CLASSPATH=$JAVA_HOME/lib/tools.jar
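
    Before compiling, it can help to confirm that tools.jar really exists at that location. The sketch below is only an illustrative check, with ToolsJarCheck as a made-up class name. Note that tools.jar ships with JDK 8 and earlier; it was removed in JDK 9, so on newer JDKs this check reports it as missing.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: checks whether $JAVA_HOME/lib/tools.jar is present.
final class ToolsJarCheck {

    // Builds the path that HADOOP_CLASSPATH is set to in step (3).
    static Path toolsJar(String javaHome) {
        return Paths.get(javaHome, "lib", "tools.jar");
    }

    public static void main(String[] args) {
        String javaHome = System.getenv("JAVA_HOME");
        if (javaHome == null || javaHome.isEmpty()) {
            System.err.println("JAVA_HOME is not set");
            return;
        }
        Path jar = toolsJar(javaHome);
        System.out.println(jar + (Files.isRegularFile(jar) ? " found" : " missing"));
    }
}
```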
    

    Compile WordCount.java:

    ../bin/hadoop com.sun.tools.javac.Main WordCount.java
    

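      Here com.sun.tools.javac.Main is the main class of the javac compiler; the hadoop script runs it in a JVM with Hadoop's classpath plus tools.jar.

      The command above produces three class files: WordCount.class, WordCount$IntSumReducer.class, and WordCount$TokenizerMapper.class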
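
    The compile step works because javac is itself a Java program that can be started from any JVM. As a rough illustration of the same idea, the sketch below invokes the compiler in-process through the standard javax.tools API rather than through com.sun.tools.javac.Main. The class name CompileSketch and the tiny Hello source are assumptions made for this example, and it needs a JDK (not a bare JRE) to run.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical example: run the Java compiler from inside a Java program.
final class CompileSketch {

    // Compiles one source file; returns the compiler's exit code (0 means success).
    static int compile(Path source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system compiler found; a JDK is required");
        }
        // Passing null for the streams uses System.in, System.out and System.err
        return compiler.run(null, null, null, source.toString());
    }

    // Writes a tiny class to a temp directory, compiles it,
    // and reports whether Hello.class appeared next to the source.
    static boolean demo() {
        try {
            Path dir = Files.createTempDirectory("compile-sketch");
            Path source = dir.resolve("Hello.java");
            Files.write(source, "class Hello {}".getBytes(StandardCharsets.UTF_8));
            return compile(source) == 0 && Files.exists(dir.resolve("Hello.class"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("compiled: " + demo());
    }
}
```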

    Package these three class files into a .jar:

    jar cf WordCount.jar WordCount*.class
    

    This produces WordCount.jar
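
    The glob WordCount*.class picks up all three files because javac names the class file of a nested class after its enclosing class, joined with a dollar sign. A small sketch of that naming rule follows; NestedNames and its empty nested classes are stand-ins invented for this example, not part of WordCount.

```java
// Hypothetical example of how nested classes are named on disk.
final class NestedNames {

    // Empty stand-ins for the nested classes declared inside WordCount
    static class TokenizerMapper {}
    static class IntSumReducer {}

    // For a class in the default package, javac writes <binary name>.class,
    // and Class.getName() returns that binary name.
    static String classFileName(Class<?> c) {
        return c.getName() + ".class";
    }

    public static void main(String[] args) {
        System.out.println(classFileName(NestedNames.class));
        System.out.println(classFileName(TokenizerMapper.class));
        System.out.println(classFileName(IntSumReducer.class));
    }
}
```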

    3. Run (the commands below are run from the Hadoop installation directory, /usr/local/hadoop)

    bin/hdfs dfs -mkdir /user
    bin/hdfs dfs -mkdir /user/hadoop
    

      Upload the input files to HDFS:

    bin/hdfs dfs -put etc/hadoop /input
    

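      Here etc/hadoop (Hadoop's configuration directory) serves as the input; any other file or directory can be used instead.

      Run the job. The jar was built without a Main-Class manifest entry, so the main class name WordCount must follow the jar path. The output directory /output must not already exist:

    bin/hadoop jar /usr/local/hadoop/workspace/WordCount.jar WordCount /input /output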
    

      View the results:

    bin/hdfs dfs -cat /output/*
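
    Since the job does not set an output format, it uses the default text output, where each line holds a word and its count separated by a tab. As a hedged sketch of how such lines could be post-processed, the code below parses them and picks the most frequent words. The class name OutputLines and the sample lines are invented for illustration and are not real job output.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper for reading "word<TAB>count" lines.
final class OutputLines {

    // Parses one line into a (word, count) pair.
    static Map.Entry<String, Integer> parse(String line) {
        int tab = line.lastIndexOf('\t');
        if (tab < 0) {
            throw new IllegalArgumentException("no tab separator: " + line);
        }
        String word = line.substring(0, tab);
        int count = Integer.parseInt(line.substring(tab + 1).trim());
        return new SimpleEntry<>(word, count);
    }

    // Returns the n most frequent words, highest count first.
    static List<Map.Entry<String, Integer>> top(List<String> lines, int n) {
        return lines.stream()
                .map(OutputLines::parse)
                .sorted((a, b) -> Integer.compare(b.getValue(), a.getValue()))
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample lines in the same word<TAB>count shape
        List<String> sample = Arrays.asList("hadoop\t1", "hello\t2", "world\t1");
        top(sample, 2).forEach(e -> System.out.println(e.getKey() + "\t" + e.getValue()));
    }
}
```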

    4. Stop Hadoop

    sbin/stop-dfs.sh
    

      

  • Original article: https://www.cnblogs.com/kingatnuaa/p/4174706.html