  • Hadoop WordCount

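    Both my previous company and my current one use Hadoop and HDFS, but I never got started with them. Today I followed the official docs and wrote a Hadoop WordCount demo.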

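    1. Hadoop is a framework. What is a framework? Spring is a framework and MyBatis is a framework: a framework builds in the functionality that systems commonly need, which reduces development work. For example, to build a web application on Spring Boot you write a Java class, add a few annotations, package it as a jar, and run java -jar demo.jar; the application is done.

      Spring Boot is itself a framework that wraps Java Servlet and containers such as Tomcat and Jetty. With it we no longer have to repeat work such as writing Servlet implementation classes or configuring web.xml.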

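    2. The data Hadoop works on is stored in HDFS. Following the official docs, I ran a pseudo-distributed HDFS on my local machine.

    3. The demo consists of: write the WordCount class, package it as a jar, run it on the local Hadoop, read the file contents from HDFS, and write the results back to HDFS.

    4. Refer to the official documentation:

      Official MapReduce docs: http://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#Example:_WordCount_v1.0

      Official HDFS docs: http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html#Standalone_Operation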

    pom.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>
    
        <groupId>com.gxf</groupId>
        <artifactId>hadoop_demo</artifactId>
        <version>1.0-SNAPSHOT</version>
    
        <dependencies>
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-core</artifactId>
                <version>1.2.1</version>
            </dependency>
        </dependencies>
        
    </project>

    WordCount.java (copied directly from the official tutorial)

    import java.io.IOException;
    import java.util.StringTokenizer;
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    
    public class WordCount {
    
      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable>{
    
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();
    
        public void map(Object key, Text value, Context context
        ) throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
          }
        }
      }
    
      public static class IntSumReducer
          extends Reducer<Text,IntWritable,Text,IntWritable> {
        private IntWritable result = new IntWritable();
    
        public void reduce(Text key, Iterable<IntWritable> values,
            Context context
        ) throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }
    
      public static void main(String[] args) throws Exception {
        // args[0] is the input path and args[1] is the output path
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        // The reducer doubles as a combiner to pre-sum counts on the map side
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Block until the job finishes; exit code 0 on success, 1 on failure
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

    WordCount.java has no package declaration: I could not get it to work with one, so I removed the package name.
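
    To see what TokenizerMapper and IntSumReducer compute without starting Hadoop, the data flow can be sketched in plain Java. This is only an illustration under stated assumptions: LocalWordCount and its methods are hypothetical names, not Hadoop APIs, a TreeMap stands in for the framework's shuffle and sort, and the two input lines are made-up sample data.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

/** Plain-Java sketch of the WordCount data flow; no Hadoop classes are used. */
class LocalWordCount {

    /** Map phase: emit one (word, 1) pair per whitespace-separated token. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(itr.nextToken(), 1));
        }
        return pairs;
    }

    /** Shuffle phase (assumption: a TreeMap imitates grouping and sorting by key). */
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    /** Reduce phase: sum all values that share one key. */
    static int reduce(Iterable<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    /** Run map, shuffle, and reduce over all input lines. */
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(pairs).entrySet()) {
            counts.put(entry.getKey(), reduce(entry.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Hello World Bye World",
                "Hello Hadoop Goodbye Hadoop");
        // prints {Bye=1, Goodbye=1, Hadoop=2, Hello=2, World=2}
        System.out.println(wordCount(lines));
    }
}
```

    In the real job the framework does the grouping between map and reduce, and the same summing logic also runs as the combiner on the map side.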

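    The remaining steps are to build the jar, prepare a text file and put it into HDFS, run the jar with hadoop, and check the results. All of these steps are covered in the official docs.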
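
    When checking the results, it helps to know that with the default text output format each output line holds a word and its count separated by a tab. Below is a minimal sketch of parsing such lines, assuming they have already been read from the output directory into memory. OutputLines is a hypothetical helper, not part of Hadoop, and the sample lines are made up.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that parses "word<TAB>count" lines from a WordCount run. */
class OutputLines {

    /** Parse each non-empty line into a (word, count) entry, keeping file order. */
    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            int tab = line.indexOf('\t');
            if (tab < 0) {
                throw new IllegalArgumentException("expected word<TAB>count: " + line);
            }
            String word = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            counts.put(word, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample lines in the shape the job writes
        List<String> sample = Arrays.asList("Bye\t1", "Hello\t2", "World\t2");
        // prints {Bye=1, Hello=2, World=2}
        System.out.println(parse(sample));
    }
}
```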

  • Original post: https://www.cnblogs.com/luckygxf/p/10054379.html