  • Implementing WordCount with Hadoop MapReduce

    1. Create a WCMapper class that extends Mapper
    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class WCMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Receive one input record (V1): the line of text
            String line = value.toString();
            // Split the line into words
            String[] wordsStrings = line.split(" ");
            // Loop over the words
            for (String w : wordsStrings) {
                // Each occurrence counts as one: emit (word, 1)
                context.write(new Text(w), new LongWritable(1));
            }
        }
    }
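    The mapper above allocates a new Text and LongWritable for every word it emits. A minimal variant, sketched below, follows the object-reuse pattern from Hadoop's stock WordCount example: because the framework serializes the key and value during context.write(), a single pair of Writable instances can safely be overwritten on each iteration, which cuts down on garbage collection. The class name WCMapperReuse is only for illustration and is not part of the original post.
    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Sketch: same logic as WCMapper, but the output objects are reused
    public class WCMapperReuse extends Mapper<LongWritable, Text, Text, LongWritable> {
        // Reused output objects; safe because context.write() serializes them immediately
        private final Text word = new Text();
        private static final LongWritable ONE = new LongWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String w : value.toString().split(" ")) {
                word.set(w);
                context.write(word, ONE);
            }
        }
    }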
     
    2. Create a WCReducer class that extends Reducer
    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class WCReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> v2s, Context context)
                throws IOException, InterruptedException {
            // Receive the grouped data: one word (K2) and all of its 1s (V2s)
            // Define a counter for the total
            long counter = 0;
            // Loop over v2s and accumulate
            for (LongWritable i : v2s) {
                counter += i.get();
            }
            // Emit (word, total count)
            context.write(key, new LongWritable(counter));
        }
    }
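    Between the two classes the framework sorts and groups the map output by key, so reduce() is called once per word with all of its 1s, e.g. ("hello", [1, 1, 1]) for a word that appeared three times. A minimal sketch of how that contract could be checked with the MRUnit test library is shown below; MRUnit and JUnit on the classpath are assumptions here, not something the original post uses, and the test class name is hypothetical.
    import java.util.Arrays;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
    import org.junit.Test;

    // Sketch: feed the reducer one key with its grouped 1s, as the shuffle would,
    // and verify the summed output
    public class WCReducerTest {
        @Test
        public void sumsTheOnesForAWord() throws Exception {
            ReduceDriver.newReduceDriver(new WCReducer())
                    .withInput(new Text("hello"),
                            Arrays.asList(new LongWritable(1), new LongWritable(1), new LongWritable(1)))
                    .withOutput(new Text("hello"), new LongWritable(3))
                    .runTest();
        }
    }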
    3. Create a WordCount class with the main method
    /*
     * 1. Analyze the concrete business logic and decide on the input and output data formats.
     * 2. Define a class that extends org.apache.hadoop.mapreduce.Mapper and override map()
     *    to implement the business logic and emit the new key/value pairs.
     * 3. Define a class that extends org.apache.hadoop.mapreduce.Reducer and override reduce()
     *    to implement the business logic.
     * 4. Wire the custom mapper and reducer together through a Job object.
     */
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {
        public static void main(String[] args) throws Exception {
            // Build the Job object
            Job job = Job.getInstance(new Configuration());

            // Note: pass the class that contains the main method
            job.setJarByClass(WordCount.class);

            // Configure the Mapper side
            job.setMapperClass(WCMapper.class);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(LongWritable.class);
            FileInputFormat.setInputPaths(job, new Path("/words.txt"));

            // Configure the Reducer side
            job.setReducerClass(WCReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(LongWritable.class);
            FileOutputFormat.setOutputPath(job, new Path("/wcount619"));

            // Submit the job and wait for completion
            job.waitForCompletion(true);
        }
    }
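    The driver above hard-codes /words.txt and /wcount619. A common variation, sketched below and not part of the original post, reads the paths from the command line and registers WCReducer as a combiner, so partial sums are computed on the map side before the shuffle (safe here because addition is associative and commutative). The class name WordCountWithCombiner is only for illustration.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCountWithCombiner {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration());
            job.setJarByClass(WordCountWithCombiner.class);

            job.setMapperClass(WCMapper.class);
            job.setCombinerClass(WCReducer.class);  // partial sums before the shuffle
            job.setReducerClass(WCReducer.class);

            // Map output and final output types are the same, so one pair of settings is enough
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(LongWritable.class);

            FileInputFormat.setInputPaths(job, new Path(args[0]));   // e.g. /words.txt
            FileOutputFormat.setOutputPath(job, new Path(args[1]));  // e.g. /wcount619
            // Exit non-zero on failure so shell scripts can detect it
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }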
     
     
    4. Package the classes as wc.jar, upload the jar to the Linux machine, and run it under Hadoop (a fuller command sequence is sketched below):
         hadoop jar /root/wc.jar
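    A fuller run might look like the following sketch. The explicit WordCount argument assumes the class sits in the default package and that the jar's manifest does not already name a main class; the paths follow the ones hard-coded in step 3, and the output directory must not exist before the job runs.
         # upload the input file to HDFS
         hadoop fs -put words.txt /words.txt
         # run the job packaged in wc.jar
         hadoop jar /root/wc.jar WordCount
         # inspect the result
         hadoop fs -cat /wcount619/part-r-00000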
  • Original article: https://www.cnblogs.com/dulixiaoqiao/p/6985237.html