  • MR example: CombineFileInputFormat

    CombineFileInputFormat is an abstract class. Hadoop provides two concrete implementations: CombineTextInputFormat and CombineSequenceFileInputFormat.

    This example taught me three things. For details, see "Explained: MR multi-path input" and "Explained: the CombineFileInputFormat class".

    • Single input path:
    // Set the input format to CombineTextInputFormat (a CombineFileInputFormat implementation)
    job.setInputFormatClass(CombineTextInputFormat.class); 
    
    // Set the maximum split size (60 MB)
    CombineTextInputFormat.setMaxInputSplitSize(job, 60*1024*1024L);
    
    // Set the input path
    CombineTextInputFormat.addInputPath(job, new Path(args[0]));
    • Multiple input paths, approach ①:
    // Set the input format to CombineTextInputFormat (a CombineFileInputFormat implementation)
    job.setInputFormatClass(CombineTextInputFormat.class); 
    
    // Set the maximum split size (60 MB)
    CombineTextInputFormat.setMaxInputSplitSize(job, 60*1024*1024L);
    
    // Set the input paths (two of them)
    CombineTextInputFormat.addInputPath(job, new Path(args[0]));
    CombineTextInputFormat.addInputPath(job, new Path(args[1]));
    • Multiple input paths, approach ②:
    // Set the maximum split size (60 MB)
    CombineTextInputFormat.setMaxInputSplitSize(job, 60*1024*1024L);
    
    // Set each input path together with its input format
    MultipleInputs.addInputPath(job, new Path(args[0]), CombineTextInputFormat.class);
    MultipleInputs.addInputPath(job, new Path(args[1]), CombineTextInputFormat.class);

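    Looking closely, you will also notice a difference between multi-path approaches ① and ② (verified):

    1. Approach ①: all inputs are pooled first to get the total input size, which is then divided by SplitSize to get the total number of map tasks.
    2. Approach ②: the number of map tasks is computed separately for each MultipleInputs path, and the per-path counts are then added together.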
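
    To make the difference concrete, here is a small arithmetic sketch in plain Java with no Hadoop dependency. It only models the two rules described above, using a rounded-up division. The class and method names (SplitCountSketch, pooledSplits, perPathSplits) and the 70 MB path sizes are made up for illustration. The real split count also depends on block boundaries and node/rack locality, so treat this as an approximation rather than what CombineFileInputFormat actually executes.

```java
class SplitCountSketch {

    // Rounded-up integer division: how many chunks of size b are needed to cover a.
    static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    // Approach 1 (simplified model): pool the sizes of all input paths,
    // then divide the total by the maximum split size.
    static long pooledSplits(long[] pathSizes, long maxSplitSize) {
        long total = 0;
        for (long size : pathSizes) {
            total += size;
        }
        return ceilDiv(total, maxSplitSize);
    }

    // Approach 2 (simplified model): compute the split count for each
    // path on its own, then add the per-path counts together.
    static long perPathSplits(long[] pathSizes, long maxSplitSize) {
        long count = 0;
        for (long size : pathSizes) {
            count += ceilDiv(size, maxSplitSize);
        }
        return count;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] sizes = {70 * mb, 70 * mb}; // hypothetical sizes of the two input paths
        long maxSplit = 60 * mb;           // same value as setMaxInputSplitSize above

        System.out.println("approach 1: " + pooledSplits(sizes, maxSplit));  // 140 MB / 60 MB, rounded up -> 3
        System.out.println("approach 2: " + perPathSplits(sizes, maxSplit)); // 2 + 2 -> 4
    }
}
```

    Under this model, approach ② can never produce fewer map tasks than approach ①, because each path is rounded up separately instead of once for the pooled total.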

    Complete code:

    package test0820;
    
    import java.io.IOException;
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.VLongWritable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    
    public class WordCount0826 {
    
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf);
            job.setJarByClass(WordCount0826.class);      
    
            job.setMapperClass(IIMapper.class);
            job.setReducerClass(IIReducer.class);
            job.setNumReduceTasks(5);
    
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(VLongWritable.class);
    
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(VLongWritable.class);
    
            // CombineFileInputFormat settings.
            // setInputFormatClass stays commented out here because the input format
            // is passed per path through MultipleInputs below (approach ②).
            //job.setInputFormatClass(CombineTextInputFormat.class); 
            CombineTextInputFormat.setMaxInputSplitSize(job, 60*1024*1024L);
    
            // Approach ① (disabled):
            //CombineTextInputFormat.addInputPath(job, new Path(args[0]));
            //CombineTextInputFormat.addInputPath(job, new Path(args[1]));
            // Approach ②:
            MultipleInputs.addInputPath(job, new Path(args[0]), CombineTextInputFormat.class);
            MultipleInputs.addInputPath(job, new Path(args[1]), CombineTextInputFormat.class);
    
            FileOutputFormat.setOutputPath(job, new Path(args[2]));
            System.exit(job.waitForCompletion(true)? 0:1);
        }
    
        // map: emit (word, 1) for every space-separated token in the line
        public static class IIMapper extends Mapper<LongWritable, Text, Text, VLongWritable>{
            @Override
            protected void map(LongWritable key, Text value,Context context)
                    throws IOException, InterruptedException {
                String[] splited = value.toString().split(" ");
                for(String word : splited){
                    context.write(new Text(word),new VLongWritable(1L));
                }
            }
        }
    
        // reduce: sum the counts for each word
        public static class IIReducer extends Reducer<Text, VLongWritable, Text, VLongWritable>{
            @Override
            protected void reduce(Text key, Iterable<VLongWritable> v2s, Context context)
                    throws IOException, InterruptedException {
                long sum=0;
                for(VLongWritable vl : v2s){
                    sum += vl.get();
                }
                context.write(key, new VLongWritable(sum));
            }
        }
    }
  • Original post: https://www.cnblogs.com/skyl/p/4761662.html