  • Hadoop output control: writing output to specified files

    I have recently been looking into how to write Hadoop output into a specified folder.

    (To be continued)

    Using WordCount as the example:

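    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
    import org.apache.hadoop.util.GenericOptionsParser;

    public class wordcount {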
        public static class TokenizerMapper extends
                Mapper<Object, Text, Text, IntWritable>
        {
            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, one);
                }
            }
        }
       
       
        public static class IntSumReducer extends
                Reducer<Text, IntWritable, Text, IntWritable> {
            private IntWritable result = new IntWritable();
           
            private MultipleOutputs<Text, IntWritable> mo;
           
            public void reduce(Text key, Iterable<IntWritable> values,
                    Context context) throws IOException, InterruptedException {
               
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
               

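                // The context and the MultipleOutputs instance are independent: both write, and they do not interfere with each other.
                // Note: the MultipleOutputs Javadoc creates the instance once in setup() and closes it in cleanup();
                // it is created and closed inside reduce() here only for this experiment.
                mo = new MultipleOutputs<Text, IntWritable>(context);
                // MultipleOutputs.write can write to several files, but one file cannot overwrite another.
                Text kw = new Text("this a test!sum is:");
                IntWritable content = new IntWritable(sum);

                // Works: the record goes to a file named after key.toString() under the output directory "out".
                // The records end up fully separated: wordcount's own context output file still contains everything,
                // while MultipleOutputs splits the records across different files.
                mo.write(kw, content, key.toString());
                //mo.write(key, result, "error" + key.toString()); // works
                //mo.write(key, result, "all"); // testall.jar: fails, because once all-r-00000 has been created it cannot be overwritten

                //mo.write(key, result, null); // wrong: no file to write to
                //mo.write(key, result, "/user/test"); // fails
                //mo.write(null, key, result, key.toString());

                //mo.write(key, result, "all"); // fails
                //mo.write(key.toString(), key, result); // fails
                mo.close();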
            }
        }
       
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Strip the generic Hadoop options; what remains should be <in> and <out>.
            String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
            if (otherArgs.length != 2) {
                System.err.println("Usage: wordcount <in> <out>");
                System.exit(2);
            }

            // Job.getInstance replaces the deprecated new Job(conf, name) constructor (Hadoop 2.x and later).
            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(wordcount.class);
            job.setMapperClass(TokenizerMapper.class);
            // The combiner reuses IntSumReducer, so its MultipleOutputs writes may also run on the map side.
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            // job.setOutputFormatClass(testOutputFormat.class);

            FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
            FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
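
The file name `all-r-00000` in the comments above follows the pattern Hadoop uses for task output files: the base name passed to `mo.write`, then `m` or `r` for a map or reduce task, then the partition number padded to five digits. The following is a minimal plain-Java sketch of that pattern, not Hadoop's own code. The class `NamedOutputFileName` and its method `fileName` are hypothetical names introduced only for illustration, and the sketch assumes uncompressed text output, so no file extension is appended.

```java
import java.util.Locale;

/** Hypothetical helper (not part of Hadoop) that mimics the output file-name pattern. */
final class NamedOutputFileName {

    /**
     * Builds "<base>-<m|r>-<partition padded to 5 digits>".
     * isMapTask selects "m" for map-side output and "r" for reduce-side output.
     */
    static String fileName(String baseOutputPath, boolean isMapTask, int partition) {
        String taskType = isMapTask ? "m" : "r";
        // %05d pads the partition number with leading zeros to five digits.
        return String.format(Locale.ROOT, "%s-%s-%05d", baseOutputPath, taskType, partition);
    }

    public static void main(String[] args) {
        // Reducer 0 writing with base name "all", as in the commented-out experiment.
        System.out.println(fileName("all", false, 0));
        // Reducer 3 writing with a per-key base name such as key.toString().
        System.out.println(fileName("hello", false, 3));
    }
}
```

Because the name depends only on the base name and the task, every `mo.write(..., "all")` call within one reducer targets the same file. This is consistent with the observation that the file cannot be created a second time after the writer has been closed.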

  • Original article: https://www.cnblogs.com/cl1024cl/p/6205688.html