  • An Analysis of OutputFormat in Hadoop

    I. OutputFormat

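    OutputFormat describes the output specification of a MapReduce job. Its main responsibilities are:

      1. Validate the job's output specification, for example by checking whether the output directory already exists.

      2. Supply the RecordWriter implementation that writes the job's output to files on the file system.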
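
    The two responsibilities above can be sketched with the JDK alone. This is a minimal model, not Hadoop code: the class OutputDutiesSketch, its nested LineRecordWriter, and the String/int record types are hypothetical names chosen for illustration. The tab-separated line layout and the file name part-r-00000 imitate what Hadoop's text output usually looks like.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in, NOT a Hadoop class: models the two duties with java.nio only.
class OutputDutiesSketch {

    // Duty 1: reject the job up front if the output directory already exists,
    // so that earlier results are never overwritten.
    static void checkOutputSpecs(Path outputDir) throws IOException {
        if (Files.exists(outputDir)) {
            throw new FileAlreadyExistsException(
                    "Output directory " + outputDir + " already exists");
        }
    }

    // Duty 2: write key-value pairs to a file, one "key<TAB>value" line per record.
    static final class LineRecordWriter implements AutoCloseable {
        private final BufferedWriter out;

        LineRecordWriter(Path file) throws IOException {
            Files.createDirectories(file.getParent());
            this.out = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
        }

        void write(String key, int value) throws IOException {
            out.write(key + "\t" + value + "\n");
        }

        @Override
        public void close() throws IOException {
            out.close();
        }
    }

    public static void main(String[] args) throws IOException {
        Path outputDir = Files.createTempDirectory("job").resolve("out");

        checkOutputSpecs(outputDir); // passes: the directory does not exist yet
        try (LineRecordWriter writer =
                     new LineRecordWriter(outputDir.resolve("part-r-00000"))) {
            writer.write("hadoop", 2);
            writer.write("mapreduce", 1);
        }

        try {
            checkOutputSpecs(outputDir); // a second run against the same directory
        } catch (FileAlreadyExistsException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

    Running the check before any task starts is the point of the design: a job that would clobber existing output fails at submission time instead of after hours of computation.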

    OutputFormat consists mainly of three abstract methods. The annotated source code below explains what each method does:

    package org.apache.hadoop.mapreduce;

    import java.io.IOException;

    public abstract class OutputFormat<K, V> {

      /**
       * Get the {@link RecordWriter} for the given task.
       * Annotation: the RecordWriter is the object that writes the task's
       * output key-value pairs.
       * @param context the information about the current task.
       * @return a {@link RecordWriter} to write the output for the job.
       * @throws IOException
       */
      public abstract RecordWriter<K, V> getRecordWriter(TaskAttemptContext context)
              throws IOException, InterruptedException;

      /**
       * Check for validity of the output-specification for the job.
       * <p>This is to validate the output specification for the job when it is
       * a job is submitted.  Typically checks that it does not already exist,
       * throwing an exception when it already exists, so that output is not
       * overwritten.</p>
       * Annotation: the check runs when the job is submitted. In practice it
       * tests whether the output directory already exists and throws an
       * exception if it does, so that earlier output is not overwritten.
       * @param context information about the job
       * @throws IOException when output should not be attempted
       */
      public abstract void checkOutputSpecs(JobContext context)
              throws IOException, InterruptedException;

      /**
       * Get the output committer for this output format. This is responsible
       * for ensuring the output is committed correctly.
       * Annotation: returns an OutputCommitter object, which makes sure the
       * output is committed correctly.
       * @param context the task context
       * @return an output committer
       * @throws IOException
       * @throws InterruptedException
       */
      public abstract OutputCommitter getOutputCommitter(TaskAttemptContext context)
              throws IOException, InterruptedException;
    }
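
    To illustrate what "committing the output correctly" can mean for the object returned by getOutputCommitter, here is a simplified sketch of the write-to-a-temporary-location-then-promote idea. It uses only the JDK. The class CommitSketch and its methods setupTask, commitTask, and abortTask are hypothetical and do not reproduce the Hadoop OutputCommitter API; the _temporary directory name is borrowed from Hadoop's file-based committer, whose real protocol has more stages than shown here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical, simplified model of an output committer; NOT the Hadoop API.
class CommitSketch {
    private final Path outputDir;
    private final Path attemptDir;

    CommitSketch(Path outputDir, String attemptId) {
        this.outputDir = outputDir;
        this.attemptDir = outputDir.resolve("_temporary").resolve(attemptId);
    }

    // Creates the private directory the task attempt writes into while it runs.
    Path setupTask() throws IOException {
        return Files.createDirectories(attemptDir);
    }

    // Promotes every file of a successful attempt into the final output directory.
    void commitTask() throws IOException {
        List<Path> files;
        try (Stream<Path> listing = Files.list(attemptDir)) {
            files = listing.collect(Collectors.toList());
        }
        for (Path file : files) {
            Files.move(file, outputDir.resolve(file.getFileName()));
        }
        Files.delete(attemptDir);
    }

    // Discards everything a failed attempt wrote; the final directory is untouched.
    void abortTask() throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(attemptDir)) {
            // Reverse order puts children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path path : paths) {
            Files.delete(path);
        }
    }

    public static void main(String[] args) throws IOException {
        Path outputDir = Files.createTempDirectory("job").resolve("out");

        CommitSketch succeeded = new CommitSketch(outputDir, "attempt_0");
        Files.write(succeeded.setupTask().resolve("part-r-00000"),
                "hadoop\t2\n".getBytes(StandardCharsets.UTF_8));
        succeeded.commitTask();

        CommitSketch failed = new CommitSketch(outputDir, "attempt_1");
        Files.write(failed.setupTask().resolve("part-r-00001"),
                "partial".getBytes(StandardCharsets.UTF_8));
        failed.abortTask();

        System.out.println(Files.exists(outputDir.resolve("part-r-00000"))); // true
        System.out.println(Files.exists(outputDir.resolve("part-r-00001"))); // false
    }
}
```

    Because a failed or duplicated attempt never writes directly into the final directory, readers of the output see only files from attempts that were explicitly committed.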
  • Original article: https://www.cnblogs.com/rolly-yan/p/3704060.html