  • An Analysis of OutputFormat in Hadoop

    I. OutputFormat

    OutputFormat describes the output format of a MapReduce job. Its main tasks are:

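      1. Validate the job's output specification, for example by checking whether the output directory already exists.

      2. Provide a RecordWriter implementation that writes the job's results to files in the file system.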
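
    The two tasks above can be sketched in plain Java without a Hadoop dependency. This is only an illustration under stated assumptions: the class and method names (OutputSpecSketch, checkOutputDir, writeRecords) are hypothetical and are not Hadoop APIs, and the local file system stands in for HDFS. The tab-separated line layout and the part-r-00000 file name imitate the defaults of Hadoop's TextOutputFormat.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for illustration; not a Hadoop class.
class OutputSpecSketch {

    // Task 1: reject the job if the output directory already exists,
    // so that earlier results are never overwritten.
    static void checkOutputDir(Path outputDir) throws IOException {
        if (Files.exists(outputDir)) {
            throw new IOException("Output directory " + outputDir + " already exists");
        }
    }

    // Task 2: write each key-value pair as one "key<TAB>value" line.
    static Path writeRecords(Path outputDir, Map<String, Integer> records) throws IOException {
        Files.createDirectories(outputDir);
        Path part = outputDir.resolve("part-r-00000");
        try (BufferedWriter out = Files.newBufferedWriter(part, StandardCharsets.UTF_8)) {
            for (Map.Entry<String, Integer> e : records.entrySet()) {
                out.write(e.getKey() + "\t" + e.getValue());
                out.newLine();
            }
        }
        return part;
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("outputformat-sketch");
        Path outputDir = base.resolve("wordcount-out");

        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("hadoop", 2);
        counts.put("mapreduce", 1);

        checkOutputDir(outputDir);        // passes: the directory does not exist yet
        writeRecords(outputDir, counts);  // creates the directory and the part file

        try {
            checkOutputDir(outputDir);    // a second run against the same directory fails
        } catch (IOException expected) {
            System.out.println("Rejected: " + expected.getMessage());
        }
    }
}
```

    Running the check before any data is written is what lets a misconfigured job fail at submission time rather than after the work has been done.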

    OutputFormat consists mainly of three abstract methods. The annotated source code below describes what each method does:

    package org.apache.hadoop.mapreduce;

    import java.io.IOException;

    public abstract class OutputFormat<K, V> {

      /**
       * Get the {@link RecordWriter} for the given task.
       * The RecordWriter is the object that writes the task's key-value
       * pairs to the output.
       *
       * @param context the information about the current task.
       * @return a {@link RecordWriter} to write the output for the job.
       * @throws IOException
       */
      public abstract RecordWriter<K, V> getRecordWriter(TaskAttemptContext context)
          throws IOException, InterruptedException;

      /**
       * Check for validity of the output-specification for the job.
       *
       * <p>This is to validate the output specification for the job when
       * a job is submitted. Typically checks that the output directory does
       * not already exist, throwing an exception when it already exists, so
       * that existing output is not overwritten.</p>
       *
       * @param context information about the job
       * @throws IOException when output should not be attempted
       */
      public abstract void checkOutputSpecs(JobContext context)
          throws IOException, InterruptedException;

      /**
       * Get the output committer for this output format. This is responsible
       * for ensuring the output is committed correctly.
       *
       * @param context the task context
       * @return an output committer
       * @throws IOException
       * @throws InterruptedException
       */
      public abstract OutputCommitter getOutputCommitter(TaskAttemptContext context)
          throws IOException, InterruptedException;
    }
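
    To show how the three methods cooperate, here is a minimal, dependency-free sketch of the order in which a framework might call them. Everything in it is an assumption made for illustration: SketchRecordWriter, SketchCommitter, SketchOutputFormat, LineOutputFormat and OutputFormatLifecycle are hypothetical stand-ins that only mirror the shape of the Hadoop types, and the context parameters are omitted. The "write to a _temporary location, then move on commit" step loosely imitates what Hadoop's FileOutputCommitter does; it is not that class's implementation.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Simplified, hypothetical stand-ins that mirror the shape of the Hadoop types.
interface SketchRecordWriter<K, V> {
    void write(K key, V value) throws IOException;
    void close() throws IOException;
}

interface SketchCommitter {
    void commitTask() throws IOException;
}

abstract class SketchOutputFormat<K, V> {
    abstract void checkOutputSpecs() throws IOException;
    abstract SketchRecordWriter<K, V> getRecordWriter() throws IOException;
    abstract SketchCommitter getOutputCommitter() throws IOException;
}

// Writes "key<TAB>value" lines to a temporary file and publishes it on commit.
class LineOutputFormat extends SketchOutputFormat<String, Integer> {
    private final Path outputDir;
    private final Path tempFile;
    private final Path finalFile;

    LineOutputFormat(Path outputDir) {
        this.outputDir = outputDir;
        this.tempFile = outputDir.resolve("_temporary").resolve("part-r-00000");
        this.finalFile = outputDir.resolve("part-r-00000");
    }

    @Override
    void checkOutputSpecs() throws IOException {
        // Refuse to run if a previous job already produced output here.
        if (Files.exists(outputDir)) {
            throw new IOException("Output directory " + outputDir + " already exists");
        }
    }

    @Override
    SketchRecordWriter<String, Integer> getRecordWriter() throws IOException {
        Files.createDirectories(tempFile.getParent());
        BufferedWriter out = Files.newBufferedWriter(tempFile);
        return new SketchRecordWriter<String, Integer>() {
            @Override
            public void write(String key, Integer value) throws IOException {
                out.write(key + "\t" + value);
                out.newLine();
            }

            @Override
            public void close() throws IOException {
                out.close();
            }
        };
    }

    @Override
    SketchCommitter getOutputCommitter() {
        // Committing makes the finished file visible under its final name.
        return () -> Files.move(tempFile, finalFile);
    }
}

class OutputFormatLifecycle {
    static Path run(Path outputDir) throws IOException {
        LineOutputFormat format = new LineOutputFormat(outputDir);
        format.checkOutputSpecs();                          // at job submission
        SketchCommitter committer = format.getOutputCommitter();
        SketchRecordWriter<String, Integer> writer = format.getRecordWriter();
        try {
            writer.write("hadoop", 2);                      // while the task runs
            writer.write("mapreduce", 1);
        } finally {
            writer.close();
        }
        committer.commitTask();                             // after the task succeeds
        return outputDir.resolve("part-r-00000");
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("lifecycle-sketch").resolve("out");
        System.out.println(Files.readAllLines(run(out)));
    }
}
```

    Separating the writer from the committer means a failed or duplicated task attempt never leaves a partial file under the final name, because only the commit step publishes the result.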
  • Original post: https://www.cnblogs.com/rolly-yan/p/3704060.html