  • InputFormat in MapReduce

    In the Hadoop source code, InputFormat is an abstract class: public abstract class InputFormat<K, V>

    https://github.com/apache/hadoop/blob/master/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/InputFormat.java
    

    For background, see this article:

    https://cloud.tencent.com/developer/article/1043622
    

    It declares two abstract methods:

      public abstract 
        List<InputSplit> getSplits(JobContext context
                                   ) throws IOException, InterruptedException;
    

      public abstract 
        RecordReader<K,V> createRecordReader(InputSplit split,
                                             TaskAttemptContext context
                                            ) throws IOException, 
                                                     InterruptedException;
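
The way the two methods work together can be illustrated with a small standalone sketch. It does not use the Hadoop classes: MiniInputFormat, LineRange, LineInputFormat and InputFormatDemo are hypothetical stand-ins invented for this example, the input is an in-memory list of lines, and the JobContext / TaskAttemptContext parameters are left out. The idea it tries to show is only the division of labour: getSplits plans the pieces, and createRecordReader turns one piece into records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified stand-in for the InputFormat contract (not the Hadoop API).
abstract class MiniInputFormat<S, R> {
    // Plan the work: describe the input as a list of splits.
    abstract List<S> getSplits(List<String> input, int linesPerSplit);

    // Read the work: produce the records that belong to one split.
    abstract Iterator<R> createRecordReader(S split, List<String> input);
}

// A split here is only a [from, to) range of line indexes; it holds no data.
final class LineRange {
    final int from;
    final int to;

    LineRange(int from, int to) {
        this.from = from;
        this.to = to;
    }
}

final class LineInputFormat extends MiniInputFormat<LineRange, String> {
    @Override
    List<LineRange> getSplits(List<String> input, int linesPerSplit) {
        if (linesPerSplit <= 0) {
            throw new IllegalArgumentException("linesPerSplit must be positive");
        }
        List<LineRange> splits = new ArrayList<>();
        for (int from = 0; from < input.size(); from += linesPerSplit) {
            int to = Math.min(from + linesPerSplit, input.size());
            splits.add(new LineRange(from, to));
        }
        return splits;
    }

    @Override
    Iterator<String> createRecordReader(LineRange split, List<String> input) {
        // Only the lines covered by this split are visible to the reader.
        return input.subList(split.from, split.to).iterator();
    }
}

final class InputFormatDemo {
    // Plays the role of the framework: get the splits, then read each one.
    static List<List<String>> readAll(List<String> input, int linesPerSplit) {
        LineInputFormat format = new LineInputFormat();
        List<List<String>> recordsPerSplit = new ArrayList<>();
        for (LineRange split : format.getSplits(input, linesPerSplit)) {
            List<String> records = new ArrayList<>();
            Iterator<String> reader = format.createRecordReader(split, input);
            while (reader.hasNext()) {
                records.add(reader.next());
            }
            recordsPerSplit.add(records);
        }
        return recordsPerSplit;
    }

    public static void main(String[] args) {
        // prints [[a, b], [c, d], [e]]
        System.out.println(readAll(Arrays.asList("a", "b", "c", "d", "e"), 2));
    }
}
```

In a real job the framework, not user code, drives this loop, and each split is handed to its own map task.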
    

    The getSplits method logically partitions the input files into a List<InputSplit>. The source code of InputSplit is at:

    https://github.com/apache/hadoop/blob/master/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/InputSplit.java
    

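    As the article below points out, an InputSplit is a logical concept: the file itself is not physically split. A split only carries metadata, such as the start offset of the data, its length, and the nodes where the data is stored.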

    https://cloud.tencent.com/developer/article/1481777
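
To make "metadata only" concrete, here is a minimal sketch of how a file of a given length could be described as a list of logical splits. LogicalSplit and SplitSketch are hypothetical names used only for illustration, and the path, sizes and host names are made up. The arithmetic is a plain fixed-size cut; Hadoop's own FileInputFormat applies additional rules (block sizes, minimum and maximum split sizes, handling of a small trailing remainder) that are deliberately ignored here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a file-based split: it stores metadata, never file bytes.
final class LogicalSplit {
    final String path;    // file the split refers to
    final long start;     // byte offset where the split begins
    final long length;    // number of bytes the split covers
    final String[] hosts; // nodes assumed to hold the data

    LogicalSplit(String path, long start, long length, String[] hosts) {
        this.path = path;
        this.start = start;
        this.length = length;
        this.hosts = hosts;
    }

    @Override
    public String toString() {
        return path + ":" + start + "+" + length;
    }
}

final class SplitSketch {
    // Cuts the byte range [0, fileLength) into consecutive pieces of at most splitSize bytes.
    static List<LogicalSplit> computeSplits(String path, long fileLength,
                                            long splitSize, String[] hosts) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<LogicalSplit> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            long length = Math.min(splitSize, fileLength - start);
            splits.add(new LogicalSplit(path, start, length, hosts));
        }
        return splits;
    }

    public static void main(String[] args) {
        String[] hosts = {"node1", "node2"}; // made-up host names
        for (LogicalSplit split : computeSplits("/data/input.txt", 300L, 128L, hosts)) {
            System.out.println(split);
        }
    }
}
```

With a 300-byte file and a 128-byte split size this yields three splits starting at offsets 0, 128 and 256, the last one covering the remaining 44 bytes. No file is opened at any point, which is the sense in which the split is logical.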
    
  • Original article: https://www.cnblogs.com/tonglin0325/p/13750952.html