Explained: Multiple Input Paths in MapReduce

There are generally three ways to feed multiple input paths to a single MapReduce job:

1) Call the method several times, adding one path per call:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Input paths in01 and in02
String in01 = "hdfs://RS5-112:9000/cs01/path01";
String in02 = "hdfs://RS5-112:9000/cs02/path02";

// Call addInputPath() once per path; each call appends to the job's input list
FileInputFormat.addInputPath(job, new Path(in01));
FileInputFormat.addInputPath(job, new Path(in02));

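2) Call the method once, loading several paths at the same time (a comma-separated string):

// With this method the second argument must be a single string of paths joined by commas
FileInputFormat.addInputPaths(job,
    "hdfs://RS5-112:9000/cs01/path1,hdfs://RS5-112:9000/cs02/path2");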
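When the paths come from a list rather than a literal, the comma-separated string can be built first. The following is a minimal sketch in plain Java; the class name InputPathJoiner and the method joinPaths are hypothetical helpers, not part of Hadoop, and trimming whitespace and skipping empty entries is a defensive choice of this sketch rather than documented Hadoop behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Hadoop): builds the comma-separated
// string that is passed as the second argument of addInputPaths().
class InputPathJoiner {

    // Trims each entry, drops empty ones, and joins the rest with commas.
    static String joinPaths(List<String> paths) {
        List<String> cleaned = new ArrayList<>();
        for (String p : paths) {
            String trimmed = p.trim();
            if (!trimmed.isEmpty()) {
                cleaned.add(trimmed);
            }
        }
        return String.join(",", cleaned);
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList(
                "hdfs://RS5-112:9000/cs01/path1",
                "hdfs://RS5-112:9000/cs02/path2");
        // The printed string is what would be handed to addInputPaths().
        System.out.println(joinPaths(paths));
    }
}
```

The resulting string would then be passed as FileInputFormat.addInputPaths(job, InputPathJoiner.joinPaths(paths)).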

3) Use the methods of the MultipleInputs class:

addInputPath(Job job, Path path,
    Class<? extends InputFormat> inputFormatClass);

addInputPath(Job job, Path path,
    Class<? extends InputFormat> inputFormatClass,
    Class<? extends Mapper> mapperClass);

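The strength of MultipleInputs is that, besides letting you call addInputPath() repeatedly to add paths, it lets you assign a different input format to each path and, going one step further, a different Mapper class to handle each path and its input format. For details, see the related posts "MR Case Study: Inverted Index && MultipleInputs" and "MR Case Study: CombineFileInputFormat".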

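DEMO1:

import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// A different input format per path; the job's own Mapper handles both
MultipleInputs.addInputPath(job,
        new Path("hdfs://RS5-112:9000/cs01/path01"),
        TextInputFormat.class);
MultipleInputs.addInputPath(job,
        new Path("hdfs://RS5-112:9000/cs02/path2"),
        KeyValueTextInputFormat.class);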

DEMO2:

// A different input format and Mapper per path;
// Mapper01 and Mapper02 are user-defined Mapper subclasses
MultipleInputs.addInputPath(job,
        new Path("hdfs://RS5-112:9000/cs01/path01"),
        TextInputFormat.class,
        Mapper01.class);
MultipleInputs.addInputPath(job,
        new Path("hdfs://RS5-112:9000/cs02/path2"),
        KeyValueTextInputFormat.class,
        Mapper02.class);
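One reason to bind a separate Mapper to each path is that the input formats hand over different records. TextInputFormat passes the byte offset of each line as the key and the whole line as the value, while Hadoop's KeyValueTextInputFormat splits each line at the first separator (a tab by default) into a Text key and a Text value. The sketch below imitates only that splitting rule in plain Java so it can be run without a cluster; the class KeyValueLineSketch and its split method are hypothetical names, not Hadoop APIs.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical stand-alone illustration (not Hadoop code) of how a
// key/value line format turns one text line into a (key, value) pair.
class KeyValueLineSketch {

    // Splits at the FIRST occurrence of the separator only.
    // A line without the separator becomes the key, with an empty value.
    static Map.Entry<String, String> split(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            return new SimpleEntry<>(line, "");
        }
        return new SimpleEntry<>(line.substring(0, pos), line.substring(pos + 1));
    }

    public static void main(String[] args) {
        Map.Entry<String, String> kv = split("user42\tlogin\t2015-08-24", '\t');
        System.out.println("key=" + kv.getKey() + " value=" + kv.getValue());
    }
}
```

Under this rule, a Mapper bound to the key/value path would declare Text as its input key type, whereas a Mapper bound to a TextInputFormat path would declare LongWritable.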

Original article: https://www.cnblogs.com/skyl/p/4753703.html