  • MultipleInputs FileSplit cannot be cast to TaggedInputSplit ClassCastException: Hadoop multi-path, multi-format input exception

    1. Problem description

    While implementing a reduce-side sort in Hadoop, I used MultipleInputs to read differently formatted files from two folders, parsed by two separate mappers. Hadoop version 2.8.3; Hadoop 3.2.1 reports the same error.

    java.lang.Exception: java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.FileSplit cannot be cast to org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:489)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:549)
    Caused by: java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.FileSplit cannot be cast to org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit
    at org.apache.hadoop.mapreduce.lib.input.DelegatingMapper.setup(DelegatingMapper.java:45)
    at org.apache.hadoop.mapreduce.lib.input.DelegatingMapper.run(DelegatingMapper.java:54)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:270)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

    2. Problem analysis

    Many posts online claim this is a Hadoop bug and suggest editing the source around TaggedInputSplit (path: D:\hadoop\hadoop-2.8.3\share\hadoop\mapreduce\hadoop-mapreduce-client-core-2.8.3.jar!\org\apache\hadoop\mapreduce\lib\input\DelegatingMapper.class), i.e. hadoop-mapreduce-client-core, recompiling it, and replacing the hadoop-mapreduce-client-core-2.8.3.jar in the local repository with the rebuilt jar. As it turns out below, that is not the real cause here.

    (1) MultipleInputs.addInputPath is shown below. The call job.setMapperClass(DelegatingMapper.class); makes DelegatingMapper the mapper class registered on the job; the mapper that actually processes the data is the mapperClass argument. The mapper class name and its input path are joined into a string and stored in the job Configuration. DelegatingMapper later reads that configuration entry to recover the actual mapper class for each input path and hands the file's records to it.

    public static void addInputPath(Job job, Path path,
            Class<? extends InputFormat> inputFormatClass,
            Class<? extends Mapper> mapperClass) {
        addInputPath(job, path, inputFormatClass);
        Configuration conf = job.getConfiguration();
        // Encode the mapping as "path;mapperClassName" and append it to the
        // comma-separated list already stored in the configuration.
        String mapperMapping = path.toString() + ";" + mapperClass.getName();
        String mappers = conf.get("mapreduce.input.multipleinputs.dir.mappers");
        conf.set("mapreduce.input.multipleinputs.dir.mappers",
                mappers == null ? mapperMapping : mappers + "," + mapperMapping);
        job.setMapperClass(DelegatingMapper.class);
    }
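    The string bookkeeping that addInputPath performs can be reproduced without any Hadoop dependency. A minimal sketch (the paths and mapper names below are invented for illustration; only the configuration key is real):

    ```java
    import java.util.HashMap;
    import java.util.Map;

    public class MapperMappingDemo {
        // Mirrors how MultipleInputs.addInputPath chains "path;mapperClass"
        // pairs into one comma-separated configuration value.
        static void addMapping(Map<String, String> conf, String path, String mapperClass) {
            String mapperMapping = path + ";" + mapperClass;
            String mappers = conf.get("mapreduce.input.multipleinputs.dir.mappers");
            conf.put("mapreduce.input.multipleinputs.dir.mappers",
                    mappers == null ? mapperMapping : mappers + "," + mapperMapping);
        }

        public static void main(String[] args) {
            Map<String, String> conf = new HashMap<>();
            addMapping(conf, "/input/station", "StationMapper");
            addMapping(conf, "/input/temperature", "TemperatureRecordMapper");
            System.out.println(conf.get("mapreduce.input.multipleinputs.dir.mappers"));
            // prints: /input/station;StationMapper,/input/temperature;TemperatureRecordMapper
        }
    }
    ```

    DelegatingMapper later splits this value on "," and ";" to decide which real mapper handles each input directory, which is why it must receive a TaggedInputSplit that carries the mapper class.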

    (2) In IDEA, clicking the failing frame at org.apache.hadoop.mapreduce.lib.input.DelegatingMapper.setup(DelegatingMapper.java:45) jumps to the line where the cast exception occurs:

    protected void setup(Mapper<K1, V1, K2, V2>.Context context) throws IOException, InterruptedException {
        // Fails when the split is a plain FileSplit instead of a TaggedInputSplit
        TaggedInputSplit inputSplit = (TaggedInputSplit) context.getInputSplit();
        this.mapper = (Mapper) ReflectionUtils.newInstance(inputSplit.getMapperClass(), context.getConfiguration());
    }
    Here context.getInputSplit() returns a FileSplit. As the definitions below show, both classes extend InputSplit, but a cast is only legal between a class and its own superclass or subclass; two sibling classes cannot be cast to each other, hence the ClassCastException.
    FileSplit definition:
    public class FileSplit extends InputSplit implements Writable
    TaggedInputSplit definition:
    class TaggedInputSplit extends InputSplit implements Configurable, Writable
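    The sibling-cast failure can be reproduced with any two classes that share a parent. A minimal stand-in sketch (the demo class names are invented; they only mirror the InputSplit/FileSplit/TaggedInputSplit hierarchy):

    ```java
    // Stand-ins for the real Hadoop classes:
    abstract class InputSplitDemo {}                 // plays the role of InputSplit
    class FileSplitDemo extends InputSplitDemo {}    // plays the role of FileSplit
    class TaggedSplitDemo extends InputSplitDemo {}  // plays the role of TaggedInputSplit

    public class SiblingCastDemo {
        public static void main(String[] args) {
            // Runtime type is FileSplitDemo, just like context.getInputSplit()
            // returning a FileSplit.
            InputSplitDemo split = new FileSplitDemo();
            try {
                // Downcast compiles, but fails at runtime: the object is a
                // sibling, not an instance of TaggedSplitDemo.
                TaggedSplitDemo tagged = (TaggedSplitDemo) split;
            } catch (ClassCastException e) {
                System.out.println("caught: " + e);
            }
        }
    }
    ```

    This is exactly what happens inside DelegatingMapper.setup() when the job hands it an ordinary FileSplit.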

    3. Solution

    The real cause of the exception was three lines I had added to the driver code. Once MultipleInputs has registered the input paths (together with their input formats and mappers), you must not also add the paths with FileInputFormat, and you must not call setInputFormatClass. Doing so makes the framework treat the input as ordinary splits, so context.getInputSplit() in DelegatingMapper.setup() returns a FileSplit rather than the TaggedInputSplit that carries the actual mapper class, and the cast TaggedInputSplit inputSplit = (TaggedInputSplit) context.getInputSplit(); fails.
    public int run(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        if (args.length != 3) {
            return -1;
        }
        Job job = new Job(getConf(), "joinStationTemperatueRecord");
        if (job == null) {
            return -1;
        }
        job.setJarByClass(this.getClass());
        // Two input paths, one output path
        Path stationPath = new Path(args[0]);
        Path temperatureRecordPath = new Path(args[1]);
        Path outputPath = new Path(args[2]);
        MultipleInputs.addInputPath(job, stationPath, TextInputFormat.class, StationMapper.class);
        MultipleInputs.addInputPath(job, temperatureRecordPath, TextInputFormat.class, TemperatureRecordMapper.class);
        FileOutputFormat.setOutputPath(job, outputPath);

        // Partitioner, grouping comparator, and reducer
        job.setPartitionerClass(FirstPartitioner.class);
        job.setGroupingComparatorClass(GroupingComparator.class);
        job.setReducerClass(JoinReducer.class);
        job.setNumReduceTasks(2);
        // The three lines below must NOT be added, or the job fails with
        // java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.FileSplit cannot be cast to org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit
        // job.setInputFormatClass(TextInputFormat.class);
        // FileInputFormat.addInputPath(job, stationPath);
        // FileInputFormat.addInputPath(job, temperatureRecordPath);

        // Output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setMapOutputKeyClass(TextPair.class);
        job.setMapOutputValueClass(Text.class);
        // Delete the output directory (on the local filesystem) so the job can recreate it
        FileUtil.fullyDelete(new File(args[2]));
        return job.waitForCompletion(true) ? 0 : 1;
    }
     




     
  • Original post: https://www.cnblogs.com/bclshuai/p/12343092.html