zoukankan      html  css  js  c++  java
  • sparksql读取hive数据报错:java.lang.RuntimeException: serious problem

    问题:

    Caused by: java.util.concurrent.ExecutionException: java.lang.IndexOutOfBoundsException: Index: 0
    	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1016)
    	... 65 more
    Caused by: java.lang.IndexOutOfBoundsException: Index: 0
    	at java.util.Collections$EmptyList.get(Collections.java:4454)
    	at org.apache.hadoop.hive.ql.io.orc.OrcProto$Type.getSubtypes(OrcProto.java:12240)
    	at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.getColumnIndicesFromNames(ReaderImpl.java:651)
    	at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.getRawDataSizeOfColumns(ReaderImpl.java:634)
    	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:927)
    	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:836)
    	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:702)
    	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    	at java.lang.Thread.run(Thread.java:748)
    java.lang.RuntimeException: serious problem
    	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1021)
    	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1048)
    	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:200)
    	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
    	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
    	at scala.Option.getOrElse(Option.scala:121)
    	at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
    	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
    	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
    	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
    	at scala.Option.getOrElse(Option.scala:121)
    
    

    原因:

    sparksql生成的hive表有空文件,但是sparksql读取空文件的时候,因为表示orc格式的,导致sparksql解析orc文件出错。但是用hive却可以正常读取。
    

    解决办法:

    设置set spark.sql.hive.convertMetastoreOrc=true
    单纯的设置以上参数还是会报错:

    java.lang.IndexOutOfBoundsException
    	at java.nio.Buffer.checkIndex(Buffer.java:540)
    	at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:139)
    	at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractMetaInfoFromFooter(ReaderImpl.java:374)
    	at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:316)
    	at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:187)
    	at org.apache.spark.sql.hive.orc.OrcFileOperator$$anonfun$getFileReader$2.apply(OrcFileOperator.scala:75)
    	at org.apache.spark.sql.hive.orc.OrcFileOperator$$anonfun$getFileReader$2.apply(OrcFileOperator.scala:73)
    	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    	at scala.collection.TraversableOnce$class.collectFirst(TraversableOnce.scala:145)
    	at scala.collection.AbstractIterator.collectFirst(Iterator.scala:1336)
    	at org.apache.spark.sql.hive.orc.OrcFileOperator$.getFileReader(OrcFileOperator.scala:86)
    	at org.apache.spark.sql.hive.orc.OrcFileOperator$$anonfun$readSchema$1.apply(OrcFileOperator.scala:95)
    	at org.apache.spark.sql.hive.orc.OrcFileOperator$$anonfun$readSchema$1.apply(OrcFileOperator.scala:95)
    	at scala.collection.immutable.Stream.flatMap(Stream.scala:489)
    

    需要再设置set spark.sql.orc.impl=native
    参考https://issues.apache.org/jira/browse/SPARK-19809

  • 相关阅读:
    HTML DOM 06 节点关系
    HTML DOM 05 事件(三)
    HTML DOM 05 事件(二)
    HTML DOM 05 事件(一)
    html DOM 04 样式
    html DOM 03 节点的属性
    html DOM 02 获取节点
    html DOM 01 节点概念
    JavaScript 29 计时器
    JavaScript 28 弹出框
  • 原文地址:https://www.cnblogs.com/goldenSky/p/11971422.html
Copyright © 2011-2022 走看看