  • [Spark] DataFrame fails to read a JSON file with "Since Spark 2.3, the queries from raw JSON/CSV files are disallowed..."

    When I ran a Scala script in IDEA that executes a Spark SQL query:

    df.show()

    the following error was reported:

    19/12/06 14:26:17 INFO SparkContext: Created broadcast 2 from show at Student.scala:16
    Exception in thread "main" org.apache.spark.sql.AnalysisException: Since Spark 2.3, the queries from raw JSON/CSV files are disallowed when the
    referenced columns only include the internal corrupt record column
    (named _corrupt_record by default). For example:
    spark.read.schema(schema).json(file).filter($"_corrupt_record".isNotNull).count()
    and spark.read.schema(schema).json(file).select("_corrupt_record").show().
    Instead, you can cache or save the parsed results and then send the same query.
    For example, val df = spark.read.schema(schema).json(file).cache() and then
    df.filter($"_corrupt_record".isNotNull).count().;
        at org.apache.spark.sql.execution.datasources.json.JsonFileFormat.buildReader(JsonFileFormat.scala:120)
        at org.apache.spark.sql.execution.datasources.FileFormat$class.buildReaderWithPartitionValues(FileFormat.scala:129)
        at org.apache.spark.sql.execution.datasources.TextBasedFileFormat.buildReaderWithPartitionValues(FileFormat.scala:165)
        at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD$lzycompute(DataSourceScanExec.scala:309)
        at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD(DataSourceScanExec.scala:305)
        at org.apache.spark.sql.execution.FileSourceScanExec.inputRDDs(DataSourceScanExec.scala:327)
        at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:627)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247)
        at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:339)
        at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
        at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3389)
        at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2550)
        at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2550)
        at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3370)
        at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
        at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3369)
        at org.apache.spark.sql.Dataset.head(Dataset.scala:2550)
        at org.apache.spark.sql.Dataset.take(Dataset.scala:2764)
        at org.apache.spark.sql.Dataset.getRows(Dataset.scala:254)
        at org.apache.spark.sql.Dataset.showString(Dataset.scala:291)
        at org.apache.spark.sql.Dataset.show(Dataset.scala:751)
        at org.apache.spark.sql.Dataset.show(Dataset.scala:710)
        at org.apache.spark.sql.Dataset.show(Dataset.scala:719)
        at Student$.main(Student.scala:16)
        at Student.main(Student.scala)
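
The workaround quoted in the exception message (supply a schema, cache the parsed result, then query `_corrupt_record`) can be sketched roughly as below. This is only an illustration, not the code from my project: the object name `InspectCorruptRecords` and the path `data/student.json` are made up, and the schema assumes the `name` and `age` fields of the sample file shown further down.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

object InspectCorruptRecords {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("InspectCorruptRecords")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical path; point it at the file that triggers the exception.
    val path = "data/student.json"

    // Assumed schema: the two data fields plus the corrupt-record column,
    // which keeps the raw text of every line that failed to parse.
    val schema = StructType(Seq(
      StructField("name", StringType),
      StructField("age", LongType),
      StructField("_corrupt_record", StringType)
    ))

    // Cache the parsed rows first, as the error message suggests, so the
    // follow-up query does not run directly against the raw file.
    val df = spark.read.schema(schema).json(path).cache()

    // Show the raw lines that Spark could not parse as JSON.
    df.filter(col("_corrupt_record").isNotNull)
      .select("_corrupt_record")
      .show(false)

    spark.stop()
  }
}
```

If every line of the file shows up in this output, the file layout itself is the problem rather than a few bad records.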

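    The cause was that my JSON file spread each object over several lines. By default Spark's JSON reader expects one complete JSON object per line, so none of the lines parsed as a record and only the _corrupt_record column was left to query. Putting each object on a single line fixes it. The original file was: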

    {
      "name": "Michael",
      "age": 12
    }
    {
      "name": "Andy",
      "age": 13
    }
    {
      "name": "Justin",
      "age": 8
    }

    Changed to:

    {"name": "Michael",  "age": 12}
    {"name": "Andy",  "age": 13}
    {"name": "Justin",  "age": 8}
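
For reference, here is a minimal sketch of reading the corrected file, plus an alternative I did not use in this post: Spark's `multiLine` read option, which parses each file as one JSON document instead of line by line. The object name and both paths are hypothetical. The second file is assumed to be a single JSON array of objects; a file of bare objects placed one after another, like my original, is not a single valid JSON document, so I would not rely on `multiLine` to read it unchanged.

```scala
import org.apache.spark.sql.SparkSession

object ReadStudentJson {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ReadStudentJson")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical path: the corrected file, one JSON object per line.
    val students = spark.read.json("data/student.json")
    students.printSchema()
    students.show()

    // Hypothetical path: a pretty-printed file holding one JSON array,
    // e.g. [ { "name": "Michael", "age": 12 }, ... ].
    // multiLine makes Spark parse the whole file as one JSON document.
    val studentsFromArray = spark.read
      .option("multiLine", "true")
      .json("data/student_array.json")
    studentsFromArray.show()

    spark.stop()
  }
}
```

The one-object-per-line layout lets Spark split a large file across tasks, while a multi-line file has to be parsed as a whole, so the single-line format is usually the better default for big inputs.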
  • Original article: https://www.cnblogs.com/x-you/p/11995099.html