  • 【Original】Uncle's Experience Sharing (55): Spark fails to connect to Kudu

    spark-2.4.2
    kudu-1.7.0


    Attempts

    1) Manually add the jar to the classpath

    spark-2.4.2-bin-hadoop2.6
    +
    kudu-spark2_2.11-1.7.0-cdh5.16.1.jar

    # bin/spark-shell
    scala> val df = spark.read.options(Map("kudu.master" -> "master:7051", "kudu.table" -> "impala::test.tbl_test")).format("kudu").load
    java.lang.ClassNotFoundException: Failed to find data source: kudu. Please find packages at http://spark.apache.org/third-party-projects.html
      at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:660)
      at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
      at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
      ... 49 elided
    Caused by: java.lang.ClassNotFoundException: kudu.DefaultSource
      at scala.reflect.internal.util.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:72)
      at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
      at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
      at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$5(DataSource.scala:634)
      at scala.util.Try$.apply(Try.scala:213)
      at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$4(DataSource.scala:634)
      at scala.util.Failure.orElse(Try.scala:224)
      at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:634)
      ... 51 more
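
    The "Caused by: java.lang.ClassNotFoundException: kudu.DefaultSource" line hints at what Spark does internally: `lookupDataSource` tries the format string both as a fully qualified class name and with `.DefaultSource` appended. A simplified sketch of that resolution (the real code in `DataSource.scala` also consults `DataSourceRegister` via `ServiceLoader`, which is how newer kudu-spark jars register the short name "kudu"):

    ```scala
    // Simplified sketch of Spark's format-name resolution (see DataSource.scala).
    // The real implementation also checks DataSourceRegister through the
    // ServiceLoader mechanism; this only shows the class-name fallback.
    def candidateClassNames(provider: String): Seq[String] =
      Seq(provider, provider + ".DefaultSource")
    ```

    So `format("kudu")` ends up trying to load a class literally named `kudu.DefaultSource`, which matches the exception above; the kudu-spark 1.7.0 jar on the classpath simply does not register that short name.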

    2) The official approach (with the Kudu version changed to 1.7.0)

    spark-2.4.2-bin-hadoop2.6

    # bin/spark-shell --packages org.apache.kudu:kudu-spark2_2.11:1.7.0

    Same error as above.

    3) The official approach (unmodified)

    spark-2.4.2-bin-hadoop2.6

    # bin/spark-shell --packages org.apache.kudu:kudu-spark2_2.11:1.9.0
    scala> val df = spark.read.options(Map("kudu.master" -> "master:7051", "kudu.table" -> "impala::test.tbl_test")).format("kudu").load
    java.lang.NoClassDefFoundError: scala/Product$class
      at org.apache.kudu.spark.kudu.Upsert$.<init>(OperationType.scala:41)
      at org.apache.kudu.spark.kudu.Upsert$.<clinit>(OperationType.scala)
      at org.apache.kudu.spark.kudu.DefaultSource$$anonfun$getOperationType$2.apply(DefaultSource.scala:217)
      at org.apache.kudu.spark.kudu.DefaultSource$$anonfun$getOperationType$2.apply(DefaultSource.scala:217)
      at scala.Option.getOrElse(Option.scala:138)
      at org.apache.kudu.spark.kudu.DefaultSource.getOperationType(DefaultSource.scala:217)
      at org.apache.kudu.spark.kudu.DefaultSource.createRelation(DefaultSource.scala:104)
      at org.apache.kudu.spark.kudu.DefaultSource.createRelation(DefaultSource.scala:87)
      at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
      at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
      at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
      at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
      ... 49 elided
    Caused by: java.lang.ClassNotFoundException: scala.Product$class
      at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
      at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
      at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
      ... 61 more

    This looks like a Scala version conflict. The Spark download page contains this note:

    Note that, Spark is pre-built with Scala 2.11 except version 2.4.2, which is pre-built with Scala 2.12.
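
    This also explains the `NoClassDefFoundError: scala/Product$class` in step 3: Scala 2.12 changed the trait encoding and dropped the `$class` helper classes that 2.11-compiled code (here, kudu-spark2_2.11) depends on. The download-page note can be captured as a small rule of thumb for the 2.4.x line (a sketch based only on the statement above):

    ```scala
    // Rule of thumb from the Spark download page: within the 2.4.x line,
    // only the 2.4.2 convenience binaries were pre-built with Scala 2.12;
    // every other release shipped with Scala 2.11.
    def prebuiltScalaVersion(sparkVersion: String): String =
      if (sparkVersion == "2.4.2") "2.12" else "2.11"
    ```

    The Scala binary version of the Spark build must match the `_2.11`/`_2.12` suffix of any library pulled in with `--packages`.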

    4) Switch kudu-spark to Scala 2.12

    spark-2.4.2-bin-hadoop2.6

    # bin/spark-shell --packages org.apache.kudu:kudu-spark2_2.12:1.9.0

            ::::::::::::::::::::::::::::::::::::::::::::::
    
            ::          UNRESOLVED DEPENDENCIES         ::
    
            ::::::::::::::::::::::::::::::::::::::::::::::
    
            :: org.apache.kudu#kudu-spark2_2.12;1.9.0: not found
    
            ::::::::::::::::::::::::::::::::::::::::::::::
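
    The "not found" here is expected: the artifact name encodes the Scala binary version, and no `_2.12` build of kudu-spark 1.9.0 was published. An illustrative helper (names are my own, not part of any API) shows how the `--packages` coordinate is assembled under the usual `groupId:artifactId_scalaBinaryVersion:version` Maven convention:

    ```scala
    // Illustrative helper: build the --packages coordinate for kudu-spark,
    // following the Maven convention of suffixing the artifact name with
    // the Scala binary version it was compiled against.
    def kuduSparkCoordinate(scalaBinary: String, kuduVersion: String): String =
      s"org.apache.kudu:kudu-spark2_$scalaBinary:$kuduVersion"
    ```

    Only coordinates that actually exist in the repository resolve, so the fix has to come from the Spark side instead: use a Spark build compiled with Scala 2.11.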

    Fine, download Spark 2.4.3 instead.

    5) The official approach (continued)

    spark-2.4.3-bin-hadoop2.6

    # bin/spark-shell --packages org.apache.kudu:kudu-spark2_2.11:1.9.0
    scala> val df = spark.read.options(Map("kudu.master" -> "master:7051", "kudu.table" -> "impala::test.tbl_test")).format("kudu").load
    df: org.apache.spark.sql.DataFrame = [order_no: string, id: bigint ... 28 more fields]

    It works now.

    6) The official approach (with the Kudu version changed back to 1.7.0)

    spark-2.4.3-bin-hadoop2.6

    # bin/spark-shell --packages org.apache.kudu:kudu-spark2_2.11:1.7.0

    Same error as in step 1.

    Conclusion: with these builds, connecting Spark to Kudu only works with a Scala 2.11 Spark build (e.g. spark-2.4.3) plus kudu-spark2_2.11:1.9.0.

    References:
    https://kudu.apache.org/docs/developing.html
    http://spark.apache.org/downloads.html

  • Original article: https://www.cnblogs.com/barneywill/p/10840608.html