  • Problems encountered when accessing Hive from SparkSQL, and how to solve them

    First, copy Hadoop's core-site.xml and Hive's hive-site.xml into the project.
    Test code
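    A sketch of the copy step, assuming a standard Maven project layout and the default Hadoop/Hive config locations (adjust the paths to your installation):

```shell
# Put the cluster config files on the application classpath;
# Spark reads hive-site.xml from the classpath to locate the metastore.
mkdir -p src/main/resources
cp $HADOOP_HOME/etc/hadoop/core-site.xml src/main/resources/
cp $HIVE_HOME/conf/hive-site.xml src/main/resources/
```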
    def main(args: Array[String]): Unit = {
      val spark: SparkSession = SparkSession
        .builder()
        .appName("TopNApp")
        .master("local[2]")
        .enableHiveSupport()
        .getOrCreate()

      val userClickDF = spark.table("user_click")
      userClickDF.show(10)
    }
     
    Error
    Exception in thread "main" java.lang.IllegalArgumentException: Unable to instantiate SparkSession with Hive support because Hive classes are not found.
    at org.apache.spark.sql.SparkSession$Builder.enableHiveSupport(SparkSession.scala:869)
    at homework0522.OverwriteTopN$.main(OverwriteTopN.scala:12)
    at homework0522.OverwriteTopN.main(OverwriteTopN.scala)
     
    Reading the source
    "SparkSession.scala"
    /**
    * Enables Hive support, including connectivity to a persistent Hive metastore, support for
    * Hive serdes, and Hive user-defined functions.
    *
    * @since 2.0.0
    */

    def enableHiveSupport(): Builder = synchronized {
      // this is the if check that fails to find the Hive classes
      if (hiveClassesArePresent) {
        config(CATALOG_IMPLEMENTATION.key, "hive")
      } else {
        throw new IllegalArgumentException(
          "Unable to instantiate SparkSession with Hive support because " +
          "Hive classes are not found.")
      }
    }

    /**
    * @return true if Hive classes can be loaded, otherwise false.
    */
    private[spark] def hiveClassesArePresent: Boolean = {
      try {
        // Class.forName is used to look up the two classes below;
        // the first one is already missing
        Utils.classForName(HIVE_SESSION_STATE_BUILDER_CLASS_NAME)
        Utils.classForName("org.apache.hadoop.hive.conf.HiveConf")
        true
      } catch {
        case _: ClassNotFoundException | _: NoClassDefFoundError => false
      }
    }

    // it is HiveSessionStateBuilder that cannot be found
    private val HIVE_SESSION_STATE_BUILDER_CLASS_NAME =
      "org.apache.spark.sql.hive.HiveSessionStateBuilder"
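    The same probe can be reproduced outside Spark; a minimal Scala sketch (the helper name is hypothetical):

```scala
// Returns false when spark-hive is not on the classpath,
// mirroring the checks in hiveClassesArePresent.
def hiveOnClasspath: Boolean =
  try {
    Class.forName("org.apache.spark.sql.hive.HiveSessionStateBuilder")
    Class.forName("org.apache.hadoop.hive.conf.HiveConf")
    true
  } catch {
    case _: ClassNotFoundException | _: NoClassDefFoundError => false
  }
```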
     
    Solution
    Add spark-hive_2.11-2.4.2.jar and spark-hive-thriftserver_2.11-2.4.2.jar from $HIVE_HOME/lib to the project.
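    An alternative to adding jars by hand is declaring the dependency in the pom; a sketch assuming Spark 2.4.2 on Scala 2.11 (match the versions to your build):

```xml
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.4.2</version>
</dependency>
```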

    Next error
    Exception in thread "main" java.lang.NoSuchFieldError: METASTORE_CLIENT_SOCKET_LIFETIME
    at org.apache.spark.sql.hive.HiveUtils$.formatTimeVarsForHiveClient(HiveUtils.scala:194)
    at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:285)
    at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:66)
     
    Reading the source
    "HiveUtils.scala"
    /**
    * Change time configurations needed to create a [[HiveClient]] into unified [[Long]] format.
    */
    private[hive] def formatTimeVarsForHiveClient(hadoopConf: Configuration): Map[String, String] = {
      // Hive 0.14.0 introduces timeout operations in HiveConf, and changes default values of a bunch
      // of time `ConfVar`s by adding time suffixes (`s`, `ms`, and `d` etc.). This breaks backwards-
      // compatibility when users are trying to connect to a Hive metastore of lower version,
      // because these options are expected to be integral values in lower versions of Hive.
      //
      // Here we enumerate all time `ConfVar`s and convert their values to numeric strings according
      // to their output time units.
      Seq(
        ConfVars.METASTORE_CLIENT_CONNECT_RETRY_DELAY -> TimeUnit.SECONDS,
        ConfVars.METASTORE_CLIENT_SOCKET_TIMEOUT -> TimeUnit.SECONDS,
        // this is where the value cannot be read
        ConfVars.METASTORE_CLIENT_SOCKET_LIFETIME -> TimeUnit.SECONDS,
        ...
      ).map { case (confVar, unit) =>
        confVar.varname -> HiveConf.getTimeVar(hadoopConf, confVar, unit).toString
      }.toMap
    }
     
    Stepping into ConfVars
    "HiveConf.java"
    public static enum ConfVars {
      SCRIPTWRAPPER("hive.exec.script.wrapper", (Object)null, ""),
      PLAN("hive.exec.plan", "", ""),
      ...
    }
     
    ConfVars does not define a METASTORE_CLIENT_SOCKET_LIFETIME constant, and this HiveConf.java comes from hive-exec-1.1.0-cdh5.7.0.jar, which shows that Hive 1.1.0 had not yet added this parameter.
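    One way to check which constants a given hive-exec jar actually defines is to disassemble the enum with javap (the jar file name below is illustrative):

```shell
# No matching line means the constant is absent from this Hive version
javap -cp hive-exec-1.1.0-cdh5.7.0.jar 'org.apache.hadoop.hive.conf.HiveConf$ConfVars' \
  | grep METASTORE_CLIENT_SOCKET_LIFETIME
```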

    Solution
    Switch the Hive dependency to 1.2.1:

    <properties>
      ...
      <!-- <hive.version>1.1.0-cdh5.7.0</hive.version> -->
      <hive.version>1.2.1</hive.version>
    </properties>

    ...
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-exec</artifactId>
      <version>${hive.version}</version>
    </dependency>
     
    Next error
    Exception in thread "main" org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient;
    Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    Caused by: java.lang.reflect.InvocationTargetException
    Caused by: MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused: connect
    Caused by: java.net.ConnectException: Connection refused: connect
     
    Solution
    This happens because Hive is not running on the remote host; Hive must be started with the metastore service.

    $HIVE_HOME/bin/hive --service metastore &
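    Once started, you can verify the metastore is reachable; the default Thrift port (9083) and the config key below are assumptions based on Hive's defaults:

```shell
# Is the metastore listening?
netstat -tlnp | grep 9083
# The client's hive-site.xml should point at it, e.g.:
#   hive.metastore.uris = thrift://<metastore-host>:9083
```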

    ————————————————
    Copyright notice: this is an original article by CSDN blogger 「小朋友2D」, distributed under the CC 4.0 BY-SA license; reposts must include the original source link and this notice.
    Original link: https://blog.csdn.net/ct2020129/article/details/90695033

  • Original post: https://www.cnblogs.com/javalinux/p/15069715.html