  • [Original] Uncle's Troubleshooting Notes (18): beeline connections to Spark Thrift sometimes hang

    Spark 2.1.1

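    After beeline connects to the Spark Thrift Server, executing use database sometimes hangs. On the server side, use database corresponds to setCurrentDatabase.

    Investigation showed that the Spark Thrift Server was executing an insert at the time, which runs: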

    org.apache.spark.sql.hive.execution.InsertIntoHiveTable

      protected override def doExecute(): RDD[InternalRow] = {
        sqlContext.sparkContext.parallelize(sideEffectResult.asInstanceOf[Seq[InternalRow]], 1)
      }
    ...
      @transient private val externalCatalog = sqlContext.sharedState.externalCatalog
    
      protected[sql] lazy val sideEffectResult: Seq[InternalRow] = {
      ...
            externalCatalog.loadDynamicPartitions(
              externalCatalog.getPartitionOption(
              externalCatalog.loadPartition(
          externalCatalog.loadTable(

    As shown, an insert may call methods such as loadDynamicPartitions, getPartitionOption, loadPartition and loadTable.

    org.apache.spark.sql.hive.client.HiveClientImpl

      def loadTable(
          loadPath: String, // TODO URI
          tableName: String,
          replace: Boolean,
          holdDDLTime: Boolean): Unit = withHiveState {
    ...
      def loadPartition(
          loadPath: String,
          dbName: String,
          tableName: String,
          partSpec: java.util.LinkedHashMap[String, String],
          replace: Boolean,
          holdDDLTime: Boolean,
          inheritTableSpecs: Boolean): Unit = withHiveState {
    ...
      override def setCurrentDatabase(databaseName: String): Unit = withHiveState {

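    The corresponding methods in HiveClientImpl all run inside withHiveState, and withHiveState is synchronized. As a result, parts of the insert (for example loadPartition) and the use database operation are serialized on the same lock. When an insert is slow, every other operation that needs that lock blocks until the insert finishes.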
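
    The blocking described above can be reproduced outside Spark with a minimal sketch. FakeHiveClient, LockDemo and measureUseDatabaseWait are hypothetical names invented for this illustration. The withHiveState here is a simplified stand-in that only takes a shared lock; it is not the real Spark implementation, and the sleep merely simulates a slow load.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}

// Hypothetical stand-in for HiveClientImpl: every method goes through one
// shared lock, mirroring the synchronized withHiveState described above.
class FakeHiveClient {
  // Simplified assumption: withHiveState only acquires the lock.
  private def withHiveState[A](body: => A): A = synchronized { body }

  // Stands in for the slow part of an insert.
  def loadPartition(started: CountDownLatch, workMillis: Long): Unit = withHiveState {
    started.countDown()      // signal that the lock is now held
    Thread.sleep(workMillis) // simulate a slow load while holding the lock
  }

  // Stands in for the server-side handling of `use database`.
  def setCurrentDatabase(databaseName: String): Unit = withHiveState { () }
}

object LockDemo {
  // Returns how long (in ms) setCurrentDatabase waited for the lock
  // while loadPartition was still running on another thread.
  def measureUseDatabaseWait(loadMillis: Long): Long = {
    val client = new FakeHiveClient
    val started = new CountDownLatch(1)
    val insertThread = new Thread(new Runnable {
      def run(): Unit = client.loadPartition(started, loadMillis)
    })
    insertThread.start()
    started.await() // make sure the "insert" already holds the lock

    val t0 = System.nanoTime()
    client.setCurrentDatabase("default") // blocks until loadPartition returns
    val waited = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - t0)

    insertThread.join()
    waited
  }

  def main(args: Array[String]): Unit =
    println(s"use database waited about ${measureUseDatabaseWait(500)} ms")
}
```

    In this sketch the call to setCurrentDatabase does no work of its own, yet it returns only after the simulated load releases the lock, which is the same shape as the hang seen from beeline.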

    For how Spark Thrift is implemented, see https://www.cnblogs.com/barneywill/p/10137672.html

  • Original article: https://www.cnblogs.com/barneywill/p/10145427.html