  • [Original] Troubleshooting Notes (18): beeline connection to spark thrift sometimes hangs

    spark 2.1.1

    After connecting to spark thrift with beeline, executing use database sometimes hangs. On the server side, use database corresponds to setCurrentDatabase.
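    Roughly, the path from the SQL statement to that call looks like the sketch below. This is a paraphrase from memory of Spark 2.1's SetDatabaseCommand, not a verbatim copy, so treat the details as approximate: the USE statement is planned into a runnable command whose run() sets the current database on the session catalog, and for a Hive-backed catalog that ends up in HiveClientImpl.setCurrentDatabase shown further down.

      // Simplified paraphrase of org.apache.spark.sql.execution.command.SetDatabaseCommand
      case class SetDatabaseCommand(databaseName: String) extends RunnableCommand {
        override def run(sparkSession: SparkSession): Seq[Row] = {
          // For a Hive-backed session catalog this eventually reaches
          // HiveClientImpl.setCurrentDatabase (see below).
          sparkSession.sessionState.catalog.setCurrentDatabase(databaseName)
          Seq.empty[Row]
        }
      }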

    Investigation showed that, at the time of the hang, spark thrift was in the middle of an insert operation:

    org.apache.spark.sql.hive.execution.InsertIntoHiveTable

      protected override def doExecute(): RDD[InternalRow] = {
        sqlContext.sparkContext.parallelize(sideEffectResult.asInstanceOf[Seq[InternalRow]], 1)
      }
    ...
      @transient private val externalCatalog = sqlContext.sharedState.externalCatalog
    
      protected[sql] lazy val sideEffectResult: Seq[InternalRow] = {
      ...
            externalCatalog.loadDynamicPartitions(
              externalCatalog.getPartitionOption(
              externalCatalog.loadPartition(
          externalCatalog.loadTable(

    As shown above, an insert may call loadDynamicPartitions, getPartitionOption, loadPartition, loadTable and similar methods, all of which are implemented in HiveClientImpl:

    org.apache.spark.sql.hive.client.HiveClientImpl

      def loadTable(
          loadPath: String, // TODO URI
          tableName: String,
          replace: Boolean,
          holdDDLTime: Boolean): Unit = withHiveState {
    ...
      def loadPartition(
          loadPath: String,
          dbName: String,
          tableName: String,
          partSpec: java.util.LinkedHashMap[String, String],
          replace: Boolean,
          holdDDLTime: Boolean,
          inheritTableSpecs: Boolean): Unit = withHiveState {
    ...
      override def setCurrentDatabase(databaseName: String): Unit = withHiveState {

    Each of these HiveClientImpl methods runs inside withHiveState, and withHiveState is guarded by synchronized. As a result, part of the insert path (loadPartition, for example) and the use database call are executed serially on the same lock, so a slow insert stalls every other operation on that client.
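    For reference, the locking structure in HiveClientImpl looks roughly like this (a trimmed-down paraphrase of the 2.1.x source, not a verbatim quote): retryLocked synchronizes on the shared clientLoader, and withHiveState wraps every metastore call in it, so loadTable, loadPartition, loadDynamicPartitions and setCurrentDatabase all compete for the same lock.

      // Simplified paraphrase; retry handling and logging are omitted.
      private def retryLocked[A](f: => A): A = clientLoader.synchronized {
        f   // (retry loop for transient thrift errors omitted)
      }

      def withHiveState[A](f: => A): A = retryLocked {
        val original = Thread.currentThread().getContextClassLoader
        Hive.set(client)                                        // bind the thread-local metastore client
        state.getConf.setClassLoader(clientLoader.classLoader)
        shim.setCurrentSessionState(state)                      // switch to this client's SessionState
        try f finally Thread.currentThread().setContextClassLoader(original)
      }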

    The implementation of spark thrift itself is covered in detail at https://www.cnblogs.com/barneywill/p/10137672.html
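    To make the blocking effect described above concrete, here is a small self-contained mock (plain Scala, not Spark code) of the contention: both the "load" path and the "use database" path funnel through one synchronized wrapper, so the fast call has to wait for the slow one. All names here are illustrative only.

      // Mock of the withHiveState contention; not taken from the Spark source.
      object WithHiveStateContention {
        private val lock = new Object

        // Same shape as HiveClientImpl: every call funnels through one shared lock.
        private def withHiveState[A](f: => A): A = lock.synchronized { f }

        def loadPartition(): Unit = withHiveState {
          Thread.sleep(10000)          // stand-in for a slow insert / data load
        }

        def setCurrentDatabase(db: String): Unit = withHiveState {
          println(s"current database is now $db")
        }

        def main(args: Array[String]): Unit = {
          val insert = new Thread(new Runnable {
            override def run(): Unit = loadPartition()
          })
          insert.start()
          Thread.sleep(200)            // let the "insert" thread grab the lock first
          val t0 = System.currentTimeMillis()
          setCurrentDatabase("test")   // this "use database" blocks for roughly 10 seconds
          println(s"use database waited ${System.currentTimeMillis() - t0} ms")
          insert.join()
        }
      }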

  • Original article: https://www.cnblogs.com/barneywill/p/10145427.html