  • [Original] Experience Sharing (19): Spark on YARN job progress always stuck at 10% after submission

    spark 2.1.1

    We wanted to monitor the execution progress of Spark on YARN jobs, but found that after submission the reported progress stays at 10% until the job succeeds or fails, at which point it suddenly jumps to 100%. Rather mysterious.

     

     First, the Spark on YARN job submission process:

    When a job is submitted in yarn cluster mode, spark-submit rewrites the main class to Client:

    childMainClass = "org.apache.spark.deploy.yarn.Client"

    The spark-submit process is covered in detail at https://www.cnblogs.com/barneywill/p/9820684.html
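
    As a rough illustration, here is a simplified sketch (not the actual SparkSubmit code; the names are illustrative) of the main-class selection: in yarn cluster mode the user's main class is wrapped behind the yarn Client, while in client mode it runs directly.

      // Simplified sketch of the selection described above (illustrative, not Spark source).
      def childMainClass(master: String, deployMode: String, userClass: String): String =
        if (master == "yarn" && deployMode == "cluster") {
          "org.apache.spark.deploy.yarn.Client"
        } else {
          userClass
        }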

    Next, the Client execution path:

    org.apache.spark.deploy.yarn.Client

      def main(argStrings: Array[String]) {
    ...
        val sparkConf = new SparkConf
        // SparkSubmit would use yarn cache to distribute files & jars in yarn mode,
        // so remove them from sparkConf here for yarn mode.
        sparkConf.remove("spark.jars")
        sparkConf.remove("spark.files")
        val args = new ClientArguments(argStrings)
        new Client(args, sparkConf).run()
    ...
    
      def run(): Unit = {
        this.appId = submitApplication()
    ...
    
      def submitApplication(): ApplicationId = {
    ...
          val containerContext = createContainerLaunchContext(newAppResponse)
    ...
    
      private def createContainerLaunchContext(newAppResponse: GetNewApplicationResponse)
        : ContainerLaunchContext = {
    ...
        val amClass =
          if (isClusterMode) {
            Utils.classForName("org.apache.spark.deploy.yarn.ApplicationMaster").getName
          } else {
            Utils.classForName("org.apache.spark.deploy.yarn.ExecutorLauncher").getName
          }

    The call chain here is Client.main -> run -> submitApplication -> createContainerLaunchContext, which sets amClass. Either branch ends up in ApplicationMaster, because ExecutorLauncher (the client-mode choice) internally just delegates to ApplicationMaster, as shown below:

    org.apache.spark.deploy.yarn.ExecutorLauncher

    object ExecutorLauncher {
    
      def main(args: Array[String]): Unit = {
        ApplicationMaster.main(args)
      }
    
    }

    Now look at ApplicationMaster:

    org.apache.spark.deploy.yarn.ApplicationMaster

      def main(args: Array[String]): Unit = {
    ...
        SparkHadoopUtil.get.runAsSparkUser { () =>
          master = new ApplicationMaster(amArgs, new YarnRMClient)
          System.exit(master.run())
        }
    ...
    
      final def run(): Int = {
    ...
          if (isClusterMode) {
            runDriver(securityMgr)
          } else {
            runExecutorLauncher(securityMgr)
          }
    ...
    
      private def registerAM(
          _sparkConf: SparkConf,
          _rpcEnv: RpcEnv,
          driverRef: RpcEndpointRef,
          uiAddress: String,
          securityMgr: SecurityManager) = {
    ...
        allocator = client.register(driverUrl,
          driverRef,
          yarnConf,
          _sparkConf,
          uiAddress,
          historyAddress,
          securityMgr,
          localResources)
    
        allocator.allocateResources()
        reporterThread = launchReporterThread()
    ...
      private def launchReporterThread(): Thread = {
        // The number of failures in a row until Reporter thread give up
        val reporterMaxFailures = sparkConf.get(MAX_REPORTER_THREAD_FAILURES)
    
        val t = new Thread {
          override def run() {
            var failureCount = 0
            while (!finished) {
              try {
                if (allocator.getNumExecutorsFailed >= maxNumExecutorFailures) {
                  finish(FinalApplicationStatus.FAILED,
                    ApplicationMaster.EXIT_MAX_EXECUTOR_FAILURES,
                    s"Max number of executor failures ($maxNumExecutorFailures) reached")
                } else {
                  logDebug("Sending progress")
                  allocator.allocateResources()
                }
    ...

    The call chain here is ApplicationMaster.main -> run; run invokes either runDriver or runExecutorLauncher, and both eventually reach registerAM. registerAM calls YarnAllocator.allocateResources once, and launchReporterThread then starts a thread that keeps calling YarnAllocator.allocateResources periodically. Now look at YarnAllocator:

    org.apache.spark.deploy.yarn.YarnAllocator

      def allocateResources(): Unit = synchronized {
        updateResourceRequests()
    
        val progressIndicator = 0.1f
        // Poll the ResourceManager. This doubles as a heartbeat if there are no pending container
        // requests.
        val allocateResponse = amClient.allocate(progressIndicator)

    As you can see, every allocate call reports a progress of 0.1, i.e. 10%, and the value is hard-coded. This is why the reported progress of a Spark on YARN job stays at 10% for its entire lifetime, and why monitoring Spark job progress through YARN is futile.
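
    For completeness, here is a minimal sketch of the kind of monitor we attempted, written against the standard Hadoop YarnClient API (the application id below is a placeholder): for any running Spark job it keeps reporting 0.1.

      import org.apache.hadoop.yarn.client.api.YarnClient
      import org.apache.hadoop.yarn.conf.YarnConfiguration
      import org.apache.hadoop.yarn.util.ConverterUtils

      object ProgressMonitor {
        def main(args: Array[String]): Unit = {
          val yarnClient = YarnClient.createYarnClient()
          yarnClient.init(new YarnConfiguration())
          yarnClient.start()

          // Placeholder application id; substitute a real one.
          val appId = ConverterUtils.toApplicationId("application_1500000000000_0001")
          val report = yarnClient.getApplicationReport(appId)
          // For a running Spark job this always prints 0.1 (10%), because the
          // ApplicationMaster heartbeats with the hard-coded progressIndicator above.
          println(s"progress = ${report.getProgress}")

          yarnClient.stop()
        }
      }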

  • Original post: https://www.cnblogs.com/barneywill/p/10250694.html