  • Why the executors of a Spark-on-YARN job time out for no apparent reason: a cause analysis

    Problem:

    The job was submitted with  spark-submit --master yarn --deploy-mode cluster --driver-memory 2G --num-executors 6 --executor-memory 2G ~~~

    With this submission, the last executor ran for more than 160 s, timed out, and exited; its task was then re-executed, which made the whole job take far longer than it should. The details are in the log below:

    17/01/13 09:13:08 WARN spark.HeartbeatReceiver: Removing executor 5 with no recent heartbeats: 161684 ms exceeds timeout 120000 ms
    17/01/13 09:13:08 ERROR cluster.YarnClusterScheduler: Lost executor 5 on slave10: Executor heartbeat timed out after 161684 ms
    17/01/13 09:13:08 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, slave10): ExecutorLostFailure (executor 5 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 161684 ms
    17/01/13 09:13:08 INFO scheduler.DAGScheduler: Executor lost: 5 (epoch 0)
    17/01/13 09:13:08 INFO cluster.YarnClusterSchedulerBackend: Requesting to kill executor(s) 5
    17/01/13 09:13:08 INFO scheduler.TaskSetManager: Starting task 0.1 in stage 0.0 (TID 5, slave06, partition 0,RACK_LOCAL, 8029 bytes)
    17/01/13 09:13:08 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 5 from BlockManagerMaster.
    17/01/13 09:13:08 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(5, slave10, 34439)
    17/01/13 09:13:08 INFO storage.BlockManagerMaster: Removed 5 successfully in removeExecutor
    17/01/13 09:13:08 INFO scheduler.DAGScheduler: Host added was in lost list earlier: slave10
    17/01/13 09:13:08 INFO yarn.ApplicationMaster$AMEndpoint: Driver requested to kill executor(s) 5.
    17/01/13 09:13:08 INFO scheduler.TaskSetManager: Finished task 0.1 in stage 0.0 (TID 5) in 367 ms on slave06 (5/5)
    17/01/13 09:13:08 INFO scheduler.DAGScheduler: ResultStage 0 (saveAsNewAPIHadoopFile at DataFrameFunctions.scala:55) finished in 162.495 s
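
    Two things stand out in the log: the executor went 161684 ms without a heartbeat, and the threshold it exceeded was 120000 ms. To my understanding, that 120 s threshold is Spark's default network timeout (spark.network.timeout); executors send heartbeats every spark.executor.heartbeatInterval (10 s by default), and an executor that stays silent longer than the timeout is declared lost. A blunt alternative mitigation, not the fix used here and shown only as a hypothetical sketch, would be to relax those thresholds instead of addressing the memory pressure, e.g.:

    spark-submit --master yarn --deploy-mode cluster --conf spark.network.timeout=300s --conf spark.executor.heartbeatInterval=30s ~~~

    That only hides the symptom, though; the root cause addressed below is the executor becoming unresponsive under memory pressure.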

    My initial guess: the final step is computation-heavy, but Spark's off-heap (overhead) memory was configured too low. The default is:
    spark.yarn.executor.memoryOverhead = executorMemory * 0.10, with a minimum of 384 (MB)
    So I increased it and resubmitted as follows:
    spark-submit --master yarn --deploy-mode cluster --driver-memory 2G --num-executors 6 --executor-memory 2G --conf spark.yarn.executor.memoryOverhead=512 --conf spark.yarn.driver.memoryOverhead=512
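
    As a rough sanity check on the memory math (assuming the default formula quoted above, and that YARN sizes each executor container as executor memory plus the overhead):

    default: overhead = max(2048 MB * 0.10, 384 MB) = 384 MB  ->  container ≈ 2048 + 384 = 2432 MB
    tuned:   overhead = 512 MB                                ->  container ≈ 2048 + 512 = 2560 MB

    So each executor gets roughly 128 MB more off-heap headroom for things like JVM overhead, native buffers, and thread stacks.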

    With these settings the problem no longer occurred!
     
  • Original article: https://www.cnblogs.com/RichardYD/p/6281745.html