  • Notes on a ResourceManager (RM) failure

    The symptoms match the case described at the link below:

    http://cache.baiducontent.com/c?m=9d78d513d99401ef05ad837e7c4d8b711925d6387d9583532e8ec40884642a071d26b4e8713510758b96383416ae394bea872173474466ecc5df893acabbe53f2ef876692c4dc101528445e9dc4755d620e74de8df59b0e2a763d5f984c4de24048004543dc6abd6061715ba38ba4566a1e0c215494b57fab33f3fb91f3568882233ab5aa8bd6d3140ddad9b175bc35d8a3c51d1f269f56352ec52b31f6c7519ff51e0550d6067bc093abe037f46cfab1bbe7a644023bc4bb5b3dce1ab08d19cbd71d8a78bb82fe33bbad2ea8f27193110a963eff1eaf22a643344838a89459225bc8cb4e908ba53914b02eb002a7e2c8e2bc3dec940f21500b2b836&p=9f7ac815d9c10ebe44be9b7c4e&newp=8e36d10a85cc43ec0cbd9b7c4253d8304a02c70e3dc3864e1290c408d23f061d4862e7bf27251200d0c7786507ac425cedf4377323454df6cc8a871d81edd17c&user=baidu&fm=sc&query=error+in+dispatcher+thread+java%2Eutil%2Econcurrent%2Erejectedexecutionexception&qid=ad6bf4940002c824&p1=1

    Error:"Error in dispatcher thread java.util.concurrent.RejectedExecutionException" when running heavy load of job from YARN Resource Manager
    icocio created · 5 days ago
    SupportKB
    Problem Description:
    The YARN Resource Manager (RM), configured for HA, fails under heavy job load. Both the previously active RM and the standby RM crash. The following error appears in the Resource Manager log at the moment of shutdown:
    2018-10-23 18:50:42,552 FATAL event.AsyncDispatcher (AsyncDispatcher.java:dispatch(190)) - 
    Error in dispatcher thread 
    java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@5407c4c8 rejected from 
    java.util.concurrent.ThreadPoolExecutor@74d60fd0[Terminated, pool size = 14147, active threads = 0, queued tasks = 0, completed tasks = 32283] 
    at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063) 
    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830) 
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379) 
    at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134) 
    at org.apache.hadoop.registry.server.services.RegistryAdminService.submit(RegistryAdminService.java:176) 
    at org.apache.hadoop.registry.server.integration.RMRegistryOperationsService.purgeRecordsAsync(RMRegistryOperationsService.java:200) 
    at org.apache.hadoop.registry.server.integration.RMRegistryOperationsService.purgeRecordsAsync(RMRegistryOperationsService.java:170) 
    at org.apache.hadoop.registry.server.integration.RMRegistryOperationsService.onContainerFinished(RMRegistryOperationsService.java:146) 
    at org.apache.hadoop.yarn.server.resourcemanager.registry.RMRegistryService.handleAppAttemptEvent(RMRegistryService.java:156) 
    at org.apache.hadoop.yarn.server.resourcemanager.registry.RMRegistryService$AppEventHandler.handle(RMRegistryService.java:188) 
    at org.apache.hadoop.yarn.server.resourcemanager.registry.RMRegistryService$AppEventHandler.handle(RMRegistryService.java:182) 
    at org.apache.hadoop.yarn.event.AsyncDispatcher$MultiListenerHandler.handle(AsyncDispatcher.java:279) 
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184) 
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110) 
    at java.lang.Thread.run(Thread.java:748) 
    2018-10-23 18:50:42,552 INFO capacity.ParentQueue (ParentQueue.java:assignContainers(475)) - 
    assignedContainer queue=root usedCapacity=0.78571427 absoluteUsedCapacity=0.78571427 
    used=<memory:3914240, vCores:1076> cluster=<memory:4981760, vCores:2318> 
    2018-10-23 18:50:42,559 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(422)) - 
    container_e173_1540320252022_0085_02_002570 Container Transitioned from ALLOCATED to ACQUIRED 
    2018-10-23 18:50:42,559 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(422)) - 
    container_e173_1540320252022_0085_02_002571 Container Transitioned from ALLOCATED to ACQUIRED 
    2018-10-23 21:49:24,484 INFO resourcemanager.ResourceManager (LogAdapter.java:info(45)) - STARTUP_MSG: 
    /************************************************************ 
    STARTUP_MSG: Starting ResourceManager 
    STARTUP_MSG: user = yarn 
    STARTUP_MSG: host = ustsmascmsp920.prod/10.86.128.54 
    STARTUP_MSG: args = [] 
    STARTUP_MSG: version = 2.7.3.2.6.1.0-129
    
      
    Cause:
    The Resource Manager has to purge registry records in ZooKeeper for every container that completes. To do so, it scans almost all znodes from the root path. As the number of znodes grows, the ZooKeeper client session can drop and the AsyncDispatcher queue becomes overwhelmed. The Resource Manager shutdown itself might be due to a race condition.
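
    The stack trace above shows a task being submitted to a ThreadPoolExecutor that is already in the Terminated state, with the default AbortPolicy throwing the exception. The following is a minimal sketch of that JDK behavior only. It does not reproduce the Resource Manager code path, and the class and method names are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not part of Hadoop.
class RejectedSubmitDemo {

    /**
     * Shuts a pool down, then submits one more task.
     * Returns the rejection message, or null if the task was accepted.
     */
    static String submitAfterShutdown() {
        // Executors.newCachedThreadPool() uses the default AbortPolicy.
        ExecutorService pool = Executors.newCachedThreadPool();
        pool.submit(() -> { });          // one task while the pool is running
        pool.shutdown();                 // no new tasks are accepted from here on
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        try {
            pool.submit(() -> { });      // late submit, as in the trace
            return null;
        } catch (RejectedExecutionException e) {
            // The message includes the executor state, as in the RM log line.
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(submitAfterShutdown());
    }
}
```

    In the log above the rejection reaches the AsyncDispatcher thread, which logs it at FATAL level. In this sketch the exception is simply caught and its message returned.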
    Solution: 
    This issue is resolved in HDP-2.6.5. For versions prior to HDP-2.6.5, do the following to disable the ResourceManager registry:
    1. Log into the Ambari UI.
    2. Click the YARN service.
    3. Click Config > Advanced tab.
    4. Expand the Advanced yarn-site section.
    5. Set hadoop.registry.rm.enabled to false.
    6. Restart all affected services.
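
    After the restart, one way to confirm the setting took effect is to read it back from the yarn-site.xml that the Resource Manager host actually uses. The following is a sketch using only the JDK XML parser. The class name, helper, and sample document are hypothetical, and the sample only mirrors the standard Hadoop configuration/property/name/value layout.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper; not part of Hadoop or Ambari.
class YarnSiteCheck {

    // Sample content in the usual Hadoop *-site.xml layout (illustrative only).
    static final String SAMPLE =
        "<configuration>"
      + "<property><name>hadoop.registry.rm.enabled</name><value>false</value></property>"
      + "</configuration>";

    /** Returns the value of the named property, or null if it is absent. */
    static String readProperty(String xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element p = (Element) props.item(i);
                NodeList names = p.getElementsByTagName("name");
                NodeList values = p.getElementsByTagName("value");
                if (names.getLength() > 0 && values.getLength() > 0
                        && name.equals(names.item(0).getTextContent().trim())) {
                    return values.item(0).getTextContent().trim();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse configuration XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readProperty(SAMPLE, "hadoop.registry.rm.enabled"));
    }
}
```

    To check a real cluster, the file contents would be read from disk and passed in place of SAMPLE. The file location depends on the installation and is not given in the article.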

    In addition, after rm1 went down, rm2 did not manage to switch to active either. The log timeline:

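    12:35:21 RM failover only now reports that it found the active RM [rm2]
    12:35:56 the session with ZooKeeper was closed again
    12:37:18 rm2 appears to be working as active and starts logging various initialization messages
    12:37:18 916  rm2 became standby again, and the RM then stopped itself in order to fence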

    For additional background on the RM fencing mechanism, see:

    https://www.cnblogs.com/shenh062326/p/3547786.html

  • Original article: https://www.cnblogs.com/roger888/p/11401511.html