  • YARN HA failover leaves the RM in an abnormal state

    2021-11-15 18:52:15,361 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
    2021-11-15 18:52:15,372 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8032: starting
    2021-11-15 18:52:15,421 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8030: starting
    2021-11-15 18:52:19,190 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioned to active state
    2021-11-15 18:52:19,193 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning to standby state
    2021-11-15 18:52:19,222 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hadoop OPERATION=refreshQueues TARGET=AdminService RESULT=FAILURE DESCRIPTION=ResourceManager is not active. Can not refresh queues. PERMISSIONS=
    2021-11-15 18:52:19,222 ERROR org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed so firing fatal event
    org.apache.hadoop.ha.ServiceFailedException: ResourceManager rm1 is not Active!
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:609)
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:313)
    at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:813)
    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:418)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
    2021-11-15 18:52:19,223 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
    org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
    at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:128)
    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:813)
    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:418)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
    Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll during transistion to Active
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:321)
    at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
    ... 4 more
    Caused by: org.apache.hadoop.ha.ServiceFailedException: ResourceManager rm1 is not Active!
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:609)
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:313)
    ... 5 more
    2021-11-15 18:52:19,223 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
    2021-11-15 18:52:19,232 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type TRANSITION_TO_ACTIVE_FAILED. Cause:
    org.apache.hadoop.ha.ServiceFailedException: ResourceManager rm1 is not Active!
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:609)
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:313)
    at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:813)
    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:418)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)

    2021-11-15 18:52:19,234 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1

    Check the RM state

    Commands:

    yarn rmadmin -getServiceState rm1

    yarn rmadmin -getServiceState rm2

    Neither command returns a state for rm1 or rm2.

    https://issues.apache.org/jira/browse/YARN-2588
    https://issues.apache.org/jira/browse/YARN-2019
    https://issues.apache.org/jira/browse/YARN-2010
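    A failed recovery prevents the RM from starting, which is a very serious bug.
    The root cause is that an HA-enabled RM reads its previous state from ZooKeeper on startup (the recovery process). If the data in ZooKeeper is bad, recovery throws an exception.
    That exception is not handled properly and propagates straight to the upper layer, which puts the RM into the STOPPED state.
    Once the RM is in the STOPPED state, it cannot transition to any other state.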
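To make the failure mode concrete, here is a minimal toy model of a service lifecycle in which STOPPED is a terminal state. It is a sketch for illustration only, not Hadoop code: the names `ToyService`, `ALLOWED`, and `bad_recover` are hypothetical, and the transition table is a simplified assumption about how such a lifecycle behaves.

```python
from enum import Enum


class State(Enum):
    NOTINITED = "NOTINITED"
    INITED = "INITED"
    STARTED = "STARTED"
    STOPPED = "STOPPED"


# Allowed transitions in this simplified model (an assumption for
# illustration); STOPPED has no outgoing transitions.
ALLOWED = {
    State.NOTINITED: {State.INITED, State.STOPPED},
    State.INITED: {State.STARTED, State.STOPPED},
    State.STARTED: {State.STOPPED},
    State.STOPPED: set(),
}


class ToyService:
    """Hypothetical stand-in for a service with a one-way lifecycle."""

    def __init__(self):
        self.state = State.NOTINITED

    def move_to(self, target):
        # Reject any transition that the table does not list.
        if target not in ALLOWED[self.state]:
            raise RuntimeError(
                f"cannot go from {self.state.name} to {target.name}"
            )
        self.state = target

    def start(self, recover):
        """Run recovery, then start; a recovery error stops the service."""
        self.move_to(State.INITED)
        try:
            recover()
        except Exception:
            # The error is not handled here, so the service is stopped
            # and the exception keeps propagating to the caller.
            self.move_to(State.STOPPED)
            raise
        self.move_to(State.STARTED)


def bad_recover():
    # Stands in for reading corrupt data from the state store.
    raise ValueError("corrupt state store data")
```

Calling `ToyService().start(bad_recover)` raises the recovery error and leaves the object in `State.STOPPED`; a later `move_to(State.STARTED)` is rejected, which mirrors why the stuck RM cannot become active again until the bad data is dealt with.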


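    Solution:

    Without changing the code, the only option is to clear the data in ZooKeeper by hand. You can delete the /rmstore path from the ZooKeeper CLI (on newer ZooKeeper releases, where rmr is deprecated, deleteall is the equivalent command):
    setAcl /rmstore/ZKRMStateRoot world:anyone:cdrwa
    rmr /rmstore


    Alternatively, set the yarn.resourcemanager.zk-state-store.parent-path property in yarn-site.xml to a different path, for example /rmstore2, so that the RM stores its state somewhere else.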
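As a sketch of that second option, the yarn-site.xml fragment below sets the property named above. The value /rmstore2 is just the example path from the text; the assumption here is that the same setting is applied on both RMs and that they are restarted afterwards.

```xml
<!-- yarn-site.xml: point the ZooKeeper-based RM state store at a new path.
     /rmstore2 is the example value from the text, not a required name. -->
<property>
  <name>yarn.resourcemanager.zk-state-store.parent-path</name>
  <value>/rmstore2</value>
</property>
```

With this approach the old data under /rmstore is left in place rather than deleted, so it can still be inspected later.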

  • Original post: https://www.cnblogs.com/songyuejie/p/15566475.html