概述:
RM是yarn中最重要的组件。但是只有一个RM,因此存在单点失败的问题。RM的重启有两种方式:
1.(Non-work-preserving RM restart) 不保留工作状态的重启
这种情况下,RM把应用(application)的状态保存在一个插件化的state-store里,等RM重启后,RM重新加载这些状态,然后kick之前正在执行的任务,用户不必重新提交任务。
2.(work-preserving RM restart)保留工作状态的重启
RM通过合并NM上的container状态和AM的container请求来重新任务状态。上面的情况不同的是,不需要kill之前正在执行的任务,任务在RM重启的时候可以继续执行。
特性:
Non-work-preserving RM restart:
这种方式下,RM会在client提交工作时保存应用(application)的元数据(如ApplicationSubmissionContext)到插件化的state-store中,并且在任务执行完成后保存执行状态。此外,RM还保存应用的凭证信息(security keys、tokens)等。当RM宕机后,RM可会重新加载这些保存在state-store中的元数据,并且重新提交任务(不提交RM宕机前已经执行完成的)。
nodeManagers和clients在RM宕机期间会轮询RM。RM重启后,会通过心跳(heartbeats)发出一个re-sync命令到所有的NM和AM上.NMs接收到re-sync命令后,会把自己节点上的所有containers都干掉,然后重新注册到RM(跟新的RM一样)。AMs接收到re-sync命令后,会shutdown。RM加载完元数据信息后,会为任务重建AM。在NMs和AMs接收到re-sync命令后,RM宕机时正在执行的任务就被kill掉了。
保存元数据->RM宕机->RM重启->发送re-sync到NMs->NMs kill containers,AMs shutdown->RM读取无数据->RM提交任务->RM 分配AM
Work-preserving RM restart:
RM重建YARN集群状态,最重要的是重建scheduler的的状态,包括(containers’ life-cycle, applications’ headroom and resource requests, queues’ resource usage)containers的生命周期、应用程序的headroot、资源请求、队列的资源使用情况等。RM不用杀死正在执行的程序,在RM重启后,会继续这些暂停的任务。
RM重新通过NMs发送的containers状态来重建集群。在RM宕机期间,NMs不会kill containers,并且继续维护containers的状态,在RM重启后,NMs会向RM重新注册,并发containers的状态。之后,AM需要重新发送后续的资源请求,因为在RM在宕机前可能就没有满足AM的资源请求。应用程序使用AMRMClient和RM通信而不必担心AM在RM re-synce时重新向RM请求资源。
注意:无论哪种方式,都需要一个state-store
配置:
四种state-store方式:zookeeperhdfslocal filedb;其中zookeeper支持RM HA的恢复,其它不支持HA。
Enable RM Restart
Property | Description |
---|---|
yarn.resourcemanager.recovery.enabled | true |
Configure the state-store for persisting the RM state
Property | Description |
---|---|
yarn.resourcemanager.store.class | The class name of the state-store to be used for saving application/attempt state and the credentials. The available state-store implementations areorg.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore, a ZooKeeper based state-store implementation andorg.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore, a Hadoop FileSystem based state-store implementation like HDFS and local FS. org.apache.hadoop.yarn.server.resourcemanager.recovery.LeveldbRMStateStore, a LevelDB based state-store implementation. The default value is set to org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore. |
How to choose the state-store implementation
ZooKeeper based state-store: User is free to pick up any storage to set up RM restart, but must use ZooKeeper based state-store to support RM HA. The reason is that only ZooKeeper based state-store supports fencing mechanism to avoid a split-brain situation where multiple RMs assume they are active and can edit the state-store at the same time.
FileSystem based state-store: HDFS and local FS based state-store are supported. Fencing mechanism is not supported.
LevelDB based state-store: LevelDB based state-store is considered more light weight than HDFS and ZooKeeper based state-store. LevelDB supports better atomic operations, fewer I/O ops per state update, and far fewer total files on the filesystem. Fencing mechanism is not supported.