This post continues from "Hadoop 2.7 Job Submission in Detail (Part 1)", which had already traced the call path as far as YARNRunner.submitJob():
[WordCount.main() -> Job.waitForCompletion() -> Job.submit() -> Job.connect() -> Cluster.Cluster() -> Cluster.initialize() -> YarnClientProtocolProvider.create() -> JobSubmitter.submitJobInternal() -> YARNRunner.submitJob()]
So we pick up the story at YARNRunner.submitJob().
First, a quick look at the YARNRunner class (excerpt):
package org.apache.hadoop.mapred;

public class YARNRunner implements ClientProtocol {
  private ResourceMgrDelegate resMgrDelegate; // the RM's "envoy" stationed on the client side
  private ClientCache clientCache;
  private Configuration conf;
  private final FileContext defaultFileContext;

  // Creates the delegate, then chains to the next constructor
  public YARNRunner(Configuration conf) {
    this(conf, new ResourceMgrDelegate(new YarnConfiguration(conf)));
  }

  // Creates the ClientCache, then chains to the final constructor
  public YARNRunner(Configuration conf, ResourceMgrDelegate resMgrDelegate) {
    this(conf, resMgrDelegate, new ClientCache(conf, resMgrDelegate));
  }

  // The final constructor
  public YARNRunner(Configuration conf, ResourceMgrDelegate resMgrDelegate,
      ClientCache clientCache) {
    this.conf = conf;
    try {
      this.resMgrDelegate = resMgrDelegate;
      this.clientCache = clientCache;
      this.defaultFileContext = FileContext.getFileContext(this.conf);
    } catch (UnsupportedFileSystemException ufe) {
      throw new RuntimeException("Error in instantiating YarnClient", ufe);
    }
  }

  public JobStatus submitJob(JobID jobId, String jobSubmitDir, Credentials ts)
      throws IOException, InterruptedException {
    addHistoryToken(ts); // serves the history service; related to JobHistory

    // Construct necessary information to start the MR AM:
    // create an ApplicationSubmissionContext and copy the relevant settings from conf into it
    ApplicationSubmissionContext appContext =
        createApplicationSubmissionContext(conf, jobSubmitDir, ts);

    // Submit to ResourceManager
    try {
      // Once the RM has accepted the job, it forwards the ContainerLaunchContext to an NM
      // node, which runs the shell command line there and starts a new JVM that executes
      // MRAppMaster.class. So the ApplicationSubmissionContext object appContext really does
      // "represent all the information the ResourceManager needs to launch the
      // ApplicationMaster of this application".
      ApplicationId applicationId = resMgrDelegate.submitApplication(appContext);

      ApplicationReport appMaster = resMgrDelegate
          .getApplicationReport(applicationId);
      String diagnostics =
          (appMaster == null ? "application report is null" : appMaster.getDiagnostics());
      if (appMaster == null
          || appMaster.getYarnApplicationState() == YarnApplicationState.FAILED
          || appMaster.getYarnApplicationState() == YarnApplicationState.KILLED) {
        throw new IOException("Failed to run job : " + diagnostics);
      }
      return clientCache.getClient(jobId).getJobStatus(jobId);
    } catch (YarnException e) {
      throw new IOException(e);
    }
  }
}
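Within it, the createApplicationSubmissionContext method does the following:
1. Sets the resource request: by default 1536 MB of memory and 1 CPU core
2. Sets up the local resources, such as the temporary working directory and the jar files
3. Sets the security tokens
4. Sets the command that launches the AM
5. Checks the map and reduce configuration
6. Sets up the environment, such as CLASSPATH
7. Creates the ContainerLaunchContext for the AM's container
8. Fills in the ApplicationSubmissionContext
9. Sets the execution path of MRAppMaster
It also copies across the relevant current settings in conf, the directory path of the resources already uploaded, and the identity and access-permission information. The context further records which shell (for example bash) the ApplicationMaster, the "project leader", should use, together with certain environment variables and details such as the job name.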
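Step 4 above (setting the command that launches the AM) can be pictured with a small sketch. This is a simplified model rather than the code in YARNRunner: the class AmCommandSketch, the method buildAmCommand, and the heap size and log directory placeholder passed in main are all assumptions chosen for illustration. It only shows the general idea of assembling a list of arguments into one shell command line that ends up starting MRAppMaster in a new JVM.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of assembling the AM launch command line.
final class AmCommandSketch {

    // Joins the JVM binary, a heap option, the main class and the output
    // redirections into a single shell command string.
    static String buildAmCommand(String javaHome, int heapMb,
                                 String mainClass, String logDir) {
        List<String> args = new ArrayList<>();
        args.add(javaHome + "/bin/java");     // JVM to start on the NM node
        args.add("-Xmx" + heapMb + "m");      // illustrative heap setting
        args.add(mainClass);                  // class the new JVM executes
        args.add("1>" + logDir + "/stdout");  // redirect standard output
        args.add("2>" + logDir + "/stderr");  // redirect standard error
        return String.join(" ", args);
    }

    public static void main(String[] unused) {
        // "<LOG_DIR>" stands in for a placeholder the node is assumed to expand later.
        System.out.println(buildAmCommand("$JAVA_HOME", 1024,
                "org.apache.hadoop.mapreduce.v2.app.MRAppMaster", "<LOG_DIR>"));
    }
}
```

The point of the sketch is that the client never starts the AM itself; it only describes, as plain strings inside the submission context, what some node should run later.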
Next comes the call to ResourceMgrDelegate.submitApplication(), so let's first look at the ResourceMgrDelegate class:
public class ResourceMgrDelegate extends YarnClient {
  private YarnConfiguration conf;
  private ApplicationSubmissionContext application;
  private ApplicationId applicationId;
  // Actually a YarnClientImpl object, which also extends YarnClient
  protected YarnClient client;
  private Text rmDTService;

  // Constructor of ResourceMgrDelegate
  public ResourceMgrDelegate(YarnConfiguration conf) {
    super(ResourceMgrDelegate.class.getName());
    this.conf = conf;
    // Create the YarnClient object client;
    // YarnClient.createYarnClient() creates a YarnClientImpl
    this.client = YarnClient.createYarnClient();
    init(conf); // provided by AbstractService, which YarnClient extends
    start();    // also provided by AbstractService
  }

  public ApplicationId submitApplication(ApplicationSubmissionContext appContext)
      throws YarnException, IOException {
    // Delegates to YarnClientImpl.submitApplication()
    return client.submitApplication(appContext);
  }
  // ... (remaining methods omitted)
}
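From all the code so far we can conclude the following:
The ResourceMgrDelegate object is created in the YARNRunner constructor, and YARNRunner itself is created in Cluster.initialize(), as seen earlier. Tracing further back, the Cluster object is created on the first call to connect(). So any node that has ever called connect(), that is, has ever connected to the "cluster", holds a Cluster object, and therefore a YARNRunner object, and therefore a ResourceMgrDelegate object, and, as described below, a YarnClientImpl object as well.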
So far, the job submission path is:
[WordCount.main() -> Job.waitForCompletion() -> Job.submit() -> Job.connect() -> Cluster.Cluster() -> Cluster.initialize() -> YarnClientProtocolProvider.create() -> JobSubmitter.submitJobInternal() -> YARNRunner.submitJob() -> ResourceMgrDelegate.submitApplication() -> YarnClientImpl.submitApplication()]
Next, we continue with the YarnClientImpl.submitApplication() method:
public ApplicationId submitApplication(ApplicationSubmissionContext appContext)
    throws YarnException, IOException {
  ApplicationId applicationId = appContext.getApplicationId();
  if (applicationId == null) {
    throw new ApplicationIdNotProvidedException(
        "ApplicationId is not provided in ApplicationSubmissionContext");
  }
  // Create a request record (an instance of SubmitApplicationRequestPBImpl)
  SubmitApplicationRequest request =
      Records.newRecord(SubmitApplicationRequest.class);
  request.setApplicationSubmissionContext(appContext); // attach the context to the record

  // Automatically add the timeline DT into the CLC
  // Only when the security and the timeline service are both enabled
  if (isSecurityEnabled() && timelineServiceEnabled) {
    addTimelineDelegationToken(appContext.getAMContainerSpec());
  }

  //TODO: YARN-1763:Handle RM failovers during the submitApplication call.
  rmClient.submitApplication(request); // the actual cross-node submission

  int pollCount = 0;
  long startTime = System.currentTimeMillis();
  EnumSet<YarnApplicationState> waitingStates =
      EnumSet.of(YarnApplicationState.NEW,
          YarnApplicationState.NEW_SAVING,
          YarnApplicationState.SUBMITTED);
  EnumSet<YarnApplicationState> failToSubmitStates =
      EnumSet.of(YarnApplicationState.FAILED,
          YarnApplicationState.KILLED);
  while (true) {
    try {
      // Fetch the application report from the RM node and read the current state from it
      ApplicationReport appReport = getApplicationReport(applicationId);
      YarnApplicationState state = appReport.getYarnApplicationState();
      if (!waitingStates.contains(state)) {
        if(failToSubmitStates.contains(state)) {
          throw new YarnException("Failed to submit " + applicationId +
              " to YARN : " + appReport.getDiagnostics());
        }
        LOG.info("Submitted application " + applicationId);
        break; // the application has left the waiting states; exit the while loop
      }

      long elapsedMillis = System.currentTimeMillis() - startTime;
      if (enforceAsyncAPITimeout() &&
          elapsedMillis >= asyncApiPollTimeoutMillis) {
        throw new YarnException("Timed out while waiting for application " +
            applicationId + " to be submitted successfully");
      }

      // Notify the client through the log every 10 poll, in case the client
      // is blocked here too long.
      if (++pollCount % 10 == 0) {
        LOG.info("Application submission is not finished, " +
            "submitted application " + applicationId +
            " is still in " + state);
      }
      try {
        Thread.sleep(submitPollIntervalMillis);
      } catch (InterruptedException ie) {
        String msg = "Interrupted while waiting for application "
            + applicationId + " to be successfully submitted.";
        LOG.error(msg);
        throw new YarnException(msg, ie);
      }
    } catch (ApplicationNotFoundException ex) {
      // FailOver or RM restart happens before RMStateStore saves
      // ApplicationState
      LOG.info("Re-submit application " + applicationId + "with the " +
          "same ApplicationSubmissionContext");
      rmClient.submitApplication(request); // resubmit after the RM lost the application
    }
  }
  return applicationId;
}
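The submit-then-poll pattern in this method can be isolated in a few lines. The following is a minimal sketch under stated assumptions, not the YARN implementation: SubmitPollSketch, its State enum and waitForSubmission are hypothetical names, the state report is supplied by a plain Supplier instead of an RPC call, and a negative timeout is taken here to mean "wait indefinitely".

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.Iterator;
import java.util.function.Supplier;

// Hypothetical stand-in for the polling loop in submitApplication().
final class SubmitPollSketch {
    enum State { NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FAILED, KILLED }

    // Polls reportState until the application leaves the waiting states.
    static State waitForSubmission(Supplier<State> reportState,
                                   long pollMillis, long timeoutMillis) {
        EnumSet<State> waiting = EnumSet.of(State.NEW, State.NEW_SAVING, State.SUBMITTED);
        EnumSet<State> failed = EnumSet.of(State.FAILED, State.KILLED);
        long start = System.currentTimeMillis();
        while (true) {
            State state = reportState.get();
            if (!waiting.contains(state)) {
                if (failed.contains(state)) {
                    throw new IllegalStateException("Failed to submit, state=" + state);
                }
                return state; // submission finished; the caller may proceed
            }
            // Assumption of this sketch: a negative timeout disables the check.
            if (timeoutMillis >= 0
                    && System.currentTimeMillis() - start >= timeoutMillis) {
                throw new IllegalStateException("Timed out while still in " + state);
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting", ie);
            }
        }
    }

    public static void main(String[] args) {
        // A canned sequence of reports replaces the real RM round trips.
        Iterator<State> reports =
                Arrays.asList(State.NEW, State.SUBMITTED, State.ACCEPTED).iterator();
        System.out.println(waitForSubmission(reports::next, 1L, -1L));
    }
}
```

Note that leaving the loop only means the application is no longer in NEW, NEW_SAVING or SUBMITTED; it does not by itself mean that any task has started running.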
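In YarnClientImpl.submitApplication() the real work is done by the call to rmClient.submitApplication(request). So what is rmClient? Let's look at a brief excerpt of the YarnClientImpl class definition: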
public class YarnClientImpl extends YarnClient {
  private static final Log LOG = LogFactory.getLog(YarnClientImpl.class);

  protected ApplicationClientProtocol rmClient;
  protected long submitPollIntervalMillis;
  private long asyncApiPollIntervalMillis;
  private long asyncApiPollTimeoutMillis;
  protected AHSClient historyClient;
  private boolean historyServiceEnabled;
  protected TimelineClient timelineClient;
  @VisibleForTesting
  Text timelineService;
  @VisibleForTesting
  String timelineDTRenewer;
  protected boolean timelineServiceEnabled;
  protected boolean timelineServiceBestEffort;

  private static final String ROOT = "root";

  public YarnClientImpl() {
    super(YarnClientImpl.class.getName());
  }
  // ... (remaining methods omitted)
}
As the field declaration shows, rmClient is an ApplicationClientProtocol object. That type is an interface; its concrete implementation is ApplicationClientProtocolPBClientImpl, which we look at next:
public class ApplicationClientProtocolPBClientImpl implements ApplicationClientProtocol,
    Closeable {

  private ApplicationClientProtocolPB proxy;

  public ApplicationClientProtocolPBClientImpl(long clientVersion,
      InetSocketAddress addr, Configuration conf) throws IOException {
    // Set the "rpc.engine.<protocol class name>" configuration key for
    // ApplicationClientProtocolPB to ProtobufRpcEngine
    RPC.setProtocolEngine(conf, ApplicationClientProtocolPB.class,
        ProtobufRpcEngine.class);
    // Create the proxy.
    // The proxy lives in the JVM that the user started in order to submit the application.
    // That JVM belongs to neither the ResourceManager nor a NodeManager; it is an
    // independent Java virtual machine and may run on any machine in the cluster.
    proxy = RPC.getProxy(ApplicationClientProtocolPB.class, clientVersion, addr, conf);
  }

  public SubmitApplicationResponse submitApplication(
      SubmitApplicationRequest request) throws YarnException, IOException {
    // Extract the protocol message part from the request
    SubmitApplicationRequestProto requestProto =
        ((SubmitApplicationRequestPBImpl) request).getProto();
    try {
      // Hand the message to the proxy to send, wait for the server's reply,
      // and wrap that reply in a SubmitApplicationResponsePBImpl object
      return new SubmitApplicationResponsePBImpl(proxy.submitApplication(null,
          requestProto));
    } catch (ServiceException e) {
      RPCUtil.unwrapAndThrowException(e);
      return null;
    }
  }
}
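The submitApplication() method of ApplicationClientProtocolPBClientImpl simply calls proxy.submitApplication(), and the proxy is created in the constructor.
The SubmitApplicationRequest sent through the proxy is addressed to the RM node. It travels through the network transport layer provided by the operating system as TCP messages and reaches the peer layer on the machine hosting the RM. Above that layer sits ProtoBuf, which reconstructs from the TCP messages the object the other end sent. One layer further up is ApplicationClientProtocolPBServiceImpl, which also implements the ApplicationClientProtocolPB interface; the ProtoBuf layer calls its submitApplication() directly according to the incoming request. In this way a call on the client side to a function provided by ApplicationClientProtocolPBClientImpl becomes a call on the server side to the corresponding function provided by ApplicationClientProtocolPBServiceImpl. The return value of the server-side call is likewise turned into the return value on the client side, and that is what makes this a remote procedure call (RPC). It goes without saying that the two objects on the client and server sides must implement the same interface, which here is ApplicationClientProtocolPB.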
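The requirement that both sides implement one interface can be shown with the JDK's dynamic proxy mechanism. This is only a sketch of the idea, assuming an in-process "server": RpcProxySketch, SubmitProtocol and ServerImpl are hypothetical names, and where a real RPC engine would serialize the method name and arguments and send them over TCP, the handler below just forwards the call directly.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Hypothetical illustration of a client-side proxy and a server-side
// implementation sharing one interface.
final class RpcProxySketch {

    // The interface both sides agree on (plays the role of the PB protocol).
    interface SubmitProtocol {
        String submitApplication(String request);
    }

    // Server-side implementation of the shared interface.
    static final class ServerImpl implements SubmitProtocol {
        @Override
        public String submitApplication(String request) {
            return "accepted:" + request;
        }
    }

    // Builds a client-side proxy. Every call made on it lands in the handler,
    // which stands in for "serialize, send, wait for the reply".
    static SubmitProtocol clientProxy(SubmitProtocol server) {
        InvocationHandler handler = (proxy, method, args) ->
                method.invoke(server, args); // direct forward instead of a network hop
        return (SubmitProtocol) Proxy.newProxyInstance(
                SubmitProtocol.class.getClassLoader(),
                new Class<?>[] {SubmitProtocol.class},
                handler);
    }

    public static void main(String[] args) {
        SubmitProtocol client = clientProxy(new ServerImpl());
        // The caller sees an ordinary method call with an ordinary return value.
        System.out.println(client.submitApplication("app_1"));
    }
}
```

Because the caller only ever holds a SubmitProtocol reference, it cannot tell whether the object behind it is local or remote, which is the property the text describes.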
Client side:
YARNRunner.submitJob() // the application layer, at the top
ResourceMgrDelegate.submitApplication() // the RM's delegate
YarnClientImpl.submitApplication() // the client side of the YARN framework
ApplicationClientProtocolPBClientImpl.submitApplication() // the ApplicationClientProtocol interface
proxy.submitApplication() // the ApplicationClientProtocolPB interface
submitApplication() as implemented inside the protocol // sends the application-layer request on top of TCP/IP
Socket and TCP/IP // the lowest layer of the network connection
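Server side:
Things are different on the server side. Here the structural layering and the order of function calls run in opposite directions: Socket and TCP/IP, structurally the lowest layer, sit at the top of the call stack, and the deeper the calls go, the higher they reach in the structural layering. This is because a TCP/IP message arrives at the bottom layer first and is then handed upward layer by layer, usually by means of function calls, so calling downward level by level is in effect handing the message upward level by level.
The next step, then, is that the request arrives over TCP/IP and the server-side ApplicationClientProtocolPBServiceImpl.submitApplication() method is invoked: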
public class ApplicationClientProtocolPBServiceImpl implements ApplicationClientProtocolPB {

  private ApplicationClientProtocol real;

  public ApplicationClientProtocolPBServiceImpl(ApplicationClientProtocol impl) {
    this.real = impl;
  }

  public SubmitApplicationResponseProto submitApplication(RpcController arg0,
      SubmitApplicationRequestProto proto) throws ServiceException {
    // Wrap the incoming protocol message in a request record
    SubmitApplicationRequestPBImpl request = new SubmitApplicationRequestPBImpl(proto);
    try {
      // real is a ClientRMService object, created by createClientRMService()
      // when the RM is initialized
      SubmitApplicationResponse response = real.submitApplication(request);
      return ((SubmitApplicationResponsePBImpl)response).getProto();
    } catch (YarnException e) {
      throw new ServiceException(e);
    } catch (IOException e) {
      throw new ServiceException(e);
    }
  }
}
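The server-side class is essentially an adapter: it turns the wire message into a record, delegates to the "real" service, and turns the result back into a wire message. A minimal sketch of that shape follows. All names in it (ServiceAdapterSketch, RecordProtocol, WireProtocol, WireServiceImpl) are hypothetical, UTF-8 strings stand in for protobuf messages, and a RuntimeException stands in for the ServiceException wrapping.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a wire-level service that wraps a record-level one.
final class ServiceAdapterSketch {

    // Record-level interface (the role played by ApplicationClientProtocol).
    interface RecordProtocol {
        String submit(String applicationId) throws IOException;
    }

    // Wire-level interface (the role played by ApplicationClientProtocolPB).
    interface WireProtocol {
        byte[] submit(byte[] requestBytes);
    }

    static final class WireServiceImpl implements WireProtocol {
        private final RecordProtocol real; // the object that does the actual work

        WireServiceImpl(RecordProtocol real) {
            this.real = real;
        }

        @Override
        public byte[] submit(byte[] requestBytes) {
            // Decode the wire form into the record form.
            String applicationId = new String(requestBytes, StandardCharsets.UTF_8);
            try {
                // Delegate, then encode the reply back into the wire form.
                return real.submit(applicationId).getBytes(StandardCharsets.UTF_8);
            } catch (IOException e) {
                // Stand-in for wrapping the failure in a transport-level exception.
                throw new RuntimeException(e);
            }
        }
    }

    public static void main(String[] args) {
        WireProtocol service = new WireServiceImpl(id -> "ok:" + id);
        byte[] reply = service.submit("app_1".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(reply, StandardCharsets.UTF_8));
    }
}
```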
The call then goes on to the ClientRMService.submitApplication(request) method:
public SubmitApplicationResponse submitApplication(
    SubmitApplicationRequest request) throws YarnException {
  ApplicationSubmissionContext submissionContext = request
      .getApplicationSubmissionContext();
  ApplicationId applicationId = submissionContext.getApplicationId();

  // ApplicationSubmissionContext needs to be validated for safety - only
  // those fields that are independent of the RM's configuration will be
  // checked here, those that are dependent on RM configuration are validated
  // in RMAppManager.

  String user = null;
  try {
    // Safety
    user = UserGroupInformation.getCurrentUser().getShortUserName(); // get the user
  } catch (IOException ie) {
    LOG.warn("Unable to get the current user.", ie);
    RMAuditLogger.logFailure(user, AuditConstants.SUBMIT_APP_REQUEST,
        ie.getMessage(), "ClientRMService",
        "Exception in submitting application", applicationId);
    throw RPCUtil.getRemoteException(ie);
  }

  // Check whether app has already been put into rmContext,
  // If it is, simply return the response
  // (that is, a job that already exists just gets a fresh response instance)
  if (rmContext.getRMApps().get(applicationId) != null) {
    LOG.info("This is an earlier submitted application: " + applicationId);
    return SubmitApplicationResponse.newInstance();
  }

  // If no queue was set, use the default queue
  if (submissionContext.getQueue() == null) {
    submissionContext.setQueue(YarnConfiguration.DEFAULT_QUEUE_NAME);
  }
  // If no application name was set, use the default name
  if (submissionContext.getApplicationName() == null) {
    submissionContext.setApplicationName(
        YarnConfiguration.DEFAULT_APPLICATION_NAME);
  }
  // If no application type was given, use the default type;
  // otherwise truncate a type that exceeds the maximum length
  if (submissionContext.getApplicationType() == null) {
    submissionContext
        .setApplicationType(YarnConfiguration.DEFAULT_APPLICATION_TYPE);
  } else {
    if (submissionContext.getApplicationType().length()
        > YarnConfiguration.APPLICATION_TYPE_LENGTH) {
      submissionContext.setApplicationType(submissionContext
          .getApplicationType().substring(0,
              YarnConfiguration.APPLICATION_TYPE_LENGTH));
    }
  }

  try {
    // call RMAppManager to submit application directly
    // (hands the job over to rmAppManager)
    rmAppManager.submitApplication(submissionContext,
        System.currentTimeMillis(), user);

    LOG.info("Application with id " + applicationId.getId() +
        " submitted by user " + user);
    RMAuditLogger.logSuccess(user, AuditConstants.SUBMIT_APP_REQUEST,
        "ClientRMService", applicationId);
  } catch (YarnException e) {
    LOG.info("Exception in submitting application with id " +
        applicationId.getId(), e);
    RMAuditLogger.logFailure(user, AuditConstants.SUBMIT_APP_REQUEST,
        e.getMessage(), "ClientRMService",
        "Exception in submitting application", applicationId);
    throw e;
  }

  SubmitApplicationResponse response = recordFactory
      .newRecordInstance(SubmitApplicationResponse.class);
  return response;
}
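The defaulting and truncation rules applied to the submission context before it is handed to RMAppManager can be restated compactly. The sketch below is a simplified model, not the RM code: SubmissionDefaultsSketch and its Context class are hypothetical, and the four constant values are assumptions made for illustration, so the actual values should be checked against YarnConfiguration.

```java
// Hypothetical model of filling in defaults on a submission context.
final class SubmissionDefaultsSketch {
    // Assumed values for illustration only; verify against YarnConfiguration.
    static final String DEFAULT_QUEUE_NAME = "default";
    static final String DEFAULT_APPLICATION_NAME = "N/A";
    static final String DEFAULT_APPLICATION_TYPE = "YARN";
    static final int APPLICATION_TYPE_LENGTH = 20;

    // Minimal stand-in for the fields the server-side method inspects.
    static final class Context {
        String queue;
        String name;
        String type;

        Context(String queue, String name, String type) {
            this.queue = queue;
            this.name = name;
            this.type = type;
        }
    }

    // Fills in missing fields and truncates an over-long application type.
    static Context normalize(Context c) {
        if (c.queue == null) {
            c.queue = DEFAULT_QUEUE_NAME;
        }
        if (c.name == null) {
            c.name = DEFAULT_APPLICATION_NAME;
        }
        if (c.type == null) {
            c.type = DEFAULT_APPLICATION_TYPE;
        } else if (c.type.length() > APPLICATION_TYPE_LENGTH) {
            c.type = c.type.substring(0, APPLICATION_TYPE_LENGTH);
        }
        return c;
    }

    public static void main(String[] args) {
        Context c = normalize(new Context(null, "wordcount", null));
        System.out.println(c.queue + " " + c.name + " " + c.type);
    }
}
```

Doing this on the server rather than the client means every application stored by the RM has a queue, a name and a bounded-length type, whichever client submitted it.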
From the point of view of job submission, once control has entered RMAppManager.submitApplication() on the RM node, the submission is complete. What happens after that is the RM's business. The complete job submission path is therefore:
[WordCount.main() -> Job.waitForCompletion() -> Job.submit() -> Job.connect() -> Cluster.Cluster() -> Cluster.initialize() -> YarnClientProtocolProvider.create() -> JobSubmitter.submitJobInternal() -> YARNRunner.submitJob() -> ResourceMgrDelegate.submitApplication() -> YarnClientImpl.submitApplication() -> ApplicationClientProtocolPBClientImpl.submitApplication() -> ApplicationClientProtocolPBServiceImpl.submitApplication() -> ClientRMService.submitApplication() -> RMAppManager.submitApplication()]