zoukankan      html  css  js  c++  java
  • Samza在YARN上的启动过程 =》 之二 submitApplication

    首先,来看怎么构造一个org.apache.hadoop.yarn.client.api.YarnClient

    class ClientHelper(conf: Configuration) extends Logging {
      val yarnClient = YarnClient.createYarnClient
      info("trying to connect to RM %s" format conf.get(YarnConfiguration.RM_ADDRESS, YarnConfiguration.DEFAULT_RM_ADDRESS))
      yarnClient.init(conf);
      yarnClient.start
    

      !!!这个client还有个start方法,看来它跟RM很谈得来。的确,它实现了service这个接口。 好吧,它是一个服务。在YarnJobFactory中,我们用yarn-site.xml构造了一个YarnConfiguration对象,现在用它来初始化YarnClient,因为我们至少需要RM在哪,对不?

    下边分几部分看submitApplication方法的实现

    第一次调用YarnClient - 获取信息

    def submitApplication(packagePath: Path, memoryMb: Int, cpuCore: Int, cmds: List[String], env: Option[Map[String, String]], name: Option[String]): Option[ApplicationId] = {
        val app = yarnClient.createApplication
        val newAppResponse = app.getNewApplicationResponse
        var mem = memoryMb
        var cpu = cpuCore
    
        // If we are asking for memory more than the max allowed, shout out
        if (mem > newAppResponse.getMaximumResourceCapability().getMemory()) {
          throw new SamzaException("You're asking for more memory (%s) than is allowed by YARN: %s" format
            (mem, newAppResponse.getMaximumResourceCapability().getMemory()))
        }
    
        // If we are asking for cpu more than the max allowed, shout out
        if (cpu > newAppResponse.getMaximumResourceCapability().getVirtualCores()) {
          throw new SamzaException("You're asking for more CPU (%s) than is allowed by YARN: %s" format
            (cpu, newAppResponse.getMaximumResourceCapability().getVirtualCores()))
        }
    
        appId = Some(newAppResponse.getApplicationId)
    

      首先通过yarnClient的createApplication方法获取一个YarnClientApplication对象。这是对RM的第一次请求,那么这次请求能得到什么信息呢?

    通过这次请求得到的YarnClientApplication对象有两个方法:

    1. getApplicationSubmissionContext() , 它返回一个 ApplicationSubmissionContext对象。“ApplicationSubmissionContext represents all of the information needed by the ResourceManager to launch the ApplicationMaster for an application.”
    2. getNewApplicationResponse(), 它返回一个GetNewApplicationResponse对象。

    鉴于YarnClient的createApplication方法没有任何参数,而YarnClient本身的状态中由用户指定的部分只是YarnConfiguration的内容,因此这个createApplication方法并不会告诉YARN客户端对资源的需求,因此它返回的app对象只包含了yarn的RM本身的信息。

    在获取了app这个对象之后,submitApplication方法通过

     val newAppResponse = app.getNewApplicationResponse

    从中取出了newAppResponse这个对象,然后从中取出了当前YARN集群最多支持的内存和CPU数目(TODO:这个值是当前可用的资源的值,还是整体上最大资源值)。然后对比给AM申请的container想要的内存和CPU,如果超出了YARN支持的最大值,就抛出异常。

    否则,就把从newAppResponse中获取的applicationId赋给appId。看来在第一次请求时,YARN就给分配了appId,只是这个appId,并不和资源关联。

    第二调用YarnClient - 提交job

     如果资源足够,AM就可以提交,那就开始填写AM运行需要的资源,具体来说就是组装ApplicationSubmissionContext类的一个对象

        name match {
          case Some(name) => { appCtx.setApplicationName(name) }
          case None => { appCtx.setApplicationName(appId.toString) }
        }
    
        env match {
          case Some(env) => {
            containerCtx.setEnvironment(env)
            info("set environment variables to %s for %s" format (env, appId.get))
          }
          case None => None
        }
    
        // set the local package so that the containers and app master are provisioned with it
        val packageUrl = ConverterUtils.getYarnUrlFromPath(packagePath)
        val fileStatus = packagePath.getFileSystem(conf).getFileStatus(packagePath)
    
        packageResource.setResource(packageUrl)
        info("set package url to %s for %s" format (packageUrl, appId.get))
        packageResource.setSize(fileStatus.getLen)
        info("set package size to %s for %s" format (fileStatus.getLen, appId.get))
        packageResource.setTimestamp(fileStatus.getModificationTime)
        packageResource.setType(LocalResourceType.ARCHIVE)
        packageResource.setVisibility(LocalResourceVisibility.APPLICATION)
    
        resource.setMemory(mem)
        info("set memory request to %s for %s" format (mem, appId.get))
        resource.setVirtualCores(cpu)
        info("set cpu core request to %s for %s" format (cpu, appId.get))
        appCtx.setResource(resource)
        containerCtx.setCommands(cmds.toList)
        info("set command to %s for %s" format (cmds, appId.get))
        containerCtx.setLocalResources(Collections.singletonMap("__package", packageResource))
        appCtx.setApplicationId(appId.get)
        info("set app ID to %s" format appId.get)
        appCtx.setAMContainerSpec(containerCtx)
        appCtx.setApplicationType(ClientHelper.applicationType)
        info("submitting application request for %s" format appId.get)
        yarnClient.submitApplication(appCtx)
    

     这段代码设置了一个ApplicationSubmissionContext对象,然后再用yarnClient把它提交。这样就提交了一个YARN job。 

    这样YarnClient一共用了两次,初始一次请求,获取appID和YARN的资源上限的情况,第二次请求,真正提交job。

      这段代码让我有些疑惑。首先appCtx大致分为两部分,一部分是job的信息,比如application type和application ID,另一部分和AM有关。和AM有关的部分又可以分成两块: 1. cpu和内存的大小,这两个资源组装在Resource这个类的对象里,由setResource设置到 appCtx中 2:运行container所需的命令和文件、环量变量,这部分设置在一个ContainerLaunchContext对象中,然后这个对象再被调置在appCtx中。疑惑的地方在于:为什么AM所需的资源要分成两部分呢?cpu和内存本就该是container申请的一部分呀?

    看看API里关于containerLaunchContext类的说明,就更不明白了

    ContainerLaunchContext represents all of the information needed by the NodeManager to launch a container.

    It includes details such as:

    • ContainerId of the container.
    • Resource allocated to the container.
    • User to whom the container is allocated.
    • Security tokens (if security is enabled).
    • LocalResource necessary for running the container such as binaries, jar, shared-objects, side-files etc.
    • Optional, application-specific binary service data.
    • Environment variables for the launched process.
    • Command to launch the container.

    好吧,“Resource allocated to the container.”, 这一条ContainerLanchContext并没有体现,在它提供的方法中并不能设置Resource。这不是骗人吗?

    而appCtx却有单独的一个setAMContainerSpec 方法来设置Resource。那么在申请运行task所需的container时,如果说明其所需的资源呢?看来一定不是用了这个ContainerLaunchContext对象。

    两个不同的协议

    Samza AM为task申请container的代码在SamzaAppMasterTaskManager这个类里

      protected def requestContainers(memMb: Int, cpuCores: Int, containers: Int) {
        info("Requesting %d container(s) with %dmb of memory" format (containers, memMb))
        val capability = Records.newRecord(classOf[Resource])
        val priority = Records.newRecord(classOf[Priority])
        priority.setPriority(0)
        capability.setMemory(memMb)
        capability.setVirtualCores(cpuCores)
        (0 until containers).foreach(idx => amClient.addContainerRequest(new ContainerRequest(capability, null, null, priority)))
      }
    

      这里的amClient就是org.apache.hadoop.yarn.client.api.async.AMRMClientAsync类的对象。它用来和RM联系,处理container相关的事情。当AM请求container时,它就不用submitApplication中为AM设置container资源所需的那套动作了,而是使用ContainerRequest这类。而且ContainerRequest的构造方法中

    public ContainerRequest(Resource capability, String[] nodes, String[] racks, Priority priority, boolean relaxLocality) 
    

    使用了Resource做为参数。

    可见为AM申请container和为task申请container走的过程的确不一样。毕竟,为AM的运行申请container是作为提交任务的一部分最终发现两个是使用的不同的协议。提交任务时,使用的是这个协议:

    message ApplicationSubmissionContextProto {
        optional ApplicationIdProto application_id = 1;
        optional string application_name = 2 [default = "N/A"];
        optional string queue = 3 [default = "default"];
        optional PriorityProto priority = 4;
        optional ContainerLaunchContextProto am_container_spec = 5;
        optional bool cancel_tokens_when_complete = 6 [default = true];
        optional bool unmanaged_am = 7 [default = false];
        optional int32 maxAppAttempts = 8 [default = 0];
        optional ResourceProto resource = 9;
        optional string applicationType = 10 [default = "YARN"];
    }
    
    message ContainerLaunchContextProto {
        repeated StringLocalResourceMapProto localResources = 1;
        optional bytes tokens = 2;
        repeated StringBytesMapProto service_data = 3;
        repeated StringStringMapProto environment = 4;
        repeated string command = 5;
        repeated ApplicationACLMapProto application_ACLs = 6;
    }
    

    ContainerLaunchContextProto里根本没有代表cpu和内存资源的ResourceProto,这个Protocol是在ApplicationSubmissionContextProto里。对照containerLaunchContext类的说明,的确显得很奇怪。

    而申请container的请求,走的是

    message ResourceRequestProto {
      optional PriorityProto priority = 1;
      optional string resource_name = 2;
      optional ResourceProto capability = 3;
      optional int32 num_containers = 4;
      optional bool relax_locality = 5 [default = true];
    }
    
    message ResourceProto {
      optional int32 memory = 1;
      optional int32 virtual_cores = 2;
    }
    
  • 相关阅读:
    数据库的接口
    BionicThe README from the bionic/libc
    发现问题——创新的原动力
    使用ADO或ADO控件访问数据库
    游标、事务并发和锁三者之间的那点事
    处女座——菜鸟程序员的工程总结
    数据库的基础知识以及创建数据库
    《Team Geek》前言(中文,自己翻译的)
    万里长征,始于足下——菜鸟程序员的学习总结(一)
    与RMAN相关的动态性能视图
  • 原文地址:https://www.cnblogs.com/devos/p/3720174.html
Copyright © 2011-2022 走看看