  • Hadoop YARN CapacityScheduler Configuration

    Enabling the scheduler in conf/yarn-site.xml
    <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    </property>

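    Configuring queues
    Queues are the core of the scheduler: resources are allocated to queues and used through them. Queues are configured in conf/capacity-scheduler.xml.
    The CapacityScheduler has one predefined queue, root; every other queue is a descendant of it.
    Queues are configured hierarchically, with the levels of a queue path separated by a dot (.), for example yarn.scheduler.capacity.<queue-path>.queues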

    The following sample gives root three child queues (a, b and c) and then gives a and b child queues of their own:
    <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>a,b,c</value>
    <description>The queues at this level (root is the root queue).
    </description>
    </property>

    <property>
    <name>yarn.scheduler.capacity.root.a.queues</name>
    <value>a1,a2</value>
    <description>The child queues of root.a.
    </description>
    </property>

    <property>
    <name>yarn.scheduler.capacity.root.b.queues</name>
    <value>b1,b2,b3</value>
    <description>The child queues of root.b.
    </description>
    </property>
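    Queue properties
    yarn.scheduler.capacity.<queue-path>.capacity
    The queue's share of resources, as a percentage.
    When the cluster is busy, each queue should receive its configured share;
    when the cluster is idle, a queue's unused resources can be used by other queues. The capacities of all queues at the same level must add up to 100%.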
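
    As an illustration of the 100% rule, here is a minimal sketch for the three root-level queues a, b and c from the sample above. The 50/30/20 split is an assumed example, not a value from the original configuration. The child queues (a1, a2 and b1 to b3) would need their own capacity entries that add up to 100 within each parent.

    ```xml
    <!-- Hypothetical split: 50 + 30 + 20 = 100 at the root level -->
    <property>
      <name>yarn.scheduler.capacity.root.a.capacity</name>
      <value>50</value>
    </property>

    <property>
      <name>yarn.scheduler.capacity.root.b.capacity</name>
      <value>30</value>
    </property>

    <property>
      <name>yarn.scheduler.capacity.root.c.capacity</name>
      <value>20</value>
    </property>
    ```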

    yarn.scheduler.capacity.<queue-path>.maximum-capacity

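    The upper limit on the resources the queue may use. Because a queue can borrow idle resources when the cluster is not busy,
    this parameter caps how much it can use in total. The default is -1, which disables the limit.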
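
    A short sketch of how such a cap might look for queue a. The value 80 is an assumption chosen for illustration: the queue could then grow beyond its configured capacity when the cluster is idle, but never past 80% of the parent's resources.

    ```xml
    <!-- Hypothetical cap: queue a may borrow idle resources up to 80% -->
    <property>
      <name>yarn.scheduler.capacity.root.a.maximum-capacity</name>
      <value>80</value>
    </property>
    ```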

    yarn.scheduler.capacity.<queue-path>.minimum-user-limit-percent

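    The minimum share of the queue's resources that each user gets when there is demand. Suppose it is set to 25. If two users submit applications, neither can use more than 50% of the queue.
    With three users, no user can use more than 33%. With four users, no user can use more than 25%.
    If a fifth user submits an application, it has to wait until resources free up. The default is 100, which imposes no per-user limit.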
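
    The 25% case described above would be written roughly as follows. Queue a is used only as an example target; the value is an integer percentage.

    ```xml
    <!-- Each active user of queue a gets at least 25% of the queue,
         so at most four users can hold resources at the same time -->
    <property>
      <name>yarn.scheduler.capacity.root.a.minimum-user-limit-percent</name>
      <value>25</value>
    </property>
    ```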


    yarn.scheduler.capacity.<queue-path>.user-limit-factor

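    A multiplier on the queue's capacity that bounds how much a single user can use; it is not a percentage. With the default of 1, a single user can never use more than the queue's configured capacity, however idle the cluster is. A value of 2 lets one user use up to twice the queue's capacity, still subject to maximum-capacity.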
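
    A minimal sketch of this setting, assuming queue a should let a single user grow to twice the queue's configured capacity. The value 2 is illustrative only; the queue's maximum-capacity still applies on top of it.

    ```xml
    <!-- Hypothetical: one user may use up to 2 x the capacity of queue a -->
    <property>
      <name>yarn.scheduler.capacity.root.a.user-limit-factor</name>
      <value>2</value>
    </property>
    ```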


    yarn.scheduler.capacity.maximum-applications / yarn.scheduler.capacity.<queue-path>.maximum-applications

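    The maximum number of applications that can be running or pending at the same time, for the whole cluster or for a single queue. The default is 10000.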

    yarn.scheduler.capacity.maximum-am-resource-percent / yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent

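    The maximum share of resources that can be used to run ApplicationMasters, which in effect limits the number of concurrently active applications. The default is 10% (0.1).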

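    Queue administration
    yarn.scheduler.capacity.<queue-path>.state
    The state of the queue, either RUNNING or STOPPED.
    If a queue is STOPPED, new applications cannot be submitted to it or to any of its child queues.
    Likewise, if root is set to STOPPED, no applications can be submitted to the cluster at all.
    Applications that are already running continue until they finish, so a queue can be drained gracefully.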
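
    For example, draining queue a (a name taken from the sample above, used here only for illustration) might look like the fragment below. The change takes effect only after the queue configuration is reloaded.

    ```xml
    <!-- Stop accepting new applications in queue a and its children;
         applications already running are left to finish -->
    <property>
      <name>yarn.scheduler.capacity.root.a.state</name>
      <value>STOPPED</value>
    </property>
    ```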

    yarn.scheduler.capacity.root.<queue-path>.acl_submit_applications

    The ACL that controls who can submit applications to the queue.
    A user who can submit to a queue can also submit to its child queues.


    yarn.scheduler.capacity.root.<queue-path>.acl_administer_queue

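    The ACL that controls who can administer the queue. An administrator can manage every application in the queue. This ACL is inherited by child queues as well.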

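    Note: an ACL has the form user1,user2 group1,group2 (a comma-separated list of users, a space, then a comma-separated list of groups). The special value * means anyone; a single space means no one. The default is *.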
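
    A sketch of the ACL format applied to queue a. The names user1, user2, group1, group2 and admin1 are placeholders, not accounts from the original post. Because ACLs are inherited, a permissive ACL on a parent queue (including root) would still let other users in, so the parent has to be restricted as well for this to have any effect.

    ```xml
    <!-- Hypothetical: only user1, user2 and members of group1 or group2
         may submit to queue a -->
    <property>
      <name>yarn.scheduler.capacity.root.a.acl_submit_applications</name>
      <value>user1,user2 group1,group2</value>
    </property>

    <!-- Hypothetical: only admin1 may administer queue a -->
    <property>
      <name>yarn.scheduler.capacity.root.a.acl_administer_queue</name>
      <value>admin1</value>
    </property>
    ```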

    yarn.scheduler.capacity.resource-calculator

    The resource calculator. The default, org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, considers memory only.
    DominantResourceCalculator takes both memory and CPU into account.
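
    To have the scheduler consider CPU as well as memory, the property can be pointed at the dominant-resource implementation, as in this fragment for capacity-scheduler.xml:

    ```xml
    <!-- Compare resources by dominant share (memory and CPU)
         instead of memory only -->
    <property>
      <name>yarn.scheduler.capacity.resource-calculator</name>
      <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
    </property>
    ```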

    yarn.scheduler.capacity.node-locality-delay

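    The number of missed scheduling opportunities after which the scheduler attempts to schedule rack-local containers. It is usually tied to the number of nodes in the cluster. The default is 40 (roughly the number of nodes in one rack).

    Once these queue properties are set, the queues are visible in the web UI at the following address: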
    xxx:8088/scheduler


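    Changing the queue configuration

    To change queue or scheduler settings, edit the configuration file:

    vi $HADOOP_CONF_DIR/capacity-scheduler.xml

    After saving the changes, run the following command: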

    $HADOOP_YARN_HOME/bin/yarn rmadmin -refreshQueues

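    Notes:

    Queues cannot be deleted; they can only be added.
    The updated queue configuration must contain valid values.
    The capacities of queues at the same level must add up to 100%.

    To have a job scheduled into the queue1 queue, specify the queue when launching the job:

    set the mapreduce.job.queuename parameter to queue1. If it is not set, the job goes to the default queue.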
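
    One way to set this parameter is in the job's own configuration, for example in a mapred-site.xml used by the submitting client. This is a sketch that assumes a queue named queue1 exists in the scheduler configuration.

    ```xml
    <!-- Submit MapReduce jobs from this client to queue1 -->
    <property>
      <name>mapreduce.job.queuename</name>
      <value>queue1</value>
    </property>
    ```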

    ====================================================

    vi yarn-site.xml

    <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    </property>

    cat capacity-scheduler.xml
    <!--
    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.
    You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License. See accompanying LICENSE file.
    -->
    <configuration>

    <property>
    <name>yarn.scheduler.capacity.maximum-applications</name>
    <value>10000</value>
    <description>
    Maximum number of applications that can be pending and running.
    </description>
    </property>

    <property>
    <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
    <value>0.1</value>
    <description>
    Maximum percent of resources in the cluster which can be used to run
    application masters i.e. controls number of concurrent running
    applications.
    </description>
    </property>

    <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
    <description>
    The ResourceCalculator implementation to be used to compare
    Resources in the scheduler.
    The default i.e. DefaultResourceCalculator only uses Memory while
    DominantResourceCalculator uses dominant-resource to compare
    multi-dimensional resources such as Memory, CPU etc.
    </description>
    </property>

    <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>comm,super</value>
    <description>
    The queues at this level (root is the root queue).
    </description>
    </property>

    <property>
    <name>yarn.scheduler.capacity.root.comm.capacity</name>
    <value>50</value>
    <description>Target capacity of the comm queue, in percent.</description>
    </property>

    <property>
    <name>yarn.scheduler.capacity.root.super.capacity</name>
    <value>50</value>
    <description>Target capacity of the super queue, in percent.</description>
    </property>

    <property>
    <name>yarn.scheduler.capacity.root.comm.user-limit-factor</name>
    <value>50</value>
    <description>
    User limit factor of the comm queue: the multiple of the queue's capacity that a single user may use.
    </description>
    </property>

    <property>
    <name>yarn.scheduler.capacity.root.super.user-limit-factor</name>
    <value>50</value>
    <description>
    User limit factor of the super queue: the multiple of the queue's capacity that a single user may use.
    </description>
    </property>


    <property>
    <name>yarn.scheduler.capacity.root.comm.maximum-capacity</name>
    <value>50</value>
    <description>
    The maximum capacity of the comm queue, in percent.
    </description>
    </property>


    <property>
    <name>yarn.scheduler.capacity.root.super.maximum-capacity</name>
    <value>50</value>
    <description>
    The maximum capacity of the super queue, in percent.
    </description>
    </property>

    <property>
    <name>yarn.scheduler.capacity.root.comm.state</name>
    <value>RUNNING</value>
    <description>
    The state of the comm queue. State can be one of RUNNING or STOPPED.
    </description>
    </property>


    <property>
    <name>yarn.scheduler.capacity.root.super.state</name>
    <value>RUNNING</value>
    <description>
    The state of the super queue. State can be one of RUNNING or STOPPED.
    </description>
    </property>

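    <!-- ACLs are set on comm and super, the queues this file defines;
         there is no queue named default in this configuration -->
    <property>
    <name>yarn.scheduler.capacity.root.comm.acl_submit_applications</name>
    <value>*</value>
    <description>The ACL of who can submit jobs to the comm queue.</description>
    </property>

    <property>
    <name>yarn.scheduler.capacity.root.super.acl_submit_applications</name>
    <value>*</value>
    <description>The ACL of who can submit jobs to the super queue.</description>
    </property>

    <property>
    <name>yarn.scheduler.capacity.root.comm.acl_administer_queue</name>
    <value>*</value>
    <description>The ACL of who can administer jobs on the comm queue.</description>
    </property>

    <property>
    <name>yarn.scheduler.capacity.root.super.acl_administer_queue</name>
    <value>*</value>
    <description>The ACL of who can administer jobs on the super queue.</description>
    </property>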

    <property>
    <name>yarn.scheduler.capacity.node-locality-delay</name>
    <value>40</value>
    <description>
    Number of missed scheduling opportunities after which the CapacityScheduler
    attempts to schedule rack-local containers.
    Typically this should be set to the number of nodes in the cluster. The default is
    approximately the number of nodes in one rack, which is 40.
    </description>
    </property>

    <property>
    <name>yarn.scheduler.capacity.queue-mappings</name>
    <value></value>
    <description>
    A list of mappings that will be used to assign jobs to queues
    The syntax for this list is [u|g]:[name]:[queue_name][,next mapping]*
    Typically this list will be used to map users to queues,
    for example, u:%user:%user maps all users to queues with the same name
    as the user.
    </description>
    </property>

    <property>
    <name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
    <value>false</value>
    <description>
    If a queue mapping is present, will it override the value specified
    by the user? This can be used by administrators to place jobs in queues
    that are different than the one specified by the user.
    The default is false.
    </description>
    </property>

    </configuration>
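
    The queue-mappings property is left empty in the file above. As a sketch of the [u|g]:[name]:[queue_name] syntax from its description, the fragment below would route a user and a group to the two queues defined in this file. The names user1 and group1 are placeholders, not accounts from the original post.

    ```xml
    <!-- Hypothetical mappings: applications from user1 go to comm,
         applications from members of group1 go to super -->
    <property>
      <name>yarn.scheduler.capacity.queue-mappings</name>
      <value>u:user1:comm,g:group1:super</value>
    </property>
    ```

    With queue-mappings-override.enable left at false, a queue named explicitly by the submitting user still takes precedence over these mappings.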

    https://niyanchun.com/yarn-scheduler.html

    https://tech.meituan.com/2019/08/01/hadoop-yarn-scheduling-performance-optimization-practice.html

  • Original article: https://www.cnblogs.com/songyuejie/p/13519180.html