spark.scheduler.maxRegisteredResourcesWaitingTime
Maximum time to wait for resources to register before scheduling starts; default 30s.
spark.scheduler.minRegisteredResourcesRatio
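Minimum ratio of registered resources to expected resources (registered / expected) to wait for before scheduling starts. Resources are executors in YARN and Kubernetes mode, and CPU cores in standalone and Mesos coarse-grained mode. Default 0.8 for YARN and Kubernetes, 0.0 for standalone and Mesos coarse-grained mode. Scheduling starts as soon as either this ratio or spark.scheduler.maxRegisteredResourcesWaitingTime is reached.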
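The two settings above act as "whichever comes first". The Python sketch below is a simplified model for illustration only; `ready_to_schedule` is a hypothetical helper, not a Spark API, and it ignores how Spark actually counts registered executors or cores.

```python
def ready_to_schedule(registered, expected, waited_s,
                      min_ratio=0.8, max_wait_s=30.0):
    """Hypothetical model: may scheduling start yet?

    registered / expected: executors (YARN, Kubernetes) or cores (standalone).
    min_ratio mirrors spark.scheduler.minRegisteredResourcesRatio,
    max_wait_s mirrors spark.scheduler.maxRegisteredResourcesWaitingTime.
    """
    ratio_reached = expected > 0 and registered / expected >= min_ratio
    timed_out = waited_s >= max_wait_s
    return ratio_reached or timed_out


print(ready_to_schedule(8, 10, waited_s=5))   # True: ratio 0.8 reached
print(ready_to_schedule(5, 10, waited_s=5))   # False: keep waiting
print(ready_to_schedule(5, 10, waited_s=30))  # True: waiting time used up
```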
spark.scheduler.mode
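Scheduling mode between jobs submitted to the same SparkContext; default FIFO (jobs queued first are scheduled first), can be set to FAIR.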
spark.scheduler.revive.interval
Interval at which the scheduler revives worker resource offers to run tasks; default 1s.
spark.scheduler.listenerbus.eventqueue.capacity
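Capacity of the Spark listener bus event queue; default 10000, must be positive. Consider increasing it (e.g. to 20000) if listener events are being dropped; a larger value may make the driver use more memory.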
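To show what the capacity limits, here is a simplified Python model (not Spark code; `count_dropped` is a hypothetical name): a burst of events arrives while the listener is busy, events go into a bounded queue without blocking the poster, and anything that does not fit is dropped. Spark's listener bus likewise drops events when its queue is full, which is why a larger capacity helps.

```python
import queue


def count_dropped(burst_size, capacity=10000):
    """Post a burst of events to a bounded queue that nobody drains meanwhile.

    capacity mirrors spark.scheduler.listenerbus.eventqueue.capacity.
    Returns how many events were dropped because the queue was full.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    q = queue.Queue(maxsize=capacity)
    dropped = 0
    for event_id in range(burst_size):
        try:
            q.put_nowait(event_id)  # never blocks the posting thread
        except queue.Full:
            dropped += 1            # this event is lost
    return dropped


print(count_dropped(15000))                  # 5000
print(count_dropped(15000, capacity=20000))  # 0
```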
spark.blacklist.enabled
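Whether to enable blacklisting; default false. If set to true, Spark stops scheduling tasks on executors (and nodes) that have been blacklisted after too many task failures, so later tasks are not dispatched there. The behavior can be tuned further with the following spark.blacklist.* parameters
(all of them Experimental):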
spark.blacklist.timeout
spark.blacklist.task.maxTaskAttemptsPerExecutor
spark.blacklist.task.maxTaskAttemptsPerNode
spark.blacklist.stage.maxFailedTasksPerExecutor
spark.blacklist.application.maxFailedExecutorsPerNode
spark.blacklist.killBlacklistedExecutors
spark.blacklist.application.fetchFailure.enabled
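As a rough illustration of the two task-level limits in the list, the Python sketch below models them with what I believe are the defaults (maxTaskAttemptsPerExecutor = 1, maxTaskAttemptsPerNode = 2; check the docs for your Spark version). It is a simplified, hypothetical model, not Spark's implementation, and it leaves out the stage- and application-level limits; executor and host names are made up.

```python
from collections import defaultdict


class TaskBlacklistModel:
    """Hypothetical model of the per-task blacklist checks (not Spark code)."""

    def __init__(self, max_attempts_per_executor=1, max_attempts_per_node=2):
        # mirror spark.blacklist.task.maxTaskAttemptsPerExecutor / ...PerNode
        self.max_per_executor = max_attempts_per_executor
        self.max_per_node = max_attempts_per_node
        self.executor_failures = defaultdict(int)  # (task, executor) -> failures
        self.node_failures = defaultdict(int)      # (task, node) -> failures

    def record_failure(self, task, executor, node):
        self.executor_failures[(task, executor)] += 1
        self.node_failures[(task, node)] += 1

    def can_run(self, task, executor, node):
        """False if this task is blacklisted on the executor or on its node."""
        return (self.executor_failures.get((task, executor), 0) < self.max_per_executor
                and self.node_failures.get((task, node), 0) < self.max_per_node)


bl = TaskBlacklistModel()
bl.record_failure(task=7, executor="exec-1", node="host-a")
print(bl.can_run(7, "exec-1", "host-a"))  # False: task 7 already failed there
print(bl.can_run(7, "exec-2", "host-a"))  # True: only 1 failure on host-a
bl.record_failure(task=7, executor="exec-2", node="host-a")
print(bl.can_run(7, "exec-3", "host-a"))  # False: 2 failures on host-a
print(bl.can_run(7, "exec-4", "host-b"))  # True
print(bl.can_run(8, "exec-1", "host-a"))  # True: other tasks are unaffected
```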
spark.speculation
Speculative execution; default false. If tasks in a stage are running slowly, Spark re-launches copies of them.
Related settings:
spark.speculation.interval
How often Spark checks for tasks to speculate; default 100ms.
spark.speculation.multiplier
How many times slower than the median of finished tasks a task must be to be considered for speculation; default 1.5.
spark.speculation.quantile
Fraction of tasks in a stage that must be complete before speculation is enabled for that stage; default 0.75.
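The way quantile and multiplier combine can be sketched as follows. This is a simplified Python model, not Spark's actual check (which has further internal details); `speculatable_tasks` is a hypothetical helper, and spark.speculation.interval would only decide how often such a check runs.

```python
import statistics


def speculatable_tasks(finished_durations, running_elapsed, num_tasks,
                       quantile=0.75, multiplier=1.5):
    """Return ids of running tasks that this simplified check would re-launch.

    finished_durations: run times (s) of the stage's successfully finished tasks
    running_elapsed:    {task_id: seconds the task has been running so far}
    num_tasks:          total number of tasks in the stage
    quantile, multiplier mirror spark.speculation.quantile / .multiplier
    """
    min_finished = max(int(quantile * num_tasks), 1)
    if len(finished_durations) < min_finished:
        return []  # not enough of the stage has finished yet
    threshold = multiplier * statistics.median(finished_durations)
    return sorted(task for task, elapsed in running_elapsed.items()
                  if elapsed > threshold)


# 8-task stage with 6 finished: median 12s, so the threshold is 18s
done = [10, 11, 12, 12, 13, 14]
print(speculatable_tasks(done, {6: 15, 7: 25}, num_tasks=8))      # [7]
# only 5 finished, below 0.75 * 8 = 6: nothing is speculated yet
print(speculatable_tasks(done[:5], {6: 15, 7: 25}, num_tasks=8))  # []
```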
spark.task.cpus
Number of CPU cores to allocate for each task; default 1.
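Because each running task reserves spark.task.cpus cores, an executor with spark.executor.cores cores runs at most cores // task.cpus tasks at once. The small Python helper below (hypothetical name, just the arithmetic) makes that concrete.

```python
def concurrent_tasks_per_executor(executor_cores, task_cpus=1):
    """Tasks that can run at once on one executor (spark.task.cpus cores each)."""
    if task_cpus < 1:
        raise ValueError("spark.task.cpus must be at least 1")
    return executor_cores // task_cpus


print(concurrent_tasks_per_executor(4))               # 4
print(concurrent_tasks_per_executor(4, task_cpus=2))  # 2
print(concurrent_tasks_per_executor(5, task_cpus=2))  # 2, one core left idle
```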
spark.task.maxFailures
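Number of failures of any particular task before giving up on the job; default 4. Failures spread across different tasks do not fail the job: one task has to fail this many times. The number of allowed retries is this value minus 1.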
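The "per task, not per job" rule can be sketched in Python as below; `job_fails` is a hypothetical helper that only models the counting, not Spark's retry machinery.

```python
from collections import Counter


def job_fails(failed_task_ids, max_failures=4):
    """True if some single task has failed max_failures times.

    failed_task_ids: one entry per failed attempt, naming the task that failed.
    max_failures mirrors spark.task.maxFailures (retries = max_failures - 1).
    """
    counts = Counter(failed_task_ids)
    return any(n >= max_failures for n in counts.values())


# 6 failures spread over 3 different tasks: the job survives
print(job_fails([1, 1, 2, 2, 3, 3]))  # False
# task 5 fails 4 times (1 attempt + 3 retries): the job is aborted
print(job_fails([5, 5, 5, 5]))        # True
```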
spark.task.reaper.enabled
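Enables monitoring of killed or interrupted tasks: the executor keeps watching a killed task until it actually stops executing; default false.
(I had jobs that had failed but kept showing tasks as running; this is the parameter I finally tracked down for that.)
More advanced related settings: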
spark.task.reaper.pollingInterval
How often the executor polls the status of killed tasks; default 10s. If a killed task is still running when polled, a warning is logged.
spark.task.reaper.threadDump
Whether a thread dump of the task is logged when polling finds a killed task still running; default true.
spark.task.reaper.killTimeout
Timeout after which the executor JVM kills itself if a killed task is still running; default -1, which disables this mechanism.
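How the three reaper settings interact over time can be sketched as a small simulation. This is a hypothetical Python model with times in seconds, not Spark code (Spark does this with threads and wall-clock time); `reaper_events` and the message strings are made up for illustration.

```python
def reaper_events(task_stops_at, polling_interval=10, thread_dump=True,
                  kill_timeout=-1, horizon=60):
    """Simulate the monitoring of a task killed at t=0 (times in seconds).

    task_stops_at: when the killed task really stops, or None if it never does
    polling_interval, thread_dump, kill_timeout mirror spark.task.reaper.*;
    kill_timeout <= 0 disables self-termination, like the -1 default
    horizon: stop the simulation here if nothing else ends it
    """
    events = []
    t = 0
    while t < horizon:
        next_t = t + polling_interval
        if task_stops_at is not None and task_stops_at <= next_t:
            events.append((task_stops_at, "task stopped"))  # monitoring ends
            break
        t = next_t
        msg = "WARN killed task still running"
        if thread_dump:
            msg += " (+ thread dump)"
        events.append((t, msg))
        if 0 < kill_timeout <= t:
            events.append((t, "executor JVM kills itself"))
            break
    return events


for event in reaper_events(task_stops_at=25):
    print(event)
# (10, 'WARN killed task still running (+ thread dump)')
# (20, 'WARN killed task still running (+ thread dump)')
# (25, 'task stopped')
print(reaper_events(None, kill_timeout=30)[-1])
# (30, 'executor JVM kills itself')
```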
spark.stage.maxConsecutiveAttempts
Number of consecutive attempts allowed for a stage before it is aborted; default 4.
cat fairscheduler.xml
<?xml version="1.0"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<allocations>
  <pool name="production">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
  <pool name="test">
    <schedulingMode>FIFO</schedulingMode>
    <weight>2</weight>
    <minShare>3</minShare>
  </pool>
</allocations>
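To sanity-check a pool file like the one above before pointing Spark at it, a small Python helper can parse it and fill in Spark's defaults for omitted elements (schedulingMode FIFO, weight 1, minShare 0). `parse_pools` is hypothetical, not part of Spark, and it rejects an unknown schedulingMode instead of mimicking whatever Spark does with one; the extra `adhoc` pool is made up to show the defaults.

```python
import xml.etree.ElementTree as ET

DEFAULTS = {"schedulingMode": "FIFO", "weight": 1, "minShare": 0}


def parse_pools(xml_text):
    """Return {pool_name: {"schedulingMode", "weight", "minShare"}}."""
    root = ET.fromstring(xml_text)
    pools = {}
    for pool in root.findall("pool"):
        name = pool.get("name")
        mode = (pool.findtext("schedulingMode") or DEFAULTS["schedulingMode"]).strip().upper()
        if mode not in ("FIFO", "FAIR"):
            raise ValueError(f"pool {name}: unknown schedulingMode {mode!r}")
        weight = int(pool.findtext("weight") or DEFAULTS["weight"])
        min_share = int(pool.findtext("minShare") or DEFAULTS["minShare"])
        pools[name] = {"schedulingMode": mode, "weight": weight, "minShare": min_share}
    return pools


EXAMPLE = """<?xml version="1.0"?>
<allocations>
  <pool name="production">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
  <pool name="test">
    <schedulingMode>FIFO</schedulingMode>
    <weight>2</weight>
    <minShare>3</minShare>
  </pool>
  <pool name="adhoc"/>
</allocations>"""

for name, conf in parse_pools(EXAMPLE).items():
    print(name, conf)
# production {'schedulingMode': 'FAIR', 'weight': 1, 'minShare': 2}
# test {'schedulingMode': 'FIFO', 'weight': 2, 'minShare': 3}
# adhoc {'schedulingMode': 'FIFO', 'weight': 1, 'minShare': 0}
```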
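Parameter notes:
pool name: name of the scheduling pool
schedulingMode: scheduling mode for jobs within the pool, FIFO or FAIR (default FIFO)
weight: the pool's share of the cluster relative to other pools; default 1. Here test has weight 2, so when both pools are active it gets about twice as many resources as production (weight 1)
minShare: minimum share (number of CPU cores) for the pool; the fair scheduler always tries to meet the minShare of every active pool before redistributing the remaining resources according to weight; default 0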
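To see how weight and minShare interact, the Python sketch below hands out task slots one at a time to whichever pool ranks first under rules modeled on Spark's fair scheduling comparator as I understand it: pools below their minShare go first (lowest running/minShare first), then the pool with the fewest running tasks per unit of weight, with ties broken by pool name. It assumes every pool always has pending tasks and ignores schedulingMode, which only orders jobs inside a pool; `fair_slots` is a hypothetical helper, not a Spark API.

```python
def fair_slots(pools, total_slots):
    """Hand out task slots one at a time to the pool a fair comparator ranks first.

    pools: {name: (weight, min_share)}, weights > 0; every pool is assumed
    to always have pending tasks. Returns {name: slots_assigned}.
    """
    running = {name: 0 for name in pools}

    def rank(name):
        weight, min_share = pools[name]
        r = running[name]
        if r < min_share:
            # below minShare: served first, least-satisfied (r / minShare) first
            return (0, r / max(min_share, 1), name)
        # minShare met: fewest running tasks per unit of weight first
        return (1, r / weight, name)

    for _ in range(total_slots):
        running[min(pools, key=rank)] += 1
    return running


print(fair_slots({"production": (1, 2), "test": (2, 3)}, 12))
# {'production': 4, 'test': 8}
print(fair_slots({"production": (1, 2), "test": (2, 3)}, 4))
# {'production': 2, 'test': 2}
```

With 12 slots both minShares (2 and 3) are met first; after that the comparator keeps running tasks per unit of weight balanced, ending at the 1:2 weight ratio. With only 4 slots the two minShares cannot both be met, so the needy pools share what there is.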
spark-defaults.conf
spark.scheduler.mode FAIR
spark.scheduler.allocation.file /usr/local/spark-2.4.3-bin-hadoop2.7/conf/fairscheduler.xml