SparkContext源码分析
在任何Spark程序中,必须要创建一个SparkContext,在SparkContext中,最主要的就是创建了TaskScheduler和DAGScheduler,以及SparkUI
...
// Create and start the scheduler
val (sched, ts) = SparkContext.createTaskScheduler(this, master, deployMode) // 创建taskScheduler
_schedulerBackend = sched
_taskScheduler = ts
_dagScheduler = new DAGScheduler(this) // 创建DAGScheduler
...
// 在创建SparkContext的时候,会执行val (sched, ts) = SparkContext.createTaskScheduler(this, master, deployMode)
private def createTaskScheduler(
sc: SparkContext,
master: String,
deployMode: String): (SchedulerBackend, TaskScheduler) = {
import SparkMasterRegex._
// When running locally, don't try to re-execute tasks on failure.
val MAX_LOCAL_TASK_FAILURES = 1
master match { // 匹配master,我们这里主要以Standlone为主,所以,就只看SPARK_REGEX
...
case SPARK_REGEX(sparkUrl) => // Standlone模式
val scheduler = new TaskSchedulerImpl(sc) // 创建TaskScheduler
val masterUrls = sparkUrl.split(",").map("spark://" + _)
// 创建StandaloneSchedulerBackend
val backend = new StandaloneSchedulerBackend(scheduler, sc, masterUrls)
scheduler.initialize(backend) // 初始化taskScheduler, 主要是赋值backend以及根据调度方法创建调度池
(backend, scheduler)
...
}
}
图解如下(其中的SparkDeploySchedulerBackend是1.0的名字,2.0就是StandaloneSchedulerBackend):
![](https://images2015.cnblogs.com/blog/1053613/201707/1053613-20170705094231737-1768510133.png)