
SchedulerBackend — Pluggable Scheduler Backends

SchedulerBackend is a pluggable interface to support various cluster managers, e.g. Apache Mesos, Hadoop YARN, Spark Standalone, or Spark local mode.

Because cluster managers differ in their task scheduling modes and resource-offer mechanisms, Spark abstracts these differences behind the SchedulerBackend contract.

Table 1. Built-In (Direct and Indirect) SchedulerBackends per Cluster Environment

Cluster Environment                  SchedulerBackends
Local mode                           LocalSchedulerBackend
(base for custom SchedulerBackends)  CoarseGrainedSchedulerBackend
Spark Standalone                     StandaloneSchedulerBackend
Spark on YARN                        YarnSchedulerBackend (YarnClientSchedulerBackend, YarnClusterSchedulerBackend)
Spark on Mesos                       MesosCoarseGrainedSchedulerBackend

A scheduler backend is created and started as part of SparkContext’s initialization (when TaskSchedulerImpl is started – see Creating Scheduler Backend and Task Scheduler).
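As a rough illustration, which backend gets created follows from the master URL passed to SparkContext. The sketch below is a simplification (the real selection logic lives inside SparkContext, and this function only names the backend classes rather than instantiating them):

```scala
// Simplified sketch: which SchedulerBackend class a given master URL leads to.
// This is illustrative only -- the real selection happens during SparkContext
// initialization, not through a function like this.
def backendFor(master: String): String = master match {
  case m if m.startsWith("local")    => "LocalSchedulerBackend"
  case m if m.startsWith("spark://") => "StandaloneSchedulerBackend"
  case "yarn"                        => "YarnSchedulerBackend"
  case m if m.startsWith("mesos://") => "MesosCoarseGrainedSchedulerBackend"
  case other => sys.error(s"Unsupported master URL: $other")
}
```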

Caution
FIXME Image how it gets created with SparkContext in play here or in SparkContext doc.

Scheduler backends are started and stopped as part of TaskSchedulerImpl’s initialization and stopping.

Being a scheduler backend in Spark assumes an Apache Mesos-like model in which "an application" gets resource offers as machines become available and can launch tasks on them. Once a scheduler backend obtains a resource allocation, it can start executors.

Tip
Understanding how Apache Mesos works can greatly improve understanding Spark.
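To make the offer model concrete, here is a toy sketch (hypothetical names, not the Spark or Mesos API): the backend is handed resource offers as machines free up and launches as many pending tasks as the offered cores allow.

```scala
// Toy model of offer-based scheduling -- illustrative only, not Spark code.
final case class ResourceOffer(executorId: String, host: String, freeCores: Int)

class ToyOfferBackend(tasks: Seq[String]) {
  // Tasks waiting to be launched, in FIFO order.
  private val pending = scala.collection.mutable.Queue(tasks: _*)

  // On each offer, launch up to `freeCores` tasks and return their ids.
  def onOffer(offer: ResourceOffer): Seq[String] = {
    val n = math.min(offer.freeCores, pending.size)
    (1 to n).map(_ => pending.dequeue())
  }
}
```

Each offer consumes tasks from the queue, so repeated offers drain the pending work as capacity becomes available.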

SchedulerBackend Contract

Note
org.apache.spark.scheduler.SchedulerBackend is a private[spark] Scala trait in Spark.
Table 2. SchedulerBackend Contract
Method Description

applicationId

Unique identifier of the Spark application

Used when TaskSchedulerImpl is asked for the unique identifier of a Spark application (which is actually part of the TaskScheduler contract).

applicationAttemptId

Attempt id of a Spark application

Only supported by YARN cluster scheduler backend as the YARN cluster manager supports multiple application attempts.

Used when…​

NOTE: applicationAttemptId is also a part of TaskScheduler contract and TaskSchedulerImpl directly calls the SchedulerBackend’s applicationAttemptId.

defaultParallelism

Used when TaskSchedulerImpl finds the default level of parallelism (as a hint for sizing jobs).

getDriverLogUrls

Returns no URLs by default; only supported by YarnClusterSchedulerBackend.

isReady

Controls whether the SchedulerBackend is ready (true) or not (false). Returns true by default.

Used when TaskSchedulerImpl waits until SchedulerBackend is ready (which happens just before SparkContext is fully initialized).

killTask

Throws an UnsupportedOperationException by default.

Used when…​

reviveOffers

Requests the scheduler backend to make fresh resource offers (so that pending tasks can be launched).

start

Starts SchedulerBackend.

Used when TaskSchedulerImpl is started.

stop

Stops SchedulerBackend.

Used when TaskSchedulerImpl is stopped.
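The contract above can be sketched as a Scala trait. This is not the actual Spark source, but the method names and defaults mirror the private[spark] trait as described in Table 2: killTask throws by default, isReady returns true, and the attempt id and driver log URLs are only provided by specific backends.

```scala
// Simplified sketch of the SchedulerBackend contract (not the real Spark source).
trait SchedulerBackend {
  // Abstract: every backend must define lifecycle, offers, and parallelism.
  def start(): Unit
  def stop(): Unit
  def reviveOffers(): Unit
  def defaultParallelism(): Int

  // Defaults, overridden only by backends that support the feature.
  def killTask(taskId: Long, executorId: String, interruptThread: Boolean): Unit =
    throw new UnsupportedOperationException
  def isReady(): Boolean = true
  def applicationId(): String = "spark-application-" + System.currentTimeMillis
  def applicationAttemptId(): Option[String] = None
  def getDriverLogUrls: Option[Map[String, String]] = None
}
```

A minimal concrete backend then only needs to implement the four abstract methods; everything else falls back to the defaults above.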

Do not reproduce without permission: spark技术分享 » SchedulerBackend — Pluggable Scheduler Backends