Follow spark技术分享,
Dig into the Spark source code, explore Spark best practices

FairSchedulableBuilder

FairSchedulableBuilder – SchedulableBuilder for FAIR Scheduling Mode

FairSchedulableBuilder is a SchedulableBuilder with the pools configured in an optional allocations configuration file.

It reads the allocations file using the internal buildFairSchedulerPool method.

Tip

Enable INFO logging level for the org.apache.spark.scheduler.FairSchedulableBuilder logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.scheduler.FairSchedulableBuilder=INFO

Refer to Logging.

buildPools

buildPools builds the rootPool from the allocations configuration file. This is the file set in the optional spark.scheduler.allocation.file setting or, if that setting is not given, fairscheduler.xml on the classpath.

Note
buildPools is part of the SchedulableBuilder Contract.
Tip
Spark comes with fairscheduler.xml.template, which you can use as a starting point for your own allocations configuration file.
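The lookup order for the allocations file can be sketched in Python as follows. This is a simplified model, not Spark's code: the function allocation_source and its "file"/"classpath" return tags are hypothetical. Only the setting name spark.scheduler.allocation.file and the fallback name fairscheduler.xml come from the text above.

```python
from typing import Mapping, Tuple

# Setting name and fallback resource name as described in this section.
ALLOCATION_FILE_KEY = "spark.scheduler.allocation.file"
DEFAULT_ALLOCATION_FILE = "fairscheduler.xml"  # looked up on the classpath


def allocation_source(conf: Mapping[str, str]) -> Tuple[str, str]:
    """Hypothetical helper: where would buildPools read the allocations from?

    Returns ("file", path) when spark.scheduler.allocation.file is set,
    otherwise ("classpath", "fairscheduler.xml").
    """
    path = conf.get(ALLOCATION_FILE_KEY)
    if path is not None:
        return ("file", path)
    return ("classpath", DEFAULT_ALLOCATION_FILE)
```

The explicit setting always wins; the classpath resource is only a fallback.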

addTaskSetManager

Note
addTaskSetManager is part of the SchedulableBuilder Contract.
Note
Although Pool.getSchedulableByName may return no Schedulable for a given name, addTaskSetManager assumes that the default pool exists in rootPool because it was registered earlier.

addTaskSetManager starts with the default pool. If properties were given for the Schedulable, the spark.scheduler.pool property is looked up and becomes the current pool name (it falls back to default when the property is not set).

Note
spark.scheduler.pool is the only property supported. Refer to spark.scheduler.pool later in this document.

If no pool with that name exists yet, a new pool is registered under that name with FIFO scheduling mode, minimum share 0, and weight 1.

After the new pool is registered, you should see an INFO message in the logs reporting the pool's name together with its scheduling mode, minimum share, and weight.

The manager Schedulable is then added to the pool, either the one that already existed or the one just created.

You should then see an INFO message in the logs saying that the task set was added to the pool.
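The pool selection steps of addTaskSetManager can be modeled in Python. This is a minimal sketch under stated assumptions, not Spark's implementation: Pool, RootPool, get_schedulable_by_name, add_schedulable, and add_task_set_manager are hypothetical stand-ins, and a task set is represented by a plain string. Log messages are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

DEFAULT_POOL_NAME = "default"
POOL_PROPERTY = "spark.scheduler.pool"


@dataclass
class Pool:
    """Hypothetical pool; defaults match the ones used for pools created on the fly."""
    name: str
    scheduling_mode: str = "FIFO"
    min_share: int = 0
    weight: int = 1
    schedulables: List[str] = field(default_factory=list)


class RootPool:
    """Hypothetical stand-in for rootPool: child pools indexed by name."""

    def __init__(self) -> None:
        self.pools: Dict[str, Pool] = {}

    def get_schedulable_by_name(self, name: str) -> Optional[Pool]:
        return self.pools.get(name)  # None when no such pool exists

    def add_schedulable(self, pool: Pool) -> None:
        self.pools[pool.name] = pool


def add_task_set_manager(root: RootPool, manager: str,
                         properties: Optional[Dict[str, str]]) -> str:
    """Attach a task set to its pool and return the pool's name."""
    pool_name = DEFAULT_POOL_NAME
    parent = root.get_schedulable_by_name(pool_name)  # assumed to be registered already
    if properties is not None:
        pool_name = properties.get(POOL_PROPERTY, DEFAULT_POOL_NAME)
        parent = root.get_schedulable_by_name(pool_name)
        if parent is None:
            # Unknown pool name: register it with FIFO, minShare 0, weight 1.
            parent = Pool(pool_name)
            root.add_schedulable(parent)
    assert parent is not None, "the default pool must be registered first"
    parent.schedulables.append(manager)
    return pool_name
```

Note that a pool created this way always gets the default settings; to give it a different scheduling mode, minimum share, or weight, it has to be defined in the allocations file.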

spark.scheduler.pool Property

SparkContext.setLocalProperty lets you set properties per thread so that the jobs submitted from a thread form a logical group. FairSchedulableBuilder relies on this mechanism: it looks up the spark.scheduler.pool property to group the jobs submitted from a thread and submit them to a non-default pool.

Tip
See addTaskSetManager for how this setting is used.
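The per-thread behavior can be illustrated without a Spark cluster. In PySpark the real call is sc.setLocalProperty("spark.scheduler.pool", "production"); the sketch below only models the thread-local part in plain Python, so set_local_property and pool_for_next_job are hypothetical helpers. Clearing the property with None mirrors how setLocalProperty removes a property whose value is null.

```python
import threading
from typing import Dict, Optional

_local = threading.local()  # one property map per thread


def set_local_property(key: str, value: Optional[str]) -> None:
    """Hypothetical analogue of SparkContext.setLocalProperty (calling thread only)."""
    if not hasattr(_local, "props"):
        _local.props = {}
    if value is None:
        _local.props.pop(key, None)  # None clears the property
    else:
        _local.props[key] = value


def pool_for_next_job() -> str:
    """Hypothetical helper: the pool a job submitted from this thread would use."""
    props: Dict[str, str] = getattr(_local, "props", {})
    return props.get("spark.scheduler.pool", "default")
```

Each thread sees only its own properties, so one thread can send its jobs to a production pool while jobs from other threads keep going to default.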

fairscheduler.xml Allocations Configuration File

The allocations configuration file is an XML file.

Spark ships a sample allocations file, conf/fairscheduler.xml.template, with a top-level allocations element that contains one pool element per pool.

Tip
The name of the top-level element, allocations, can be anything: Spark does not insist on allocations and accepts any name.
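Why the root name does not matter can be shown with Python's xml.etree.ElementTree. This is an illustrative sketch, assuming that pool elements are found by a search over the whole document rather than by matching the root element. The helper pool_names is hypothetical, and the pool names and values in the sample XML are only illustrative, in the shape of the template.

```python
import xml.etree.ElementTree as ET
from typing import List

# Illustrative allocations file; pool names and values are examples only.
TEMPLATE_LIKE = """<?xml version="1.0"?>
<allocations>
  <pool name="production">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
  <pool name="test">
    <schedulingMode>FIFO</schedulingMode>
    <weight>2</weight>
    <minShare>3</minShare>
  </pool>
</allocations>
"""


def pool_names(xml_text: str) -> List[str]:
    """Hypothetical helper: the name attribute of every pool element.

    Element.iter searches the whole tree, so the root element's name is irrelevant.
    """
    root = ET.fromstring(xml_text)
    return [pool.get("name") for pool in root.iter("pool")]
```

Renaming the root element, for example to whatever, leaves the result unchanged.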

Ensure Default Pool is Registered (buildDefaultPool method)

buildDefaultPool checks whether the default pool has already been defined and, if not, adds a pool named default with FIFO scheduling mode, minimum share 0, and weight 1.

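If the default pool had to be created, you should see an INFO message in the logs reporting it together with its scheduling mode, minimum share, and weight.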
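The check can be sketched in a few lines of Python. This is a hypothetical model: pools are kept in a plain dict mapping a pool name to a (schedulingMode, minShare, weight) tuple, and build_default_pool is an illustrative name rather than Spark's method.

```python
from typing import Dict, Tuple

DEFAULT_POOL_NAME = "default"
# (schedulingMode, minShare, weight) given to the default pool when it is missing
DEFAULT_POOL_SETTINGS: Tuple[str, int, int] = ("FIFO", 0, 1)


def build_default_pool(pools: Dict[str, Tuple[str, int, int]]) -> bool:
    """Add the default pool unless one exists; return True if it was added."""
    if DEFAULT_POOL_NAME in pools:
        return False  # an existing definition is kept as-is
    pools[DEFAULT_POOL_NAME] = DEFAULT_POOL_SETTINGS
    return True
```

Assuming buildDefaultPool runs after the allocations file has been read, a default pool defined in that file keeps its own settings.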

Build Pools from XML Allocations File (buildFairSchedulerPool method)

buildFairSchedulerPool reads the pool definitions from the allocations configuration file.

For each pool element, it reads the pool's name (from the name attribute) and starts from the default pool configuration: FIFO scheduling mode, minimum share 0, and weight 1 (unless overridden later).

Caution
FIXME Why the difference between minShare 0 and weight 1 here and minShare 0 and weight 0 for rootPool in TaskSchedulerImpl.initialize? It is definitely an inconsistency.

If the schedulingMode element exists and is not empty for the pool, it becomes the pool's scheduling mode. The value is case sensitive, i.e. it must be written in uppercase letters.

If the minShare element exists and is not empty for the pool, it becomes the pool's minShare. It must be an integer.

If the weight element exists and is not empty for the pool, it becomes the pool's weight. It must be an integer.

The pool is then registered with rootPool.

If all goes well, you should see an INFO message in the logs reporting the created pool's name, scheduling mode, minimum share, and weight.
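The parsing rules above can be sketched in Python with xml.etree.ElementTree. This is a simplified model under stated assumptions, not Spark's code: PoolConfig and parse_pools are hypothetical, the sketch returns a list instead of registering pools with rootPool, and it accepts only the uppercase names FAIR and FIFO, raising ValueError for anything else (a simplification of how Spark treats unknown modes).

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List

VALID_MODES = {"FAIR", "FIFO"}  # simplification: uppercase only, case sensitive


@dataclass
class PoolConfig:
    name: str
    scheduling_mode: str = "FIFO"  # defaults apply unless overridden below
    min_share: int = 0
    weight: int = 1


def parse_pools(xml_text: str) -> List[PoolConfig]:
    """Hypothetical model of buildFairSchedulerPool's per-pool rules."""
    pools = []
    for node in ET.fromstring(xml_text).iter("pool"):
        pool = PoolConfig(name=node.get("name"))
        # findtext returns "" for a missing or empty element, so both are skipped.
        mode = node.findtext("schedulingMode", default="")
        if mode:
            if mode not in VALID_MODES:  # e.g. "fair" is rejected
                raise ValueError(f"unknown scheduling mode: {mode!r}")
            pool.scheduling_mode = mode
        min_share = node.findtext("minShare", default="")
        if min_share:
            pool.min_share = int(min_share)  # ValueError if not an integer
        weight = node.findtext("weight", default="")
        if weight:
            pool.weight = int(weight)
        pools.append(pool)
    return pools
```

A pool element that only sets weight therefore keeps FIFO scheduling mode and minimum share 0.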

Settings

spark.scheduler.allocation.file

spark.scheduler.allocation.file is the file path of an optional scheduler configuration file that FairSchedulableBuilder.buildPools uses to build pools.

Reproduction without permission is prohibited: spark技术分享 » FairSchedulableBuilder