
Speculative Execution of Tasks


Speculative tasks (also speculatable tasks or task stragglers) are tasks that run considerably slower than the other tasks in a job (controlled by spark.speculation.multiplier).

Speculative execution of tasks is a health-check procedure that looks for tasks to speculate, i.e. tasks in a stage that run longer than spark.speculation.multiplier times the median duration of the successfully completed tasks in a taskset. Such slow tasks are re-submitted to another worker. Spark does not stop the slow tasks, but runs a new copy in parallel.
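The median-based rule can be sketched as follows. This is a minimal illustration, not Spark's actual implementation; the object and method names are invented:

```scala
// Sketch of the rule: a still-running task becomes a candidate for
// speculation once its runtime exceeds multiplier * median duration
// of the successfully completed tasks in its taskset.
object MedianThresholdSketch {
  def speculationThreshold(successfulDurationsMs: Seq[Long], multiplier: Double): Double = {
    val sorted = successfulDurationsMs.sorted
    val median = sorted(sorted.length / 2)
    multiplier * median
  }

  def main(args: Array[String]): Unit = {
    // Four tasks finished quickly; a fifth straggler is still running.
    val successful = Seq(100L, 110L, 120L, 130L)
    // With the default multiplier of 1.5 and a median of 120 ms:
    println(speculationThreshold(successful, 1.5)) // prints 180.0
  }
}
```

A task still running after 180 ms would then be a candidate for a speculative copy.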

The speculation thread starts as TaskSchedulerImpl starts in clustered deployment modes with spark.speculation enabled. It executes periodically every spark.speculation.interval, after an initial spark.speculation.interval delay.

When enabled, you should see the following INFO message in the logs:

INFO Starting speculative execution thread

The checks run on the task-scheduler-speculation daemon thread pool, a j.u.c.ScheduledThreadPoolExecutor with core pool size 1.
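A scheduler like that can be set up as in the sketch below. This assumes a 100 ms interval standing in for spark.speculation.interval; the object name and the check body are placeholders, not Spark's code:

```scala
import java.util.concurrent.{CountDownLatch, Executors, ThreadFactory, TimeUnit}

object SpeculationSchedulerSketch {
  def main(args: Array[String]): Unit = {
    // Name the single worker thread and mark it as a daemon,
    // so it does not keep the JVM alive on its own.
    val factory = new ThreadFactory {
      override def newThread(r: Runnable): Thread = {
        val t = new Thread(r, "task-scheduler-speculation")
        t.setDaemon(true)
        t
      }
    }
    // A ScheduledThreadPoolExecutor with core pool size 1.
    val scheduler = Executors.newScheduledThreadPool(1, factory)
    val latch = new CountDownLatch(2)
    val intervalMs = 100L // stands in for spark.speculation.interval
    scheduler.scheduleWithFixedDelay(new Runnable {
      override def run(): Unit = {
        println("checking for speculatable tasks")
        latch.countDown()
      }
    }, intervalMs, intervalMs, TimeUnit.MILLISECONDS)
    latch.await(2, TimeUnit.SECONDS) // wait for two periodic runs
    scheduler.shutdown()
  }
}
```

Note the fixed-delay scheduling: the next check is scheduled one interval after the previous check finishes, matching the "every spark.speculation.interval after the initial spark.speculation.interval" behavior described above.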

A job with speculatable tasks can finish while its speculative copies are still running; Spark leaves those copies running (no KILL command yet).

The periodic check uses the checkSpeculatableTasks method, which asks rootPool to check for speculatable tasks. If there are any, SchedulerBackend is called for reviveOffers so resource offers can be made to launch the copies.
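The control flow can be illustrated with stubs. The trait and method names mirror Spark's (Pool.checkSpeculatableTasks, SchedulerBackend.reviveOffers), but the signatures are simplified and the bodies are invented:

```scala
// Simplified stand-ins for Spark's scheduling abstractions.
trait Pool {
  def checkSpeculatableTasks(): Boolean
}
trait SchedulerBackend {
  def reviveOffers(): Unit
}

class SpeculationCheck(rootPool: Pool, backend: SchedulerBackend) {
  def run(): Unit = {
    // Ask the root pool whether any running task qualifies for speculation.
    if (rootPool.checkSpeculatableTasks()) {
      // If so, ask the backend for offers to launch the speculative copies.
      backend.reviveOffers()
    }
  }
}

object SpeculationCheckDemo {
  def main(args: Array[String]): Unit = {
    var revived = false
    val pool = new Pool { def checkSpeculatableTasks(): Boolean = true }
    val backend = new SchedulerBackend { def reviveOffers(): Unit = revived = true }
    new SpeculationCheck(pool, backend).run()
    println(s"reviveOffers called: $revived") // prints: reviveOffers called: true
  }
}
```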

Caution
FIXME How does Spark handle repeated results of speculative tasks since there are copies launched?

Settings

Table 1. Spark Properties

spark.speculation
Default: false
Enables (true) or disables (false) speculative execution of tasks (by means of the task-scheduler-speculation scheduled executor service).

spark.speculation.interval
Default: 100ms
The time interval between checks for speculatable tasks.

spark.speculation.multiplier
Default: 1.5
How many times slower a task must be than the median task duration to be considered for speculation.

spark.speculation.quantile
Default: 0.75
The fraction of tasks that must be complete before speculation is enabled for a taskset.
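As an example, speculation could be turned on in spark-defaults.conf like this (the values other than spark.speculation=true are arbitrary illustrations, not recommendations):

```
spark.speculation            true
spark.speculation.interval   200ms
spark.speculation.multiplier 2
spark.speculation.quantile   0.9
```

With these settings, once 90% of a taskset's tasks have completed, any task running more than twice as long as the median completed task gets a speculative copy, checked every 200 ms.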

Reposting without permission is prohibited: spark技术分享 » Speculative Execution of Tasks