Speculative Execution of Tasks
Speculative tasks (also speculatable tasks or task strugglers) are tasks that run slower than most (FIXME the setting) of the all tasks in a job.
Speculative execution of tasks is a health-check procedure that checks for tasks to be speculated, i.e. running slower in a stage than the median of all successfully completed tasks in a taskset (FIXME the setting). Such slow tasks will be re-submitted to another worker. It will not stop the slow tasks, but run a new copy in parallel.
The thread starts as TaskSchedulerImpl
starts in clustered deployment modes with spark.speculation enabled. It executes periodically every spark.speculation.interval after the initial spark.speculation.interval
passes.
When enabled, you should see the following INFO message in the logs:
1 2 3 4 5 |
INFO TaskSchedulerImpl: Starting speculative execution thread |
It works as task-scheduler-speculation
daemon thread pool using j.u.c.ScheduledThreadPoolExecutor
with core pool size 1
.
The job with speculatable tasks should finish while speculative tasks are running, and it will leave these tasks running – no KILL command yet.
It uses checkSpeculatableTasks
method that asks rootPool
to check for speculatable tasks. If there are any, SchedulerBackend
is called for reviveOffers.
Caution
|
FIXME How does Spark handle repeated results of speculative tasks since there are copies launched? |
Settings
Spark Property | Default Value | Description |
---|---|---|
|
Enables ( |
|
|
The time interval to use before checking for speculative tasks. |
|
|
||
|
The percentage of tasks that has not finished yet at which to start speculation. |