# InterruptibleIterator — Iterator With Support For Task Cancellation

`InterruptibleIterator` is a custom Scala `Iterator` that supports task cancellation, i.e. it stops iterating when its task is interrupted (cancelled).
Quoting the official Scala `Iterator` documentation:

> Iterators are data structures that allow to iterate over a sequence of elements. They have a `hasNext` method for checking if there is a next element available, and a `next` method which returns the next element and discards it from the iterator.
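The contract above can be sketched with a plain Scala iterator (a generic example, not Spark-specific):

```scala
// A plain Scala Iterator: hasNext tests availability, next() returns an
// element and advances past it.
val it = Iterator("a", "b", "c")

val buf = scala.collection.mutable.ListBuffer.empty[String]
while (it.hasNext) buf += it.next()

assert(buf.toList == List("a", "b", "c"))
assert(!it.hasNext) // the iterator is now exhausted
```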
`InterruptibleIterator` is created when:

- `RDD` is requested to get or compute an RDD partition
- `CoGroupedRDD`, `HadoopRDD`, `NewHadoopRDD`, and `ParallelCollectionRDD` are requested to compute a partition
- `BlockStoreShuffleReader` is requested to read combined key-value records for a reduce task
- `PairRDDFunctions` is requested to `combineByKeyWithClassTag`
- Spark SQL's `DataSourceRDD` and `JDBCRDD` are requested to compute a partition
- Spark SQL's `RangeExec` physical operator is requested to `doExecute`
- PySpark's `BasePythonRunner` is requested to compute
`InterruptibleIterator` takes the following when created:

- `TaskContext`
- Delegate Scala `Iterator[T]`

> **Note**: `InterruptibleIterator` is a Developer API, i.e. a lower-level, unstable API intended for Spark developers that may change or be removed in minor versions of Apache Spark.
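The class can be sketched roughly as follows (a simplified, hypothetical stand-in for `TaskContext`; the real types live in `org.apache.spark`):

```scala
// Simplified stand-ins for Spark's types (assumptions for illustration only).
final class TaskKilledException extends RuntimeException("task killed")

trait TaskContext {
  // Throws TaskKilledException when the task has been marked interrupted.
  def killTaskIfInterrupted(): Unit
}

// InterruptibleIterator wraps a delegate Iterator and checks for task
// interruption before answering hasNext.
class InterruptibleIterator[T](context: TaskContext, delegate: Iterator[T])
    extends Iterator[T] {

  override def hasNext: Boolean = {
    context.killTaskIfInterrupted() // breaks iteration when the task is cancelled
    delegate.hasNext
  }

  override def next(): T = delegate.next()
}
```

With a context that never interrupts, the wrapper is transparent; once the task is marked interrupted, the next `hasNext` call throws and the consuming loop stops.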
## hasNext Method

```scala
hasNext: Boolean
```

> **Note**: `hasNext` is part of the Scala `Iterator` contract to test whether this iterator can provide another element.

`hasNext` requests the `TaskContext` to kill the task if interrupted (which simply throws a `TaskKilledException` that in turn breaks the task execution).

In the end, `hasNext` delegates to the underlying `Iterator`'s `hasNext`.
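The effect of the interruption check can be seen with a toy flag standing in for the task's interrupted status (hypothetical names, sketching the behavior rather than Spark's actual classes):

```scala
// Hypothetical, simplified sketch: a mutable flag stands in for the task's
// interrupted status, and an exception stands in for TaskKilledException.
final class FakeTaskKilled extends RuntimeException("task killed")
final class FakeContext { @volatile var interrupted: Boolean = false }

def interruptibleHasNext[T](ctx: FakeContext, delegate: Iterator[T]): Boolean = {
  // Mirrors TaskContext.killTaskIfInterrupted: throw instead of returning false.
  if (ctx.interrupted) throw new FakeTaskKilled
  delegate.hasNext
}

val ctx = new FakeContext
val it  = Iterator(1, 2, 3)

assert(interruptibleHasNext(ctx, it)) // not interrupted: plain delegation

ctx.interrupted = true
val aborted =
  try { interruptibleHasNext(ctx, it); false }
  catch { case _: FakeTaskKilled => true }
assert(aborted) // interruption surfaces as an exception, not as hasNext == false
```

Throwing (rather than returning `false`) matters: it lets the cancellation propagate out of whatever loop is consuming the iterator and abort the whole task, instead of being mistaken for normal end-of-data.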
## next Method

```scala
next(): T
```

> **Note**: `next` is part of the Scala `Iterator` contract to produce the next element of this iterator.

`next` simply delegates to the underlying `Iterator`'s `next`.