InterruptibleIterator — Iterator With Support For Task Cancellation
InterruptibleIterator is a custom Scala Iterator that supports task cancellation, i.e. stops iterating as soon as the owning task is interrupted (cancelled).
Quoting the official Scala Iterator documentation:

Iterators are data structures that allow to iterate over a sequence of elements. They have a hasNext method for checking if there is a next element available, and a next method which returns the next element and discards it from the iterator.
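The quoted contract is easy to see in plain Scala (a toy example, not Spark code):

```scala
// The basic Iterator contract: hasNext tests availability, next consumes.
val it: Iterator[Int] = Iterator(1, 2, 3)

assert(it.hasNext)        // an element is available
assert(it.next() == 1)    // returns and discards the first element
assert(it.next() == 2)
assert(it.next() == 3)
assert(!it.hasNext)       // exhausted once all elements are consumed
```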
InterruptibleIterator is created when:

- RDD is requested to get or compute an RDD partition
- CoGroupedRDD, HadoopRDD, NewHadoopRDD and ParallelCollectionRDD are requested to compute a partition
- BlockStoreShuffleReader is requested to read combined key-value records for a reduce task
- PairRDDFunctions is requested to combineByKeyWithClassTag
- Spark SQL's DataSourceRDD and JDBCRDD are requested to compute a partition
- Spark SQL's RangeExec physical operator is requested to doExecute
- PySpark's BasePythonRunner is requested to compute
InterruptibleIterator takes the following when created:

- TaskContext of the owning task
- The Scala Iterator to delegate to
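The two constructor arguments are visible in a condensed sketch of the class. The TaskContext trait below is a simplified stand-in for Spark's real TaskContext (which carries far more task state), and only the cancellation-related behavior is modelled:

```scala
// Thrown when a cancelled task is detected (simplified stand-in for Spark's
// org.apache.spark.TaskKilledException).
class TaskKilledException extends RuntimeException("task killed")

// Simplified stand-in for Spark's TaskContext.
trait TaskContext {
  def isInterrupted(): Boolean

  // Throws TaskKilledException when the task has been marked interrupted.
  def killTaskIfInterrupted(): Unit =
    if (isInterrupted()) throw new TaskKilledException
}

// InterruptibleIterator wraps a delegate Iterator together with the
// TaskContext of the owning task.
class InterruptibleIterator[+T](
    val context: TaskContext,
    val delegate: Iterator[T]) extends Iterator[T] {

  override def hasNext: Boolean = {
    context.killTaskIfInterrupted() // fail fast on cancellation
    delegate.hasNext
  }

  override def next(): T = delegate.next() // plain delegation
}
```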
Note: InterruptibleIterator is a Developer API, i.e. a lower-level, unstable API intended for Spark developers that may change or be removed in minor versions of Apache Spark.
hasNext Method

hasNext: Boolean

Note: hasNext is part of the Iterator contract to test whether this iterator can provide another element.
hasNext requests the TaskContext to kill the task if interrupted, which simply throws a TaskKilledException that in turn breaks the task execution.
In the end, hasNext requests the delegate Iterator to hasNext.
next Method

next(): T

Note: next is part of the Iterator contract to produce the next element of this iterator.
next simply requests the delegate Iterator to next.
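Taken together: the cancellation check fires only in hasNext, while next delegates unchecked. A stand-alone demo, with TaskKilledException and StubTaskContext as simplified stand-ins for the Spark classes:

```scala
class TaskKilledException extends RuntimeException("task killed")

// Simplified stand-in for Spark's TaskContext.
class StubTaskContext {
  @volatile var interrupted = false
  def killTaskIfInterrupted(): Unit =
    if (interrupted) throw new TaskKilledException
}

class InterruptibleIterator[T](ctx: StubTaskContext, delegate: Iterator[T])
    extends Iterator[T] {
  // Cancellation is detected here, before delegating.
  override def hasNext: Boolean = { ctx.killTaskIfInterrupted(); delegate.hasNext }
  // next does no interrupt check of its own.
  override def next(): T = delegate.next()
}

val ctx = new StubTaskContext
val it = new InterruptibleIterator(ctx, Iterator("a", "b", "c"))
assert(it.next() == "a")   // iteration proceeds while the task runs

ctx.interrupted = true     // simulate task cancellation
val killed =
  try { it.hasNext; false }
  catch { case _: TaskKilledException => true }
assert(killed)             // hasNext aborts the (simulated) task
```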