StateStoreOps — Extension Methods for Creating StateStoreRDD
StateStoreOps
is a Scala implicit class to create StateStoreRDD
when the following physical operators are executed:
Note
|
Implicit Classes are a language feature in Scala for implicit conversions with extension methods for existing types. |
Creating StateStoreRDD (with storeUpdateFunction Aborting StateStore When Task Fails) — mapPartitionsWithStateStore
Method
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
mapPartitionsWithStateStore[U]( sqlContext: SQLContext, stateInfo: StatefulOperatorStateInfo, keySchema: StructType, valueSchema: StructType, indexOrdinal: Option[Int])( storeUpdateFunction: (StateStore, Iterator[T]) => Iterator[U]): StateStoreRDD[T, U] (1) mapPartitionsWithStateStore[U]( stateInfo: StatefulOperatorStateInfo, keySchema: StructType, valueSchema: StructType, indexOrdinal: Option[Int], sessionState: SessionState, storeCoordinator: Option[StateStoreCoordinatorRef])( storeUpdateFunction: (StateStore, Iterator[T]) => Iterator[U]): StateStoreRDD[T, U] |
-
Uses
sqlContext.streams.stateStoreCoordinator
to accessStateStoreCoordinator
Internally, mapPartitionsWithStateStore
requests SparkContext
to clean storeUpdateFunction
function.
Note
|
mapPartitionsWithStateStore uses the enclosing RDD to access the current SparkContext .
|
Note
|
Function Cleaning is to clean a closure from unreferenced variables before it is serialized and sent to tasks. SparkContext reports a SparkException when the closure is not serializable.
|
mapPartitionsWithStateStore
then creates a (wrapper) function to abort the StateStore
if state updates had not been committed before a task finished (which is to make sure that the StateStore
has been committed or aborted in the end to follow the contract of StateStore
).
Note
|
mapPartitionsWithStateStore uses TaskCompletionListener to be notified when a task has finished.
|
In the end, mapPartitionsWithStateStore
creates a StateStoreRDD (with the wrapper function, SessionState
and StateStoreCoordinatorRef).
Note
|
|