
IncrementalExecution — QueryExecution of Streaming Datasets


IncrementalExecution is a QueryExecution of a streaming Dataset that StreamExecution creates when incrementally executing the logical query plan (every trigger).

Figure 1. StreamExecution creates IncrementalExecution (every trigger / streaming batch)
Details on the QueryExecution contract can be found in the Mastering Apache Spark 2 gitbook.

IncrementalExecution registers the state physical preparation rule with the parent QueryExecution's preparations. The rule prepares the streaming physical plan for execution (using batch-specific execution properties).
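As a rough mental model of what registering one more preparation rule means, the Python sketch below treats every rule as a function from plan to plan and pushes a plan through the rules in order. It is not Spark's code: the plan representation (a flat list of operator dicts instead of a tree), the names `state_rule`, `parent_rule` and `prepare_for_execution`, the batch id `7`, and the assumption that the state rule runs ahead of the inherited rules are all illustrative choices.

```python
from functools import reduce

# A "plan" here is a flat list of operator dicts; real Spark plans are trees.
def state_rule(plan, batch_id):
    """Fill in the batch-specific property that stateful operators are missing."""
    return [
        {**op, "batch_id": batch_id}
        if op.get("stateful") and op.get("batch_id") is None
        else op
        for op in plan
    ]

def parent_rule(plan):
    """Hypothetical stand-in for a preparation rule inherited from the parent."""
    return [{**op, "prepared": True} for op in plan]

def prepare_for_execution(plan, preparations):
    """Apply every rule in order, feeding each rule the previous rule's output."""
    return reduce(lambda current, rule: rule(current), preparations, plan)

spark_plan = [
    {"name": "StateStoreSaveExec", "stateful": True, "batch_id": None},
    {"name": "ProjectExec", "stateful": False},
]

batch_id = 7  # hypothetical current batch
# Assumption: the state rule is placed ahead of the inherited rules.
preparations = [lambda plan: state_rule(plan, batch_id), parent_rule]
executed_plan = prepare_for_execution(spark_plan, preparations)
```

In this model only the stateful operator receives the batch id, and the input plan is left untouched because every rule returns new dicts.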

IncrementalExecution is created when:

IncrementalExecution uses the state checkpoint directory (that is given when IncrementalExecution is created) that is one of the following:

Table 1. IncrementalExecution’s Internal Registries and Counters (in alphabetical order)
Name Description

planner
SparkPlanner with the following extra planning strategies (in the order of execution):

planner is used to plan (aka convert) an optimized logical plan into a physical plan (that is later available as sparkPlan).

The sparkPlan physical plan is then prepared for execution using the preparations physical optimization rules. The result is later available as the executedPlan physical plan.

state
State preparation rule (i.e. Rule[SparkPlan]) that transforms a streaming physical plan (i.e. SparkPlan with StateStoreSaveExec, StreamingDeduplicateExec and FlatMapGroupsWithStateExec physical operators) and fills missing properties that are batch-specific, e.g.

Used when IncrementalExecution prepares a physical plan (i.e. SparkPlan) for execution (which is when StreamExecution runs a streaming batch and plans a streaming query).

statefulOperatorId
Java’s AtomicInteger

  • 0 when IncrementalExecution is created

  • Incremented…​FIXME

nextStatefulOperationStateInfo Internal Method

nextStatefulOperationStateInfo creates a new StatefulOperatorStateInfo with checkpointLocation, runId, the next statefulOperatorId and currentBatchId.

All the properties of StatefulOperatorStateInfo are specified when IncrementalExecution is created.
nextStatefulOperationStateInfo is used exclusively when IncrementalExecution is requested to transform a streaming physical plan using the state preparation rule.
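The behaviour described above can be pictured with a small Python model. This is a sketch, not Spark's implementation: the class `IncrementalExecutionSketch`, the snake_case field names, the checkpoint path and the run id are hypothetical, `itertools.count` merely stands in for Java's AtomicInteger, and reading "the next statefulOperatorId" as read-then-increment is an assumption.

```python
import itertools
from dataclasses import dataclass

@dataclass(frozen=True)
class StatefulOperatorStateInfo:
    checkpoint_location: str
    run_id: str
    operator_id: int
    batch_id: int

class IncrementalExecutionSketch:
    """Toy model: holds the values given at creation time plus a counter."""

    def __init__(self, checkpoint_location, run_id, current_batch_id):
        self.checkpoint_location = checkpoint_location
        self.run_id = run_id
        self.current_batch_id = current_batch_id
        # Counter starts at 0 when the instance is created.
        self._stateful_operator_id = itertools.count(0)

    def next_stateful_operation_state_info(self):
        # Assumption: each call hands out the current id, then advances it.
        return StatefulOperatorStateInfo(
            self.checkpoint_location,
            self.run_id,
            next(self._stateful_operator_id),
            self.current_batch_id,
        )

# Hypothetical values, for illustration only.
execution = IncrementalExecutionSketch("/tmp/checkpoint/state", "run-1", 3)
first = execution.next_stateful_operation_state_info()
second = execution.next_stateful_operation_state_info()
```

Under these assumptions, two stateful operators in the same plan get distinct operator ids while sharing the checkpoint location, run id and batch id.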

Creating IncrementalExecution Instance

IncrementalExecution takes the following when created:

IncrementalExecution initializes the internal registries and counters.

shouldRunAnotherBatch Method


shouldRunAnotherBatch is used exclusively when MicroBatchExecution is requested to construct the next streaming batch.




