IncrementalExecution — QueryExecution of Streaming Datasets

IncrementalExecution is a QueryExecution of a streaming Dataset that StreamExecution creates when incrementally executing the logical query plan (every trigger).

Figure 1. StreamExecution creates IncrementalExecution (every trigger / streaming batch)
Tip
Details on QueryExecution contract can be found in the Mastering Apache Spark 2 gitbook.

IncrementalExecution registers the state physical preparation rule with the parent QueryExecution's preparations. The rule prepares the streaming physical plan for execution (using batch-specific execution properties).

IncrementalExecution is created when:

IncrementalExecution uses the state checkpoint directory (that is given when IncrementalExecution is created) that is one of the following:

Table 1. IncrementalExecution’s Internal Registries and Counters (in alphabetical order)
Name Description

planner

SparkPlanner with the following extra planning strategies (in the order of execution):

Note

planner is used to plan (aka convert) an optimized logical plan into a physical plan (that is later available as sparkPlan).

The sparkPlan physical plan is then prepared for execution using the preparations physical optimization rules. The result is later available as the executedPlan physical plan.

state

State preparation rule (i.e. Rule[SparkPlan]) that transforms a streaming physical plan (i.e. SparkPlan with StateStoreSaveExec, StreamingDeduplicateExec and FlatMapGroupsWithStateExec physical operators) and fills missing properties that are batch-specific, e.g.

Used when IncrementalExecution prepares a physical plan (i.e. SparkPlan) for execution (which is when StreamExecution runs a streaming batch and plans a streaming query).

statefulOperatorId

Java’s AtomicInteger

  • 0 when IncrementalExecution is created

  • Incremented…​FIXME
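
The way a list of preparation rules turns a physical plan into the plan that is actually executed can be illustrated with a small toy model. This is only a sketch: Plan, Scan, StatefulOp, Exchange, PreparationSketch and the two rules are hypothetical stand-ins and not Spark classes, and placing the state rule ahead of the other rules is an assumption made for the example.

```scala
// Toy model only: these types are hypothetical stand-ins, not Spark classes.
sealed trait Plan
case class Scan(source: String) extends Plan
case class StatefulOp(child: Plan, batchId: Option[Long]) extends Plan
case class Exchange(child: Plan) extends Plan

object PreparationSketch {
  type PlanRule = Plan => Plan

  // Stand-in for the state rule: fills the batch-specific property
  // on every stateful operator where it is still missing.
  def stateRule(currentBatchId: Long): PlanRule = {
    def fill(plan: Plan): Plan = plan match {
      case StatefulOp(child, None) => StatefulOp(fill(child), Some(currentBatchId))
      case StatefulOp(child, set)  => StatefulOp(fill(child), set)
      case Exchange(child)         => Exchange(fill(child))
      case scan: Scan              => scan
    }
    fill
  }

  // Stand-in for a rule inherited from the parent: wraps the plan in an exchange.
  val addExchange: PlanRule = plan => Exchange(plan)

  // Rules are applied one after another; each rule sees the output of the previous one.
  def prepare(plan: Plan, rules: Seq[PlanRule]): Plan =
    rules.foldLeft(plan)((current, rule) => rule(current))
}
```

In this sketch, calling PreparationSketch.prepare on StatefulOp(Scan("events"), None) with the rules Seq(stateRule(7L), addExchange) yields Exchange(StatefulOp(Scan("events"), Some(7L))): the stateful operator receives its batch id first, and only then do the remaining rules run.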

nextStatefulOperationStateInfo Internal Method

nextStatefulOperationStateInfo creates a new StatefulOperatorStateInfo with checkpointLocation, runId, the next statefulOperatorId and currentBatchId.

Note
All the properties of StatefulOperatorStateInfo are specified when IncrementalExecution is created.
Note
nextStatefulOperationStateInfo is used exclusively when IncrementalExecution is requested to transform a streaming physical plan using the state preparation rule.
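
The description above can be sketched as a counter-backed factory. StateInfoSketch and StateInfoSource are hypothetical names invented for this illustration and are not the Spark classes. The sketch also assumes that the counter is read first and incremented afterwards, so that the first stateful operator receives id 0.

```scala
import java.util.UUID
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for StatefulOperatorStateInfo with the four properties named above.
case class StateInfoSketch(
    checkpointLocation: String,
    runId: UUID,
    operatorId: Long,
    batchId: Long)

// Hypothetical stand-in for the part of IncrementalExecution that hands out state info.
// All three constructor arguments are fixed when the instance is created.
class StateInfoSource(checkpointLocation: String, runId: UUID, currentBatchId: Long) {
  // 0 when the instance is created.
  private val statefulOperatorId = new AtomicInteger(0)

  // Assumption: read the current value, then increment it (getAndIncrement),
  // so successive calls return operator ids 0, 1, 2, ...
  def nextStatefulOperationStateInfo(): StateInfoSketch =
    StateInfoSketch(
      checkpointLocation,
      runId,
      statefulOperatorId.getAndIncrement().toLong,
      currentBatchId)
}
```

Under these assumptions, two consecutive calls on the same StateInfoSource return state info that differs only in the operator id, while the checkpoint location, run id and batch id stay the same for the whole batch.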

Creating IncrementalExecution Instance

IncrementalExecution takes the following when created:

IncrementalExecution initializes the internal registries and counters.

shouldRunAnotherBatch Method

shouldRunAnotherBatch…​FIXME

Note
shouldRunAnotherBatch is used exclusively when MicroBatchExecution is requested to construct the next streaming batch.