ContinuousExecution — StreamExecution of Continuous Stream Processing

ContinuousExecution is the StreamExecution in Continuous Stream Processing.

ContinuousExecution is created when StreamingQueryManager is requested to create a streaming query with a StreamWriteSupport sink and a ContinuousTrigger (when DataStreamWriter is requested to start an execution of the streaming query).
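The selection rule described above can be sketched as a pattern match on the sink and the trigger. This is a minimal, Spark-free model: the `Sink`, `Trigger` and `Engine` types and the `chooseEngine` function are hypothetical stand-ins introduced only for illustration, not Spark's own classes.

```scala
object ExecutionDispatchSketch {
  // Hypothetical stand-ins for the sink kinds (not Spark classes).
  sealed trait Sink
  case object StreamWriteSupportSink extends Sink
  case object OtherSink extends Sink

  // Hypothetical stand-ins for the trigger kinds (not Spark classes).
  sealed trait Trigger
  final case class ContinuousTrigger(intervalMs: Long) extends Trigger
  final case class ProcessingTimeTrigger(intervalMs: Long) extends Trigger

  // Which execution engine would be created for the query.
  sealed trait Engine
  case object ContinuousEngine extends Engine
  case object OtherEngine extends Engine

  // Continuous execution is chosen only for a StreamWriteSupport sink
  // combined with a ContinuousTrigger; anything else goes elsewhere.
  def chooseEngine(sink: Sink, trigger: Trigger): Engine = (sink, trigger) match {
    case (StreamWriteSupportSink, _: ContinuousTrigger) => ContinuousEngine
    case _                                              => OtherEngine
  }
}
```

Both conditions have to hold at the same time: a ContinuousTrigger with an unsupported sink, or a supported sink with another trigger, does not lead to ContinuousExecution in this model.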

ContinuousExecution can only run streaming queries whose StreamingRelationV2 operators use a ContinuousReadSupport data source.

When created for a streaming query, ContinuousExecution is given the analyzed logical plan. The analyzed logical plan is immediately transformed to include a ContinuousExecutionRelation for every StreamingRelationV2 with a ContinuousReadSupport data source (the transformed plan is what ContinuousExecution uses internally as its logicalPlan).

Note
ContinuousExecution uses the same instance of ContinuousExecutionRelation for the same instances of StreamingRelationV2 with ContinuousReadSupport data source.

ContinuousExecution allows for exactly one ContinuousReader in the streaming query (and asserts this when requested to addOffset and commit).
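The single-reader constraint can be pictured as an assertion over the registry of readers. The sketch below is a standalone model: `ContinuousReader` here is a hypothetical one-field case class, `soleReader` is a made-up helper name, and the assertion message is illustrative rather than quoted from Spark.

```scala
object SingleReaderSketch {
  // Hypothetical stand-in for a continuous reader (not the Spark interface).
  final case class ContinuousReader(name: String)

  // Returns the only reader, failing the assertion when there is not exactly one.
  def soleReader(continuousSources: Seq[ContinuousReader]): ContinuousReader = {
    assert(continuousSources.length == 1,
      s"expected exactly one continuous source, got ${continuousSources.length}") // illustrative message
    continuousSources.head
  }
}
```

In this model a query with zero or several readers fails at the point where the reader is needed (the equivalent of addOffset or commit), not when the query is defined.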

When requested to run the streaming query, ContinuousExecution collects the ContinuousReadSupport data sources (inside ContinuousExecutionRelation operators) from the analyzed logical plan and requests each of them to create a ContinuousReader. The readers are stored in the continuousSources internal registry.

Table 1. ContinuousExecution’s Internal Properties (e.g. Registries, Counters and Flags)

Name | Description

continuousSources | Used when ContinuousExecution is requested to commit, getStartOffsets, and runContinuous. Use sources to access the current value.

currentEpochCoordinatorId | FIXME. Used when…FIXME

triggerExecutor | ProcessingTimeExecutor for ContinuousTrigger. Used when…FIXME

Note
StreamExecution throws an IllegalStateException when the Trigger is not a ContinuousTrigger.
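The trigger check described above can be sketched as follows. This is a simplified, Spark-free model: the `Trigger` hierarchy and `ProcessingTimeExecutor` are hypothetical case classes defined here, `triggerExecutorFor` is a made-up name, and the exception message is illustrative only.

```scala
object TriggerExecutorSketch {
  // Hypothetical stand-ins for trigger types (not Spark classes).
  sealed trait Trigger
  final case class ContinuousTrigger(intervalMs: Long) extends Trigger
  case object OneTimeTrigger extends Trigger

  // Hypothetical stand-in for the executor that fires at a fixed interval.
  final case class ProcessingTimeExecutor(intervalMs: Long)

  // A ContinuousTrigger maps to a ProcessingTimeExecutor with the same interval;
  // every other trigger is rejected with an IllegalStateException.
  def triggerExecutorFor(trigger: Trigger): ProcessingTimeExecutor = trigger match {
    case ContinuousTrigger(intervalMs) => ProcessingTimeExecutor(intervalMs)
    case other =>
      throw new IllegalStateException(s"Unsupported trigger: $other") // illustrative message
  }
}
```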

getStartOffsets Internal Method

getStartOffsets…​FIXME

Note
getStartOffsets is used when…​FIXME

Committing Epoch — commit Method

commit…​FIXME

Note
commit is used exclusively when EpochCoordinator is requested to commitEpoch.

awaitEpoch Internal Method

awaitEpoch…​FIXME

Note
awaitEpoch is used when…​FIXME

addOffset Method

addOffset…​FIXME

Note
addOffset is used when…​FIXME

sources Method

Note
sources is part of ProgressReporter Contract to…​FIXME.

sources…​FIXME

Analyzed Logical Plan of Streaming Query — logicalPlan Property

Note
logicalPlan is part of StreamExecution Contract that is the analyzed logical plan of the streaming query.

logicalPlan resolves StreamingRelationV2 leaf logical operators (with a ContinuousReadSupport source) to ContinuousExecutionRelation leaf logical operators.

Internally, logicalPlan transforms the analyzed logical plan as follows:

  1. For every StreamingRelationV2 leaf logical operator with a ContinuousReadSupport source, logicalPlan looks up the corresponding ContinuousExecutionRelation in the internal lookup registry or, if there is none yet, creates a ContinuousExecutionRelation (with the ContinuousReadSupport source, the options and the output attributes of the StreamingRelationV2 operator)

  2. For any other StreamingRelationV2, logicalPlan throws an UnsupportedOperationException.
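The two transformation rules can be sketched with a tiny plan model and a lookup registry. Everything in this block is an assumption made for illustration: the `Plan` types are simplified stand-ins that carry only a source name and a `continuous` flag (in place of a ContinuousReadSupport source, options and output attributes), `Union` exists only so one plan can reference the same relation twice, and the exception message is not taken from Spark.

```scala
import scala.collection.mutable

object LogicalPlanSketch {
  // Hypothetical, heavily simplified logical operators (not Spark classes).
  sealed trait Plan
  final case class StreamingRelationV2(sourceName: String, continuous: Boolean) extends Plan
  final case class ContinuousExecutionRelation(sourceName: String) extends Plan
  final case class Union(left: Plan, right: Plan) extends Plan

  def resolve(plan: Plan): Plan = {
    // Lookup registry: one ContinuousExecutionRelation per StreamingRelationV2.
    val registry = mutable.Map.empty[StreamingRelationV2, ContinuousExecutionRelation]

    def go(p: Plan): Plan = p match {
      // Rule 1: reuse the registered relation or create (and register) a new one.
      case r: StreamingRelationV2 if r.continuous =>
        registry.getOrElseUpdate(r, ContinuousExecutionRelation(r.sourceName))
      // Rule 2: any other StreamingRelationV2 is rejected.
      case r: StreamingRelationV2 =>
        throw new UnsupportedOperationException(
          s"source ${r.sourceName} cannot be read continuously") // illustrative message
      case Union(left, right) => Union(go(left), go(right))
      case other              => other
    }

    go(plan)
  }
}
```

Because of the registry, a relation that appears twice in the plan resolves to the very same ContinuousExecutionRelation object, which mirrors the "same instance" behaviour noted earlier.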

Running Activated Streaming Query — runActivatedStream Method

Note
runActivatedStream is part of StreamExecution Contract to run a streaming query.

runActivatedStream…​FIXME

Running Streaming Query in Continuous Mode — runContinuous Internal Method

runContinuous…​FIXME

Note
runContinuous is used exclusively when ContinuousExecution is requested to run an activated streaming query.

Creating ContinuousExecution Instance

ContinuousExecution takes the following when created:

  • SparkSession

  • The name of the structured query

  • Path to the checkpoint directory (aka metadata directory)

  • Analyzed logical query plan (LogicalPlan)

  • StreamWriteSupport

  • Trigger

  • Clock

  • Output mode

  • Options (Map[String, String])

  • deleteCheckpointOnStop flag to control whether to delete the checkpoint directory on stop

ContinuousExecution initializes the internal registries and counters.

Stopping Streaming Query — stop Method

Note
stop is part of the StreamingQuery Contract to stop the streaming query.

stop transitions the streaming query to TERMINATED state.

If the queryExecutionThread is alive (i.e. it has been started and has not yet died), stop interrupts it and waits for this thread to die.

In the end, stop prints out the following INFO message to the logs:

Note
prettyIdString is in the format of queryName [id = [id], runId = [runId]].
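The stop sequence described in this section (mark the query TERMINATED, then interrupt the execution thread and wait for it to die) can be sketched with plain JVM threads. The `Query` class, its `State` type and the accessor names are hypothetical; only `Thread.isAlive`, `Thread.interrupt` and `Thread.join` are real JDK calls.

```scala
object StopSketch {
  // Hypothetical query states (the real state set may be larger).
  sealed trait State
  case object Active extends State
  case object Terminated extends State

  // A toy query that runs `body` on its own execution thread.
  final class Query(body: () => Unit) {
    @volatile private var state: State = Active

    private val queryExecutionThread =
      new Thread(new Runnable { def run(): Unit = body() }, "query-execution-sketch")
    queryExecutionThread.setDaemon(true)

    def start(): Unit = queryExecutionThread.start()
    def currentState: State = state
    def isThreadAlive: Boolean = queryExecutionThread.isAlive

    def stop(): Unit = {
      // Transition to TERMINATED first, then shut the thread down.
      state = Terminated
      if (queryExecutionThread.isAlive) {
        queryExecutionThread.interrupt() // wake the thread out of blocking calls
        queryExecutionThread.join()      // wait for the thread to die
      }
    }
  }
}
```

Calling stop on a query that was never started, or whose thread has already finished, only changes the state, since the isAlive check skips the interrupt and the join.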