StatefulAggregationStrategy Execution Planning Strategy for EventTimeWatermark and Aggregate Logical Operators
StatefulAggregationStrategy
is an execution planning strategy (i.e. Strategy
) that IncrementalExecution uses to plan EventTimeWatermark
and Aggregate
logical operators in streaming structured queries (Datasets
).
Note
|
EventTimeWatermark logical operator is the result of withWatermark operator. |
Note
|
Aggregate logical operator represents groupBy and groupByKey aggregations (and SQL’s GROUP BY clause).
|
StatefulAggregationStrategy
is available using SessionState
.
Logical Operator | Physical Operator | ||
---|---|---|---|
In the order of preference:
|
Selecting Aggregate Physical Operator Given Aggregate Expressions — AggUtils.planStreamingAggregation
Internal Method
planStreamingAggregation
takes the grouping attributes (from groupingExpressions
).
Note
|
groupingExpressions corresponds to the grouping function in groupBy operator.
|
planStreamingAggregation
creates an aggregate physical operator (called partialAggregate
) with:
-
requiredChildDistributionExpressions
undefined (i.e.None
) -
initialInputBufferOffset
as0
-
functionsWithoutDistinct
inPartial
mode -
child
operator as the inputchild
Note
|
|
planStreamingAggregation
creates an aggregate physical operator (called partialMerged1
) with:
-
requiredChildDistributionExpressions
based on the inputgroupingExpressions
-
initialInputBufferOffset
as the length ofgroupingExpressions
-
functionsWithoutDistinct
inPartialMerge
mode -
child
operator as partialAggregate aggregate physical operator created above
planStreamingAggregation
creates StateStoreRestoreExec with the grouping attributes, undefined StatefulOperatorStateInfo
, and partialMerged1 aggregate physical operator created above.
planStreamingAggregation
creates an aggregate physical operator (called partialMerged2
) with:
-
child
operator as StateStoreRestoreExec physical operator created above
Note
|
The only difference between partialMerged1 and partialMerged2 steps is the child physical operator. |
planStreamingAggregation
creates StateStoreSaveExec with:
-
the grouping attributes based on the input
groupingExpressions
-
No
stateInfo
,outputMode
andeventTimeWatermark
-
child
operator as partialMerged2 aggregate physical operator created above
In the end, planStreamingAggregation
creates the final aggregate physical operator (called finalAndCompleteAggregate
) with:
-
requiredChildDistributionExpressions
based on the inputgroupingExpressions
-
initialInputBufferOffset
as the length ofgroupingExpressions
-
functionsWithoutDistinct
inFinal
mode -
child
operator as StateStoreSaveExec physical operator created above
Note
|
planStreamingAggregation is used exclusively when StatefulAggregationStrategy plans a streaming aggregation.
|