AggregateProcessor

AggregateProcessor is created and used exclusively when the WindowExec physical operator is executed.

Table 1. AggregateProcessor’s Properties

Name      Description
buffer    SpecificInternalRow with the data types given by bufferSchema

Note
AggregateProcessor is created using the AggregateProcessor factory object (through its apply method).

initialize Method

Caution
FIXME
Note

initialize is used when:

  • SlidingWindowFunctionFrame writes out to the target row

  • UnboundedWindowFunctionFrame is prepared

  • UnboundedPrecedingWindowFunctionFrame is prepared

  • UnboundedFollowingWindowFunctionFrame writes out to the target row

evaluate Method

Caution
FIXME
Note
evaluate is used when…​FIXME

apply Factory Method

Note
apply is used exclusively when WindowExec is executed (and creates a WindowFunctionFrame for each AGGREGATE window aggregate function, i.e. AggregateExpression or AggregateWindowFunction).

Executing update on ImperativeAggregates — update Method

update executes the update method on every input ImperativeAggregate sequentially (one by one).

Internally, update joins the buffer with the input internal binary row and converts the joined InternalRow using the MutableProjection function.

update then requests every ImperativeAggregate to update, passing in the buffer and the input row.
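The two steps of update can be illustrated with a minimal sketch in plain Scala. It does not use Spark's classes: Row, ImperativeAgg, CountAgg, MaxAgg and Processor are hypothetical stand-ins, and the sketch assumes that the projection writes its result into the buffer.

```scala
// A simplified model of the update flow; none of these types are Spark's real classes.
object UpdateSketch {
  type Row = Array[Long]

  // Stand-in for an ImperativeAggregate: updates its own buffer slot in place.
  trait ImperativeAgg {
    def update(buffer: Row, input: Row): Unit
  }

  // Counts input rows in the given buffer slot.
  final class CountAgg(slot: Int) extends ImperativeAgg {
    def update(buffer: Row, input: Row): Unit = buffer(slot) += 1
  }

  // Tracks the largest value of one input column in the given buffer slot.
  final class MaxAgg(slot: Int, column: Int) extends ImperativeAgg {
    def update(buffer: Row, input: Row): Unit =
      buffer(slot) = math.max(buffer(slot), input(column))
  }

  final class Processor(
      bufferSize: Int,
      // Stand-in for the update projection: (joined row, target buffer) => Unit.
      updateProjection: (Row, Row) => Unit,
      imperatives: Seq[ImperativeAgg]) {

    val buffer: Row = new Array[Long](bufferSize)

    def update(input: Row): Unit = {
      // Step 1: join the buffer with the input row and run the projection on the joined row.
      updateProjection(buffer ++ input, buffer)
      // Step 2: request every imperative aggregate to update, one by one.
      imperatives.foreach(_.update(buffer, input))
    }
  }

  def run(): Row = {
    // Slot 0: running sum (via the projection), slot 1: count, slot 2: max.
    val processor = new Processor(
      bufferSize = 3,
      // In the joined row, index 0 is buffer slot 0 and index 3 is input column 0.
      updateProjection = (joined, target) => { target(0) = joined(0) + joined(3) },
      imperatives = Seq(new CountAgg(1), new MaxAgg(2, 0)))
    Seq(3L, 5L, 4L).foreach(v => processor.update(Array(v)))
    processor.buffer
  }

  def main(args: Array[String]): Unit = println(run().mkString(", "))
}
```

For the three input rows 3, 5 and 4, the buffer ends up holding the sum 12, the count 3 and the maximum 5.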

Note
MutableProjection mutates the same underlying binary row object each time it is executed.
Note
update is used when a WindowFunctionFrame is prepared or writes out to the target row.
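The note above about MutableProjection reusing one row object has a practical consequence: a caller that keeps the returned reference sees later results overwrite earlier ones unless it copies. The following plain-Scala sketch shows the effect; ReusingProjection is a hypothetical stand-in, not Spark's MutableProjection.

```scala
object ReusedRowSketch {
  // Stand-in for a mutable projection: always writes into, and returns, the same array.
  final class ReusingProjection(f: Long => Long) {
    private val target = new Array[Long](1)

    def apply(input: Long): Array[Long] = {
      target(0) = f(input)
      target // the same object on every call
    }
  }

  def run(): (List[Long], List[Long]) = {
    val project = new ReusingProjection(_ * 10)
    val inputs = List(1L, 2L, 3L)

    // Keeping the returned references: every element aliases the one shared array,
    // so all of them show the value written by the last call.
    val aliased = inputs.map(i => project(i)).map(row => row(0))

    // Copying each result right after the call preserves the individual values.
    val copied = inputs.map(i => project(i).clone()).map(row => row(0))

    (aliased, copied)
  }

  def main(args: Array[String]): Unit = {
    val (aliased, copied) = run()
    println(s"aliased = $aliased, copied = $copied")
  }
}
```

Here aliased is List(30, 30, 30), while copied is List(10, 20, 30).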

Creating AggregateProcessor Instance

AggregateProcessor takes the following when created:

  • Schema of the buffer (as a collection of AttributeReferences)

  • Initial MutableProjection

  • Update MutableProjection

  • Evaluate MutableProjection

  • ImperativeAggregate expressions for aggregate functions

  • Flag whether to track partition size
