ScalaUDAF — Catalyst Expression Adapter for UserDefinedAggregateFunction
ScalaUDAF is a Catalyst expression adapter that manages the lifecycle of a UserDefinedAggregateFunction and hooks it into Spark SQL’s Catalyst execution path.
ScalaUDAF is created when:
- UserDefinedAggregateFunction creates a Column for a user-defined aggregate function using all and distinct values (to use the UDAF in Dataset operators)
- UDFRegistration is requested to register a user-defined aggregate function (to use the UDAF in SQL mode)
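Both creation paths can be exercised from user code. The following is a minimal sketch, assuming Spark SQL is on the classpath and a local SparkSession is acceptable; `LongSum`, `ScalaUDAFDemo`, the `long_sum` function name, and the `numbers` view are hypothetical names chosen for illustration. Note that UserDefinedAggregateFunction is deprecated as of Spark 3.0 in favour of Aggregator, so this applies to code that still uses the older API.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types.{DataType, LongType, StructType}

// Hypothetical UDAF (illustration only): sums non-null Long values.
class LongSum extends UserDefinedAggregateFunction {
  override def inputSchema: StructType = new StructType().add("value", LongType)
  override def bufferSchema: StructType = new StructType().add("sum", LongType)
  override def dataType: DataType = LongType
  override def deterministic: Boolean = true

  // Start every aggregation buffer at zero.
  override def initialize(buffer: MutableAggregationBuffer): Unit = {
    buffer(0) = 0L
  }

  // Fold one input row into the buffer, skipping nulls.
  override def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    if (!input.isNullAt(0)) buffer(0) = buffer.getLong(0) + input.getLong(0)
  }

  // Combine two partial buffers into the first one.
  override def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    buffer1(0) = buffer1.getLong(0) + buffer2.getLong(0)
  }

  override def evaluate(buffer: Row): Any = buffer.getLong(0)
}

object ScalaUDAFDemo {
  // Returns (sum over all values, sum over distinct values, sum computed via SQL).
  def sums(spark: SparkSession): (Long, Long, Long) = {
    import spark.implicits._
    val longSum = new LongSum
    val df = Seq(1L, 2L, 2L, 3L).toDF("value")

    // Dataset operators: apply(...) for all values, distinct(...) for distinct values.
    val row = df.agg(longSum($"value"), longSum.distinct($"value")).head()

    // SQL mode: register the UDAF under a name through UDFRegistration.
    spark.udf.register("long_sum", longSum)
    df.createOrReplaceTempView("numbers")
    val viaSql = spark.sql("SELECT long_sum(value) FROM numbers").head().getLong(0)

    (row.getLong(0), row.getLong(1), viaSql)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("ScalaUDAFDemo").getOrCreate()
    try println(sums(spark)) finally spark.stop()
  }
}
```

In this sketch the user never constructs ScalaUDAF directly; the Column and the registered function are the public entry points described in the list above.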
ScalaUDAF is an ImperativeAggregate.
| Method Name | Behaviour |
|---|---|
| initialize | Requests UserDefinedAggregateFunction to initialize |
| merge | Requests UserDefinedAggregateFunction to merge |
| update | Requests UserDefinedAggregateFunction to update |
When evaluated, ScalaUDAF…FIXME
ScalaUDAF has no representation in SQL.
| Name | Description |
|---|---|
| | Copy of aggBufferAttributes |
| | Always enabled (i.e. …) |
| Name | Description |
|---|---|
| | Used when…FIXME |
| | Used when…FIXME |
| | Used when…FIXME |
| | Used when…FIXME |
Creating ScalaUDAF Instance
ScalaUDAF takes the following when created:
- Children Catalyst expressions
ScalaUDAF initializes the internal registries and counters.
initialize Method

    initialize(buffer: InternalRow): Unit

initialize sets the input buffer internal binary row as underlyingBuffer of MutableAggregationBufferImpl and requests the UserDefinedAggregateFunction to initialize (with the MutableAggregationBufferImpl).
Note: initialize is part of the ImperativeAggregate Contract.
update Method

    update(mutableAggBuffer: InternalRow, inputRow: InternalRow): Unit

update sets the input buffer internal binary row as underlyingBuffer of MutableAggregationBufferImpl and requests the UserDefinedAggregateFunction to update.
Note: update uses inputProjection on the input row and converts the result using inputToScalaConverters.

Note: update is part of the ImperativeAggregate Contract.
merge Method

    merge(buffer1: InternalRow, buffer2: InternalRow): Unit

merge first sets:
- underlyingBuffer of MutableAggregationBufferImpl to the input buffer1
- underlyingInputBuffer of InputAggregationBuffer to the input buffer2
merge then requests the UserDefinedAggregateFunction to merge (passing in the MutableAggregationBufferImpl and InputAggregationBuffer).
Note: merge is part of the ImperativeAggregate Contract.
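The initialize, update and merge methods share one pattern: point a reusable wrapper at the raw buffer handed in by the engine, then delegate to the user-defined function. The following is a simplified sketch of that pattern in plain Scala, not Spark's actual implementation. `BufferView`, `SimpleUDAF`, `SumUDAF`, `UDAFAdapter` and `AdapterDemo` are hypothetical stand-ins, and an `Array[Long]` plays the role of the InternalRow buffer.

```scala
// Hypothetical stand-in for MutableAggregationBufferImpl / InputAggregationBuffer:
// a reusable view that can be re-pointed at any raw buffer.
final class BufferView {
  var underlying: Array[Long] = Array.empty[Long]
}

// Hypothetical, reduced version of the user-facing aggregate function contract.
trait SimpleUDAF {
  def initialize(buffer: BufferView): Unit
  def update(buffer: BufferView, input: Long): Unit
  def merge(buffer1: BufferView, buffer2: BufferView): Unit
}

// Illustration only: a sum kept in slot 0 of the buffer.
object SumUDAF extends SimpleUDAF {
  def initialize(buffer: BufferView): Unit = { buffer.underlying(0) = 0L }
  def update(buffer: BufferView, input: Long): Unit = { buffer.underlying(0) += input }
  def merge(buffer1: BufferView, buffer2: BufferView): Unit = {
    buffer1.underlying(0) += buffer2.underlying(0)
  }
}

// Adapter in the spirit of ScalaUDAF: set the wrappers, then delegate.
final class UDAFAdapter(udaf: SimpleUDAF) {
  private val mutableBuffer = new BufferView // written to by the UDAF
  private val inputBuffer = new BufferView   // only read during merge

  def initialize(buffer: Array[Long]): Unit = {
    mutableBuffer.underlying = buffer
    udaf.initialize(mutableBuffer)
  }

  def update(buffer: Array[Long], input: Long): Unit = {
    mutableBuffer.underlying = buffer
    udaf.update(mutableBuffer, input)
  }

  def merge(buffer1: Array[Long], buffer2: Array[Long]): Unit = {
    mutableBuffer.underlying = buffer1
    inputBuffer.underlying = buffer2
    udaf.merge(mutableBuffer, inputBuffer)
  }
}

object AdapterDemo {
  // Aggregates one "partition" of rows into a fresh single-slot buffer.
  def partialSum(adapter: UDAFAdapter, rows: Seq[Long]): Array[Long] = {
    val buffer = new Array[Long](1)
    adapter.initialize(buffer)
    rows.foreach(row => adapter.update(buffer, row))
    buffer
  }

  def main(args: Array[String]): Unit = {
    val adapter = new UDAFAdapter(SumUDAF)
    val left = partialSum(adapter, Seq(1L, 2L))
    val right = partialSum(adapter, Seq(3L, 4L))
    adapter.merge(left, right) // left now holds the combined result
    println(left(0))           // prints 10
  }
}
```

Reusing the two wrapper objects avoids allocating a new view per row, which is the likely motivation for setting the underlying buffer on each call rather than wrapping it afresh.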