UserDefinedAggregateFunction — Contract for User-Defined Untyped Aggregate Functions (UDAFs)

UserDefinedAggregateFunction is the contract to define user-defined aggregate functions (UDAFs).

A UserDefinedAggregateFunction is applied to input columns using the apply or distinct methods. Each method creates a Column that can be used in aggregations.

The lifecycle of a UserDefinedAggregateFunction is managed entirely by the ScalaUDAF expression container.

Figure 1. UserDefinedAggregateFunction and ScalaUDAF Expression Container
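The following plain-Scala model has no Spark dependency. It shows the order in which the contract methods are typically driven for one group in a two-phase (partial, then final) aggregation:

- Each partition initializes its own buffer and updates it row by row.
- A fresh buffer then merges all the partial buffers.
- evaluate runs once on the merged buffer.

UdafLifecycleModel and AvgBuffer are hypothetical stand-ins for UserDefinedAggregateFunction and MutableAggregationBuffer. Treat this as a sketch of the call sequence, not of ScalaUDAF itself.

```scala
// Hypothetical plain-Scala model of the contract's call order (no Spark involved).
object UdafLifecycleModel {
  // Stand-in for a two-field aggregation buffer holding (sum, count)
  final class AvgBuffer(var sum: Long, var count: Long)

  // initialize: create a buffer with its starting values
  def initialize(): AvgBuffer = new AvgBuffer(0L, 0L)

  // update: fold one input value into a buffer
  def update(buffer: AvgBuffer, input: Long): Unit = {
    buffer.sum += input
    buffer.count += 1
  }

  // merge: combine a partial buffer into another buffer
  def merge(buffer1: AvgBuffer, buffer2: AvgBuffer): Unit = {
    buffer1.sum += buffer2.sum
    buffer1.count += buffer2.count
  }

  // evaluate: compute the final result from a buffer
  def evaluate(buffer: AvgBuffer): Option[Double] =
    if (buffer.count == 0) None else Some(buffer.sum.toDouble / buffer.count)

  // Partial phase: one buffer per partition, updated row by row.
  // Final phase: a fresh buffer merges every partial buffer, then evaluate runs once.
  def aggregate(partitions: Seq[Seq[Long]]): Option[Double] = {
    val partials = partitions.map { rows =>
      val buf = initialize()
      rows.foreach(update(buf, _))
      buf
    }
    val result = initialize()
    partials.foreach(merge(result, _))
    evaluate(result)
  }
}
```

Suppose one partition holds 1 and 2 and another holds 3 and 4. Then aggregate returns Some(2.5). The merge step is what lets the partial (sum, count) pairs computed on separate partitions be combined.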
Note
Use UDFRegistration (available as spark.udf) to register a UserDefinedAggregateFunction as a temporary function, then call it by name in SQL queries.
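Here is a minimal sketch of SQL registration. It assumes spark-sql 2.x or 3.x on the classpath. UserDefinedAggregateFunction is deprecated since Spark 3.0, so expect deprecation warnings there. LongSum, RegisterUdafExample, the long_sum function name and the nums view are hypothetical names chosen for illustration.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

// Hypothetical UDAF for this example: sums a LongType column, ignoring nulls
object LongSum extends UserDefinedAggregateFunction {
  override def inputSchema: StructType = StructType(StructField("value", LongType) :: Nil)
  override def bufferSchema: StructType = StructType(StructField("sum", LongType) :: Nil)
  override def dataType: DataType = LongType
  override def deterministic: Boolean = true
  override def initialize(buffer: MutableAggregationBuffer): Unit = { buffer(0) = 0L }
  override def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    if (!input.isNullAt(0)) buffer(0) = buffer.getLong(0) + input.getLong(0)
  }
  override def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    buffer1(0) = buffer1.getLong(0) + buffer2.getLong(0)
  }
  override def evaluate(buffer: Row): Any = buffer.getLong(0)
}

object RegisterUdafExample {
  // Registers LongSum under a SQL name and uses it in a query
  def run(spark: SparkSession): Long = {
    import spark.implicits._
    spark.udf.register("long_sum", LongSum) // spark.udf is the session's UDFRegistration
    Seq(1L, 2L, 3L).toDF("n").createOrReplaceTempView("nums")
    spark.sql("SELECT long_sum(n) AS total FROM nums").head().getLong(0)
  }
}
```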

UserDefinedAggregateFunction Contract

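Table 1. (Subset of) UserDefinedAggregateFunction Contract

Method          Description
bufferSchema    StructType of the values kept in the aggregation buffer
dataType        DataType of the result returned by evaluate
deterministic   Whether the function always returns the same result for the same input
evaluate        Computes the final result from the aggregation buffer (a Row)
initialize      Sets the initial values of the aggregation buffer (a MutableAggregationBuffer)
inputSchema     StructType of the function's input arguments
merge           Merges a second buffer (a Row) into the first (a MutableAggregationBuffer)
update          Updates the aggregation buffer with a new input Row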
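The contract can be sketched with an average over a LongType column that keeps a running sum and count in the buffer. This assumes spark-sql 2.x or 3.x on the classpath; on 3.x the class is deprecated, so expect a compiler warning. LongAverage is a hypothetical name, and returning null for an empty group is a design choice of this sketch.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

// Hypothetical UDAF: average of a LongType column, ignoring nulls
object LongAverage extends UserDefinedAggregateFunction {
  // inputSchema: a single LongType argument
  override def inputSchema: StructType = StructType(StructField("value", LongType) :: Nil)

  // bufferSchema: running sum and number of non-null values seen
  override def bufferSchema: StructType =
    StructType(StructField("sum", LongType) :: StructField("count", LongType) :: Nil)

  // dataType: evaluate returns a Double
  override def dataType: DataType = DoubleType

  // deterministic: same input always gives the same output
  override def deterministic: Boolean = true

  // initialize: start with sum = 0 and count = 0
  override def initialize(buffer: MutableAggregationBuffer): Unit = {
    buffer(0) = 0L
    buffer(1) = 0L
  }

  // update: add one non-null input value to the buffer
  override def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    if (!input.isNullAt(0)) {
      buffer(0) = buffer.getLong(0) + input.getLong(0)
      buffer(1) = buffer.getLong(1) + 1L
    }
  }

  // merge: add the partial sum and count of buffer2 into buffer1
  override def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    buffer1(0) = buffer1.getLong(0) + buffer2.getLong(0)
    buffer1(1) = buffer1.getLong(1) + buffer2.getLong(1)
  }

  // evaluate: sum / count, or null when no non-null values were seen
  override def evaluate(buffer: Row): Any =
    if (buffer.getLong(1) == 0L) null
    else buffer.getLong(0).toDouble / buffer.getLong(1)
}
```

Given a DataFrame with a LongType column named n (hypothetical), LongAverage(col("n")) can be passed to agg like any built-in aggregate function.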

Creating Column for UDAF — apply Method

apply(exprs: Column*) creates a Column with a ScalaUDAF expression (wrapped in an AggregateExpression) over the given input columns.

Note
The AggregateExpression uses Complete mode and has the isDistinct flag disabled.

Creating Column for UDAF with Distinct Values — distinct Method

distinct(exprs: Column*) creates a Column with a ScalaUDAF expression (wrapped in an AggregateExpression) that aggregates only the distinct values of the input columns.

Note
The AggregateExpression uses Complete mode and has the isDistinct flag enabled.
Note
distinct is like apply but with the isDistinct flag enabled.
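The difference between apply and distinct shows up both in the expression each one builds and in the result. This sketch makes several assumptions:

- It targets Spark 3.x.
- It reads Column.expr and the catalyst AggregateExpression class, which are internal APIs that may change between versions.
- It uses NonNullCount, a hypothetical UDAF that counts non-null values.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.catalyst.expressions.aggregate.{AggregateExpression, Complete}
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._

// Hypothetical UDAF for this example: counts non-null LongType values
object NonNullCount extends UserDefinedAggregateFunction {
  override def inputSchema: StructType = StructType(StructField("value", LongType) :: Nil)
  override def bufferSchema: StructType = StructType(StructField("count", LongType) :: Nil)
  override def dataType: DataType = LongType
  override def deterministic: Boolean = true
  override def initialize(buffer: MutableAggregationBuffer): Unit = { buffer(0) = 0L }
  override def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    if (!input.isNullAt(0)) buffer(0) = buffer.getLong(0) + 1L
  }
  override def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    buffer1(0) = buffer1.getLong(0) + buffer2.getLong(0)
  }
  override def evaluate(buffer: Row): Any = buffer.getLong(0)
}

object ApplyVsDistinct {
  // The AggregateExpressions built by apply and by distinct (internal API)
  def expressions(): (AggregateExpression, AggregateExpression) = {
    val all = NonNullCount(col("v")).expr.asInstanceOf[AggregateExpression]
    val distinct = NonNullCount.distinct(col("v")).expr.asInstanceOf[AggregateExpression]
    (all, distinct)
  }

  // Runs both columns over the values 1, 1, 2
  def counts(spark: SparkSession): (Long, Long) = {
    import spark.implicits._
    val df = Seq(1L, 1L, 2L).toDF("v")
    val row = df.agg(NonNullCount(col("v")), NonNullCount.distinct(col("v"))).head()
    (row.getLong(0), row.getLong(1))
  }
}
```

On the values 1, 1, 2, the apply column counts 3 values, while the distinct column counts 2.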
