
Aggregator — Contract for User-Defined Typed Aggregate Functions (UDAFs)

Aggregator is the contract for user-defined typed aggregate functions (aka user-defined typed aggregations, or UDAFs for short).

After you create a custom Aggregator, use the toColumn method to convert it to a TypedColumn that can be used with the Dataset.select and KeyValueGroupedDataset.agg typed operators.
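The following is a minimal sketch of a custom Aggregator, adapted from the type-safe user-defined aggregation example in the official Spark SQL guide; the Employee and Average case classes (and the dept field) are illustrative only.

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator

// Illustrative input and buffer types (not part of Spark)
case class Employee(name: String, dept: String, salary: Long)
case class Average(var sum: Long, var count: Long)

object MyAverage extends Aggregator[Employee, Average, Double] {
  // Initial ("zero") value of the aggregation buffer
  def zero: Average = Average(0L, 0L)
  // Fold a single input record into the buffer
  def reduce(buffer: Average, employee: Employee): Average = {
    buffer.sum += employee.salary
    buffer.count += 1
    buffer
  }
  // Combine two partial buffers (e.g. from different partitions)
  def merge(b1: Average, b2: Average): Average = {
    b1.sum += b2.sum
    b1.count += b2.count
    b1
  }
  // Turn the final buffer into the output value
  def finish(reduction: Average): Double = reduction.sum.toDouble / reduction.count
  // Encoders for the intermediate buffer and the output types
  def bufferEncoder: Encoder[Average] = Encoders.product
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}

val spark = SparkSession.builder.master("local[*]").appName("aggregator-demo").getOrCreate()
import spark.implicits._

val ds = Seq(
  Employee("Michael", "Sales", 3000L),
  Employee("Andy", "Sales", 4500L),
  Employee("Justin", "Engineering", 3500L)).toDS

// toColumn converts the Aggregator to a TypedColumn usable with Dataset.select
val averageSalary = MyAverage.toColumn.name("average_salary")
ds.select(averageSalary).show()
```

Here ds.select(averageSalary) yields a single-row typed Dataset with the average salary across all employees.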

Note

Use the org.apache.spark.sql.expressions.scalalang.typed object to access the built-in type-safe aggregate functions, i.e. avg, count, sum and sumLong.
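As a quick sketch, these typed functions plug directly into KeyValueGroupedDataset.agg. The Token case class is made up for the example; in spark-shell, spark and its implicits are already in scope.

```scala
import org.apache.spark.sql.expressions.scalalang.typed

// Illustrative record type
case class Token(productId: Int, score: Double, clicks: Long)

val tokens = Seq(
  Token(100, 0.12, 3L),
  Token(200, 0.29, 1L),
  Token(200, 0.53, 7L)).toDS

tokens
  .groupByKey(_.productId)
  .agg(
    typed.avg[Token](_.score),        // average score per product
    typed.count[Token](_.productId),  // number of tokens per product
    typed.sum[Token](_.score),        // total score per product (Double)
    typed.sumLong[Token](_.clicks))   // total clicks per product (Long)
  .show()
```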

Note

Aggregator is an Experimental and Evolving contract: it is moving towards a stable API, but it is not stable yet and can change from one feature release to another.

In other words, using the contract is like treading on thin ice.

Aggregator is used when:

Table 1. Aggregator Contract

  Method         Description
  -------------  ----------------
  bufferEncoder  Used when…FIXME
  finish         Used when…FIXME
  merge          Used when…FIXME
  outputEncoder  Used when…FIXME
  reduce         Used when…FIXME
  zero           Used when…FIXME
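The contract boils down to roughly the following abstract class (method signatures paraphrased from org.apache.spark.sql.expressions.Aggregator, which also defines the toColumn method covered below):

```scala
import org.apache.spark.sql.Encoder

// Paraphrased shape of org.apache.spark.sql.expressions.Aggregator
abstract class Aggregator[-IN, BUF, OUT] extends Serializable {
  def zero: BUF                     // initial ("zero") value of the aggregation buffer
  def reduce(b: BUF, a: IN): BUF    // fold one input record into the buffer
  def merge(b1: BUF, b2: BUF): BUF  // combine two partial buffers
  def finish(reduction: BUF): OUT   // turn the final buffer into the output value
  def bufferEncoder: Encoder[BUF]   // encoder for the buffer type
  def outputEncoder: Encoder[OUT]   // encoder for the output type
}
```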

Table 2. Aggregators

  Aggregator            Description
  --------------------  --------------------------------
  ParameterizedTypeSum
  ReduceAggregator
  TopByKeyAggregator    Used exclusively in Spark MLlib
  TypedAverage
  TypedCount
  TypedSumDouble
  TypedSumLong
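For example, ReduceAggregator backs KeyValueGroupedDataset.reduceGroups under the covers, so every reduceGroups call exercises this contract. A minimal spark-shell sketch:

```scala
// In spark-shell (spark and its implicits already in scope)
val words = Seq("ab", "cd", "e", "fgh", "ij").toDS

// Concatenate all words of the same length; reduceGroups is backed by ReduceAggregator
words.groupByKey(_.length).reduceGroups(_ + _).show()
```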

Converting Aggregator to TypedColumn — toColumn Method

toColumn…​FIXME

Note
toColumn is used when…​FIXME
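A hedged usage sketch: reusing the illustrative MyAverage aggregator and the ds Dataset from the example above, the TypedColumn returned by toColumn also plugs into KeyValueGroupedDataset.agg.

```scala
// Average salary per (illustrative) department
ds.groupByKey(_.dept)
  .agg(MyAverage.toColumn.name("average_salary"))
  .show()
```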