AggregateFunction Contract — Aggregate Function Expressions
AggregateFunction is the contract for Catalyst expressions that represent aggregate functions.
AggregateFunction is used wrapped inside a AggregateExpression (using toAggregateExpression method) when:
-
Analyzerresolves functions (for SQL mode) -
…FIXME: Anywhere else?
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
import org.apache.spark.sql.functions.collect_list scala> val fn = collect_list("gid") fn: org.apache.spark.sql.Column = collect_list(gid) import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression scala> val aggFn = fn.expr.asInstanceOf[AggregateExpression].aggregateFunction aggFn: org.apache.spark.sql.catalyst.expressions.aggregate.AggregateFunction = collect_list('gid, 0, 0) scala> println(aggFn.numberedTreeString) 00 collect_list('gid, 0, 0) 01 +- 'gid |
|
Note
|
Aggregate functions are not foldable, i.e. FIXME |
| Name | Behaviour | Examples |
|---|---|---|
AggregateFunction Contract
|
1 2 3 4 5 6 7 8 9 10 |
abstract class AggregateFunction extends Expression { def aggBufferSchema: StructType def aggBufferAttributes: Seq[AttributeReference] def inputAggBufferAttributes: Seq[AttributeReference] def defaultResult: Option[Literal] = None } |
| Method | Description |
|---|---|
|
Schema of an aggregation buffer to hold partial aggregate results. Used mostly in ScalaUDAF and AggregationIterator |
|
|
AttributeReferences of an aggregation buffer to hold partial aggregate results. Used in:
|
|
|
Defaults to |
Creating AggregateExpression for AggregateFunction — toAggregateExpression Method
|
1 2 3 4 5 6 |
toAggregateExpression(): AggregateExpression (1) toAggregateExpression(isDistinct: Boolean): AggregateExpression |
-
Calls the other
toAggregateExpressionwithisDistinctdisabled (i.e.false)
toAggregateExpression creates a AggregateExpression for the current AggregateFunction with Complete aggregate mode.
|
Note
|
|
spark技术分享