GroupingSets Unary Logical Operator
GroupingSets
is a unary logical operator that represents SQL’s GROUPING SETS variant of GROUP BY
clause.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
val q = sql(""" SELECT customer, year, SUM(sales) FROM VALUES ("abc", 2017, 30) AS t1 (customer, year, sales) GROUP BY customer, year GROUPING SETS ((customer), (year)) """) scala> println(q.queryExecution.logical.numberedTreeString) 00 'GroupingSets [ArrayBuffer('customer), ArrayBuffer('year)], ['customer, 'year], ['customer, 'year, unresolvedalias('SUM('sales), None)] 01 +- 'SubqueryAlias t1 02 +- 'UnresolvedInlineTable [customer, year, sales], [List(abc, 2017, 30)] |
GroupingSets
operator is resolved to an Aggregate
logical operator at analysis phase.
1 2 3 4 5 6 7 8 9 10 |
scala> println(q.queryExecution.analyzed.numberedTreeString) 00 Aggregate [customer#8, year#9, spark_grouping_id#5], [customer#8, year#9, sum(cast(sales#2 as bigint)) AS sum(sales)#4L] 01 +- Expand [List(customer#0, year#1, sales#2, customer#6, null, 1), List(customer#0, year#1, sales#2, null, year#7, 2)], [customer#0, year#1, sales#2, customer#8, year#9, spark_grouping_id#5] 02 +- Project [customer#0, year#1, sales#2, customer#0 AS customer#6, year#1 AS year#7] 03 +- SubqueryAlias t1 04 +- LocalRelation [customer#0, year#1, sales#2] |
Note
|
GroupingSets can only be created using SQL.
|
Note
|
GroupingSets is not supported on Structured Streaming’s streaming Datasets.
|
GroupingSets
is never resolved (as it can only be converted to an Aggregate
logical operator).
The output schema of a GroupingSets
are exactly the attributes of aggregate named expressions.
Analysis Phase
GroupingSets
operator is resolved at analysis phase in the following logical evaluation rules:
-
ResolveAliases for unresolved aliases in aggregate named expressions
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
val spark: SparkSession = ... // using q from the example above val plan = q.queryExecution.logical scala> println(plan.numberedTreeString) 00 'GroupingSets [ArrayBuffer('customer), ArrayBuffer('year)], ['customer, 'year], ['customer, 'year, unresolvedalias('SUM('sales), None)] 01 +- 'SubqueryAlias t1 02 +- 'UnresolvedInlineTable [customer, year, sales], [List(abc, 2017, 30)] // Note unresolvedalias for SUM expression // Note UnresolvedInlineTable and SubqueryAlias // FIXME Show the evaluation rules to get rid of the unresolvable parts |
Creating GroupingSets Instance
GroupingSets
takes the following when created:
-
Expressions from
GROUPING SETS
clause -
Grouping expressions from
GROUP BY
clause -
Child logical plan
-
Aggregate named expressions