ResolveFunctions Logical Resolution Rule — Resolving grouping__id UnresolvedAttribute, UnresolvedGenerator And UnresolvedFunction Expressions
ResolveFunctions is a logical resolution rule that the logical query plan analyzer uses to resolve grouping__id UnresolvedAttribute, UnresolvedGenerator and UnresolvedFunction expressions in an entire logical query plan.
Technically, ResolveFunctions is just a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan].
ResolveFunctions is part of the Resolution fixed-point batch of rules.
Note: ResolveFunctions is a Scala object inside the Analyzer class.
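To make the "fixed-point batch" idea concrete, here is a minimal, self-contained sketch. It does not use Spark: the Rule trait, the string-based "plan" and both rule objects are hypothetical stand-ins invented for illustration. It only shows the general shape of running a batch of rules repeatedly until the plan stops changing, and why a function rule may need a second pass when its argument is resolved by a rule that runs later in the batch.

```scala
object FixedPointSketch {
  // Toy stand-in for a Catalyst-style rule (NOT Spark's Rule class)
  trait Rule[T] { def apply(plan: T): T }

  // Hypothetical rule: resolves the attribute 'name to name#1
  object ResolveAttr extends Rule[String] {
    def apply(plan: String): String = plan.replace("'name", "name#1")
  }

  // Hypothetical rule: resolves 'upper(...) only once its argument is resolved
  object ResolveFunc extends Rule[String] {
    def apply(plan: String): String =
      if (plan.contains("'upper(") && !plan.contains("'name")) plan.replace("'upper(", "upper(")
      else plan
  }

  // Runs all rules in order, again and again, until a full pass changes nothing
  // (or maxIterations is reached). Returns the final plan and the number of passes.
  def fixedPoint[T](rules: Seq[Rule[T]], maxIterations: Int = 100)(plan: T): (T, Int) = {
    var current = plan
    var passes = 0
    var changed = true
    while (changed && passes < maxIterations) {
      val next = rules.foldLeft(current)((p, rule) => rule(p))
      changed = next != current
      current = next
      passes += 1
    }
    (current, passes)
  }
}
```

With the batch ordered as Seq(ResolveFunc, ResolveAttr), calling FixedPointSketch.fixedPoint[String](...)("'upper('name)") needs three passes: the first resolves only the attribute, the second resolves the function, and the third confirms that nothing changes any more.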
import spark.sessionState.analyzer.ResolveFunctions

// Example: UnresolvedAttribute with VirtualColumn.hiveGroupingIdName (grouping__id) => Alias
import org.apache.spark.sql.catalyst.expressions.VirtualColumn
import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
val groupingIdAttr = UnresolvedAttribute(VirtualColumn.hiveGroupingIdName)
scala> println(groupingIdAttr.sql)
`grouping__id`

// Using Catalyst DSL to create a logical plan with grouping__id
import org.apache.spark.sql.catalyst.dsl.plans._
val t1 = table("t1")
val plan = t1.select(groupingIdAttr)
scala> println(plan.numberedTreeString)
00 'Project ['grouping__id]
01 +- 'UnresolvedRelation `t1`

val resolvedPlan = ResolveFunctions(plan)
scala> println(resolvedPlan.numberedTreeString)
00 'Project [grouping_id() AS grouping__id#0]
01 +- 'UnresolvedRelation `t1`

import org.apache.spark.sql.catalyst.expressions.Alias
val alias = resolvedPlan.expressions.head.asInstanceOf[Alias]
scala> println(alias.sql)
grouping_id() AS `grouping__id`

// Example: UnresolvedGenerator => a) Generator or b) analysis failure
// Register a function so a function resolution works
import org.apache.spark.sql.catalyst.FunctionIdentifier
import org.apache.spark.sql.catalyst.catalog.CatalogFunction
val f1 = CatalogFunction(FunctionIdentifier(funcName = "f1"), "java.lang.String", resources = Nil)
import org.apache.spark.sql.catalyst.expressions.{Expression, Stack}
// FIXME What happens when looking up a function with the functionBuilder None in registerFunction?
// Using Stack as ResolveFunctions requires that the function to be resolved is a Generator
// You could roll your own, but that's a demo, isn't it? (don't get too carried away)
spark.sessionState.catalog.registerFunction(
  funcDefinition = f1,
  overrideIfExists = true,
  functionBuilder = Some((children: Seq[Expression]) => Stack(children = Nil)))

import org.apache.spark.sql.catalyst.analysis.UnresolvedGenerator
import org.apache.spark.sql.catalyst.FunctionIdentifier
val ungen = UnresolvedGenerator(name = FunctionIdentifier("f1"), children = Seq.empty)
val plan = t1.select(ungen)
scala> println(plan.numberedTreeString)
00 'Project [unresolvedalias('f1(), None)]
01 +- 'UnresolvedRelation `t1`

val resolvedPlan = ResolveFunctions(plan)
scala> println(resolvedPlan.numberedTreeString)
00 'Project [unresolvedalias(stack(), None)]
01 +- 'UnresolvedRelation `t1`

// CAUTION: FIXME

// Example: UnresolvedFunction => a) AggregateWindowFunction with(out) isDistinct, b) AggregateFunction, c) other with(out) isDistinct
val plan = ???
val resolvedPlan = ResolveFunctions(plan)
Resolving grouping__id UnresolvedAttribute, UnresolvedGenerator and UnresolvedFunction Expressions In Entire Query Plan (Applying ResolveFunctions to Logical Plan) — apply Method
apply(plan: LogicalPlan): LogicalPlan
Note: apply is part of the Rule contract to apply a rule to a logical plan.
apply takes a logical plan and transforms each expression (for every logical operator found in the query plan) as follows:
- For UnresolvedAttributes named grouping__id, apply creates an Alias (with a GroupingID child expression and the grouping__id name). This case seems to exist mostly for compatibility with Hive, which uses the grouping__id attribute name.
- For UnresolvedGenerators, apply requests the SessionCatalog to find a Generator function by name. If a non-generator function is found for the name, apply fails the analysis phase by reporting an AnalysisException:
  [name] is expected to be a generator. However, its class is [className], which is not a generator.
- For UnresolvedFunctions, apply requests the SessionCatalog to find a function by name. What happens next depends on the kind of function found:
  - AggregateWindowFunctions are returned directly, unless the UnresolvedFunction has the isDistinct flag enabled, in which case apply fails the analysis phase by reporting an AnalysisException:
    [name] does not support the modifier DISTINCT
  - AggregateFunctions are wrapped in an AggregateExpression (with Complete aggregate mode).
  - All other functions are returned directly, unless the UnresolvedFunction has the isDistinct flag enabled, in which case apply fails the analysis phase by reporting an AnalysisException:
    [name] does not support the modifier DISTINCT
apply skips expressions whose children have not been resolved yet, leaving them for a later pass of the batch.
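The case analysis above can be sketched as a single pattern match. The following is a toy model only: every type in it (Expr, UnresolvedFunc, WindowAgg, AnalysisError and so on) and the catalog map are hypothetical stand-ins, not Spark's Catalyst classes or its SessionCatalog. It is meant to show the decision structure, not the real implementation.

```scala
object ResolveFunctionsSketch {
  // Toy stand-ins for Catalyst expressions (NOT the Spark classes)
  sealed trait Expr
  case class UnresolvedAttr(name: String) extends Expr
  case class UnresolvedGen(name: String) extends Expr
  case class UnresolvedFunc(name: String, isDistinct: Boolean) extends Expr
  case object GroupingId extends Expr
  case class Alias(child: Expr, name: String) extends Expr
  case class Generator(name: String) extends Expr
  case class WindowAgg(name: String) extends Expr   // models AggregateWindowFunction
  case class Agg(name: String) extends Expr         // models AggregateFunction
  case class Scalar(name: String) extends Expr      // models any other function
  case class AggExpr(fn: Agg, mode: String, isDistinct: Boolean) extends Expr

  // Toy stand-in for AnalysisException
  class AnalysisError(msg: String) extends Exception(msg)
  private def fail(msg: String): Nothing = throw new AnalysisError(msg)

  // Hypothetical function catalog: name -> function found for that name
  val catalog: Map[String, Expr] = Map(
    "explode" -> Generator("explode"),
    "rank"    -> WindowAgg("rank"),
    "sum"     -> Agg("sum"),
    "upper"   -> Scalar("upper"))

  private def lookup(name: String): Expr =
    catalog.getOrElse(name, fail(s"Undefined function: $name"))

  def resolve(e: Expr): Expr = e match {
    // grouping__id attribute => Alias over a grouping ID expression
    case UnresolvedAttr("grouping__id") => Alias(GroupingId, "grouping__id")

    // Generators must resolve to a generator function
    case UnresolvedGen(name) => lookup(name) match {
      case g: Generator => g
      case other => fail(s"$name is expected to be a generator. However, its class is " +
        s"${other.getClass.getName}, which is not a generator.")
    }

    case UnresolvedFunc(name, isDistinct) => lookup(name) match {
      // Window-only aggregates: returned as-is, DISTINCT not allowed
      case w: WindowAgg =>
        if (isDistinct) fail(s"$name does not support the modifier DISTINCT") else w
      // Regular aggregates: wrapped, carrying the DISTINCT flag along
      case a: Agg => AggExpr(a, "Complete", isDistinct)
      // Anything else: returned as-is, DISTINCT not allowed
      case other =>
        if (isDistinct) fail(s"$name does not support the modifier DISTINCT") else other
    }

    // Everything else is left untouched
    case other => other
  }
}
```

In this model, resolve(UnresolvedFunc("sum", isDistinct = true)) yields an AggExpr in "Complete" mode, while resolve(UnresolvedFunc("rank", isDistinct = true)) and resolve(UnresolvedGen("sum")) both throw AnalysisError.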