ResolveRelations Logical Resolution Rule — Resolving UnresolvedRelations With Tables in Catalog
ResolveRelations is a logical resolution rule that the logical query plan analyzer uses to resolve UnresolvedRelations (in a logical query plan), i.e.
- Resolves UnresolvedRelation logical operators (in InsertIntoTable operators)
- Other uses of UnresolvedRelation
Technically, ResolveRelations is just a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan].
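Since it is a plain Rule[LogicalPlan], ResolveRelations can be referenced and applied like any other Catalyst rule. A minimal spark-shell sketch (assuming Spark 2.x, where sessionState and its analyzer are accessible on SparkSession):

```scala
// ResolveRelations seen through the Rule[LogicalPlan] contract
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// ResolveRelations is an object defined inside the Analyzer class,
// so it is accessed through the session's Analyzer instance
val rule: Rule[LogicalPlan] = spark.sessionState.analyzer.ResolveRelations

// ruleName and apply come from the Rule contract
scala> rule.ruleName
```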
ResolveRelations is part of the Resolution fixed-point batch of rules.
```scala
// Example: InsertIntoTable with UnresolvedRelation
import org.apache.spark.sql.catalyst.dsl.plans._
val plan = table("t1").insertInto(tableName = "t2", overwrite = true)
scala> println(plan.numberedTreeString)
00 'InsertIntoTable 'UnresolvedRelation `t2`, true, false
01 +- 'UnresolvedRelation `t1`

// Register the tables so the following resolution works
sql("CREATE TABLE IF NOT EXISTS t1(id long)")
sql("CREATE TABLE IF NOT EXISTS t2(id long)")

// ResolveRelations is a Scala object of the Analyzer class
// We need an instance of the Analyzer class to access it
import spark.sessionState.analyzer.ResolveRelations
val resolvedPlan = ResolveRelations(plan)
scala> println(resolvedPlan.numberedTreeString)
00 'InsertIntoTable 'UnresolvedRelation `t2`, true, false
01 +- 'SubqueryAlias t1
02    +- 'UnresolvedCatalogRelation `default`.`t1`, org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe

// Example: Other uses of UnresolvedRelation
// Use a temporary view
val v1 = spark.range(1).createOrReplaceTempView("v1")
scala> spark.catalog.listTables.filter($"name" === "v1").show
+----+--------+-----------+---------+-----------+
|name|database|description|tableType|isTemporary|
+----+--------+-----------+---------+-----------+
|  v1|    null|       null|TEMPORARY|       true|
+----+--------+-----------+---------+-----------+

import org.apache.spark.sql.catalyst.dsl.expressions._
val plan = table("v1").select(star())
scala> println(plan.numberedTreeString)
00 'Project [*]
01 +- 'UnresolvedRelation `v1`

val resolvedPlan = ResolveRelations(plan)
scala> println(resolvedPlan.numberedTreeString)
00 'Project [*]
01 +- SubqueryAlias v1
02    +- Range (0, 1, step=1, splits=Some(8))

// Example
import org.apache.spark.sql.catalyst.dsl.plans._
val plan = table(db = "db1", ref = "t1")
scala> println(plan.numberedTreeString)
00 'UnresolvedRelation `db1`.`t1`

// Register the database so the following resolution works
sql("CREATE DATABASE IF NOT EXISTS db1")

val resolvedPlan = ResolveRelations(plan)
scala> println(resolvedPlan.numberedTreeString)
00 'SubqueryAlias t1
01 +- 'UnresolvedCatalogRelation `db1`.`t1`, org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
```
Applying ResolveRelations to Logical Plan — apply Method
```scala
apply(plan: LogicalPlan): LogicalPlan
```
Note: apply is part of the Rule Contract to apply a rule to a logical plan.
apply…FIXME
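Until the FIXME above is filled in, here is a hand-written sketch of the transformation (based on the Spark 2.x sources and simplified; it refers to the rule's private helpers described below, so it is not meant to compile on its own):

```scala
// Simplified sketch of apply (not the exact Spark source):
// the plan is traversed bottom-up and the two cases from the intro are handled.
def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
  // Case 1: UnresolvedRelation as the target table of InsertIntoTable
  // (handled only once the query to insert, i.e. child, is itself resolved)
  case i @ InsertIntoTable(u: UnresolvedRelation, _, child, _, _) if child.resolved =>
    // look the target table up in the catalog; inserting into a view is rejected
    // in the real implementation (error handling omitted here)
    i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u)))

  // Case 2: any other ("standalone") UnresolvedRelation
  case u: UnresolvedRelation =>
    resolveRelation(u)
}
```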
Resolving Relation — resolveRelation Method
```scala
resolveRelation(plan: LogicalPlan): LogicalPlan
```
resolveRelation…FIXME
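Until the FIXME above is filled in, a simplified sketch of the main case (based on the Spark 2.x sources; view resolution and other corner cases are omitted, so this is not the exact implementation):

```scala
// Simplified sketch of resolveRelation (not the exact Spark source)
def resolveRelation(plan: LogicalPlan): LogicalPlan = plan match {
  // Look the table up in the SessionCatalog unless the reference is really a
  // path-based "run SQL directly on files" reference (see isRunningDirectlyOnFiles below)
  case u: UnresolvedRelation if !isRunningDirectlyOnFiles(u.tableIdentifier) =>
    lookupTableFromCatalog(u, AnalysisContext.get.defaultDatabase)

  // Anything else is left untouched for other rules (e.g. ResolveSQLOnFile)
  case _ => plan
}
```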
Note: resolveRelation is used when…FIXME
isRunningDirectlyOnFiles Internal Method
```scala
isRunningDirectlyOnFiles(table: TableIdentifier): Boolean
```
isRunningDirectlyOnFiles is enabled (i.e. true) when all of the following conditions hold (see the sketch after the list):

- The database of the input table is defined
- spark.sql.runSQLOnFiles internal configuration property is enabled
- The table is not a temporary table
- The database or the table does not exist (in the SessionCatalog)
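The check boils down to a short boolean expression over the SessionCatalog and SQLConf. The following spark-shell sketch mirrors the conditions using the public session state (the helper name runsDirectlyOnFiles is made up for illustration; the rule's own method is private):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.TableIdentifier

// Hypothetical helper mirroring the documented conditions (assuming Spark 2.x APIs)
def runsDirectlyOnFiles(spark: SparkSession, table: TableIdentifier): Boolean = {
  val catalog = spark.sessionState.catalog
  table.database.isDefined &&                        // the database part is given
    spark.sessionState.conf.runSQLonFile &&          // spark.sql.runSQLOnFiles is enabled
    !catalog.isTemporaryTable(table) &&              // not a temporary table
    (!catalog.databaseExists(table.database.get) ||  // and the "database"...
      !catalog.tableExists(table))                   // ...or the table does not exist
}

// A path-based reference such as parquet.`/tmp/names` would typically satisfy all conditions
runsDirectlyOnFiles(spark, TableIdentifier("/tmp/names", Some("parquet")))
```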
Note: isRunningDirectlyOnFiles is used exclusively when ResolveRelations resolves a relation (as an UnresolvedRelation leaf logical operator for a table reference).
Finding Table in Session-Scoped Catalog of Relational Entities — lookupTableFromCatalog Internal Method
```scala
lookupTableFromCatalog(
  u: UnresolvedRelation,
  defaultDatabase: Option[String] = None): LogicalPlan
```
lookupTableFromCatalog simply requests SessionCatalog to find the table in relational catalogs.
Note: lookupTableFromCatalog requests Analyzer for the current SessionCatalog.

Note: The table is described using TableIdentifier of the input UnresolvedRelation.
lookupTableFromCatalog fails the analysis phase (by reporting an AnalysisException) when the table or the table’s database cannot be found.
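Putting the pieces together, a simplified sketch of the method (based on the Spark 2.x sources; the exact error messages and exception handling differ):

```scala
import org.apache.spark.sql.catalyst.analysis.{NoSuchDatabaseException, NoSuchTableException}

// Simplified sketch of lookupTableFromCatalog (not the exact Spark source)
def lookupTableFromCatalog(
    u: UnresolvedRelation,
    defaultDatabase: Option[String] = None): LogicalPlan = {
  // Fall back to the default database when the identifier does not carry one
  val tableId = u.tableIdentifier.copy(
    database = u.tableIdentifier.database.orElse(defaultDatabase))
  try {
    // catalog is the Analyzer's current SessionCatalog
    catalog.lookupRelation(tableId)
  } catch {
    case _: NoSuchTableException | _: NoSuchDatabaseException =>
      // reported as an AnalysisException during the analysis phase
      u.failAnalysis(s"Table or view not found: ${tableId.unquotedString}")
  }
}
```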
Note: lookupTableFromCatalog is used when ResolveRelations is executed (for InsertIntoTable with UnresolvedRelation operators) or resolves a relation (for “standalone” UnresolvedRelations).