# FindDataSourceTable Logical Evaluation Rule for Resolving UnresolvedCatalogRelations
`FindDataSourceTable` is a Catalyst rule that the default and Hive-specific logical query plan analyzers use to resolve `UnresolvedCatalogRelation`s in a logical plan in the following cases:

- `InsertIntoTable`s with `UnresolvedCatalogRelation` (for data source and Hive tables)
- "Standalone" `UnresolvedCatalogRelation`s
> **Note**: `UnresolvedCatalogRelation` is a leaf logical operator that acts as a placeholder; the `ResolveRelations` logical evaluation rule adds it to a logical plan while resolving `UnresolvedRelation` leaf logical operators.
`FindDataSourceTable` is part of the additional rules in the `Resolution` fixed-point batch of rules.
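The mechanics of a fixed-point batch can be sketched in plain Scala. The following is an illustrative model only (not Spark's actual `RuleExecutor` API, and all names are made up): the batch's rules are applied in order, and the whole batch is re-run until the plan stops changing or a maximum number of iterations is reached.

```scala
// Illustrative model of a fixed-point rule batch (not Spark's RuleExecutor).
// A "plan" is modeled as a list of node names and a rule as a function
// from plan to plan.
object FixedPointBatchDemo {
  type Plan = List[String]
  type Rule = Plan => Plan

  // Toy stand-ins for ResolveRelations and FindDataSourceTable
  val resolveRelations: Rule =
    _.map(n => if (n == "UnresolvedRelation") "UnresolvedCatalogRelation" else n)
  val findDataSourceTable: Rule =
    _.map(n => if (n == "UnresolvedCatalogRelation") "LogicalRelation" else n)

  // Apply all rules in order, repeatedly, until the plan stops changing
  // (the fixed point is reached) or maxIterations is exhausted.
  def executeBatch(rules: Seq[Rule], plan: Plan, maxIterations: Int = 100): Plan = {
    var current = plan
    var changed = true
    var i = 0
    while (changed && i < maxIterations) {
      val next = rules.foldLeft(current)((p, rule) => rule(p))
      changed = next != current
      current = next
      i += 1
    }
    current
  }
}
```

Because the batch re-runs until nothing changes, a rule like `findDataSourceTable` still fires even when the node it matches only appears after an earlier rule (here `resolveRelations`) has run.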
```scala
scala> :type spark
org.apache.spark.sql.SparkSession

// Example: InsertIntoTable with UnresolvedCatalogRelation
// Drop tables to make the example reproducible
val db = spark.catalog.currentDatabase
Seq("t1", "t2").foreach { t =>
  spark.sharedState.externalCatalog.dropTable(db, t, ignoreIfNotExists = true, purge = true)
}

// Create tables
sql("CREATE TABLE t1 (id LONG) USING parquet")
sql("CREATE TABLE t2 (id LONG) USING orc")

import org.apache.spark.sql.catalyst.dsl.plans._
val plan = table("t1").insertInto(tableName = "t2", overwrite = true)
scala> println(plan.numberedTreeString)
00 'InsertIntoTable 'UnresolvedRelation `t2`, true, false
01 +- 'UnresolvedRelation `t1`

// Transform the logical plan with ResolveRelations logical rule first
// so UnresolvedRelations become UnresolvedCatalogRelations
import spark.sessionState.analyzer.ResolveRelations
val planWithUnresolvedCatalogRelations = ResolveRelations(plan)
scala> println(planWithUnresolvedCatalogRelations.numberedTreeString)
00 'InsertIntoTable 'UnresolvedRelation `t2`, true, false
01 +- 'SubqueryAlias t1
02    +- 'UnresolvedCatalogRelation `default`.`t1`, org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe

// Let's resolve UnresolvedCatalogRelations then
import org.apache.spark.sql.execution.datasources.FindDataSourceTable
val r = new FindDataSourceTable(spark)
val tablesResolvedPlan = r(planWithUnresolvedCatalogRelations)
// FIXME Why is t2 not resolved?!
scala> println(tablesResolvedPlan.numberedTreeString)
00 'InsertIntoTable 'UnresolvedRelation `t2`, true, false
01 +- SubqueryAlias t1
02    +- Relation[id#10L] parquet
```
## Applying FindDataSourceTable Rule to Logical Plan (and Resolving UnresolvedCatalogRelations in Logical Plan) — `apply` Method
```scala
apply(plan: LogicalPlan): LogicalPlan
```
> **Note**: `apply` is part of the Rule Contract to apply a rule to a logical plan.
`apply` …FIXME
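While the body of `apply` is not documented here, its dispatch logic can be sketched from the two helper methods described below: a transform over the plan that replaces every `UnresolvedCatalogRelation`, going through `readHiveTable` for Hive tables and `readDataSourceTable` for data source tables. The classes in this sketch are simplified stand-ins, not Spark's actual Catalyst operators.

```scala
// Simplified stand-ins for Catalyst operators (not Spark's classes).
object ApplyDemo {
  case class CatalogTable(name: String, provider: Option[String])

  sealed trait Plan
  case class UnresolvedCatalogRelation(table: CatalogTable) extends Plan
  case class HiveTableRelation(table: CatalogTable) extends Plan
  case class LogicalRelation(table: CatalogTable) extends Plan
  case class InsertIntoTable(table: Plan, query: Plan) extends Plan

  def readHiveTable(t: CatalogTable): Plan = HiveTableRelation(t)
  def readDataSourceTable(t: CatalogTable): Plan = LogicalRelation(t)

  // A recursive transform standing in for the rule's plan traversal:
  // unresolved relations are dispatched on the table's provider
  // ("hive" vs. a data source format such as "parquet").
  def apply(plan: Plan): Plan = plan match {
    case InsertIntoTable(table, query) =>
      InsertIntoTable(apply(table), apply(query))
    case UnresolvedCatalogRelation(t) if t.provider.contains("hive") =>
      readHiveTable(t)
    case UnresolvedCatalogRelation(t) =>
      readDataSourceTable(t)
    case other => other
  }
}
```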
## `readHiveTable` Internal Method
```scala
readHiveTable(table: CatalogTable): LogicalPlan
```
`readHiveTable` simply creates a `HiveTableRelation` for the input `CatalogTable`.
> **Note**: `readHiveTable` is used when `FindDataSourceTable` is requested to resolve `UnresolvedCatalogRelation`s in a logical plan.
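As a rough model of what "creates a `HiveTableRelation` for the input `CatalogTable`" amounts to, the relation can be thought of as the catalog metadata plus attributes derived from the table's data columns and partition columns. The classes below are simplified stand-ins for illustration, not Spark's actual `CatalogTable` or `HiveTableRelation`.

```scala
// Simplified model of readHiveTable (not Spark's classes): the relation
// carries the catalog metadata plus attributes built from the table's
// data and partition columns.
object ReadHiveTableDemo {
  case class Attribute(name: String)
  case class CatalogTable(
      name: String,
      dataColumns: Seq[String],
      partitionColumns: Seq[String])
  case class HiveTableRelation(
      table: CatalogTable,
      dataCols: Seq[Attribute],
      partitionCols: Seq[Attribute])

  def readHiveTable(table: CatalogTable): HiveTableRelation =
    HiveTableRelation(
      table,
      table.dataColumns.map(Attribute(_)),
      table.partitionColumns.map(Attribute(_)))
}
```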
## `readDataSourceTable` Internal Method
```scala
readDataSourceTable(table: CatalogTable): LogicalPlan
```
`readDataSourceTable` …FIXME
> **Note**: `readDataSourceTable` is used exclusively when the `FindDataSourceTable` logical evaluation rule is executed (for data source tables).