HiveTableRelation Leaf Logical Operator — Representing Hive Tables in Logical Plan
HiveTableRelation is a leaf logical operator that represents a Hive table in a logical query plan.
HiveTableRelation is created exclusively when the FindDataSourceTable logical evaluation rule is requested to resolve UnresolvedCatalogRelations in a logical plan (for Hive tables).
```scala
val tableName = "h1"

// Make the example reproducible
val db = spark.catalog.currentDatabase
import spark.sharedState.{externalCatalog => extCatalog}
extCatalog.dropTable(
  db,
  table = tableName,
  ignoreIfNotExists = true,
  purge = true)

// sql("CREATE TABLE h1 (id LONG) USING hive")
import org.apache.spark.sql.types.StructType
spark.catalog.createTable(
  tableName,
  source = "hive",
  schema = new StructType().add($"id".long),
  options = Map.empty[String, String])

val h1meta = extCatalog.getTable(db, tableName)
scala> println(h1meta.provider.get)
hive

// Looks like we've got the testing space ready for the experiment
val h1 = spark.table(tableName)

import org.apache.spark.sql.catalyst.dsl.plans._
val plan = table(tableName).insertInto("t2", overwrite = true)
scala> println(plan.numberedTreeString)
00 'InsertIntoTable 'UnresolvedRelation `t2`, true, false
01 +- 'UnresolvedRelation `h1`

// ResolveRelations logical rule first to resolve UnresolvedRelations
import spark.sessionState.analyzer.ResolveRelations
val rrPlan = ResolveRelations(plan)
scala> println(rrPlan.numberedTreeString)
00 'InsertIntoTable 'UnresolvedRelation `t2`, true, false
01 +- 'SubqueryAlias h1
02    +- 'UnresolvedCatalogRelation `default`.`h1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

// FindDataSourceTable logical rule next to resolve UnresolvedCatalogRelations
import org.apache.spark.sql.execution.datasources.FindDataSourceTable
val findTablesRule = new FindDataSourceTable(spark)
val planWithTables = findTablesRule(rrPlan)

// At long last...
// Note HiveTableRelation in the logical plan
scala> println(planWithTables.numberedTreeString)
00 'InsertIntoTable 'UnresolvedRelation `t2`, true, false
01 +- SubqueryAlias h1
02    +- HiveTableRelation `default`.`h1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#13L]
```
HiveTableRelation is partitioned when it has at least one partition column (i.e. partitionCols is not empty).
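As a rough illustration, the partitioning check amounts to asking whether the list of partition columns is non-empty. The sketch below is Spark-free and purely hypothetical: `SimpleHiveTableRelation` is not Spark's class, and columns are modelled as plain names rather than attribute references.

```scala
// Hypothetical, simplified stand-in for HiveTableRelation:
// columns are plain names instead of Spark attribute references
final case class SimpleHiveTableRelation(
    dataCols: Seq[String],
    partitionCols: Seq[String]) {

  // Partitioned means there is at least one partition column
  def isPartitioned: Boolean = partitionCols.nonEmpty
}

val unpartitioned = SimpleHiveTableRelation(Seq("id"), Seq.empty)
val partitioned = SimpleHiveTableRelation(Seq("id", "name"), Seq("year"))

println(unpartitioned.isPartitioned) // false
println(partitioned.isPartitioned)   // true
```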
The metadata of a HiveTableRelation (in a catalog) has to meet the following requirements:
- The database is defined
- The partition schema is of the same type as partitionCols
- The data schema is of the same type as dataCols
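The three requirements can be pictured as checks made when the relation is constructed. In this minimal sketch, `Column`, `TableMeta`, `sameType` and `CheckedHiveTableRelation` are hypothetical simplifications (not Spark's API), data types are plain strings, and "same type" is assumed to mean the same column names (ignoring case) and data types in the same order.

```scala
// Hypothetical, simplified model of columns and catalog metadata
final case class Column(name: String, dataType: String)

final case class TableMeta(
    database: Option[String],
    table: String,
    dataSchema: Seq[Column],
    partitionSchema: Seq[Column])

// Assumption: "same type" = same number of columns, names equal
// ignoring case, and equal data types, in the same order
def sameType(a: Seq[Column], b: Seq[Column]): Boolean =
  a.length == b.length && a.zip(b).forall { case (x, y) =>
    x.name.equalsIgnoreCase(y.name) && x.dataType == y.dataType
  }

// Each requirement is enforced with require (throws IllegalArgumentException)
final case class CheckedHiveTableRelation(
    tableMeta: TableMeta,
    dataCols: Seq[Column],
    partitionCols: Seq[Column]) {
  require(tableMeta.database.isDefined, "database must be defined")
  require(sameType(tableMeta.partitionSchema, partitionCols),
    "partition schema must match partitionCols")
  require(sameType(tableMeta.dataSchema, dataCols),
    "data schema must match dataCols")
}

val meta = TableMeta(Some("default"), "h1", Seq(Column("id", "bigint")), Seq.empty)

// Passes all three checks (column names compared ignoring case)
val ok = CheckedHiveTableRelation(meta, Seq(Column("ID", "bigint")), Seq.empty)

// Fails: no database in the metadata
val noDb = scala.util.Try(
  CheckedHiveTableRelation(meta.copy(database = None), Seq(Column("id", "bigint")), Seq.empty))
println(noDb.isFailure) // true
```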
Computing Statistics — computeStats Method
```scala
computeStats(): Statistics
```
Note: computeStats is part of the LeafNode contract to compute statistics for the cost-based optimizer.
computeStats takes the table statistics from the table metadata (if defined) and converts them to Spark statistics for the output columns.
If the table statistics are not available, computeStats throws an IllegalStateException with the following message:
```
table stats must be specified.
```
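A minimal sketch of the computeStats logic under simplified, hypothetical types: `CatalogStats`, `PlanStats` and `TableMetadata` stand in for Spark's catalog statistics, plan statistics and table metadata. The sketch ignores the per-column statistics that the real conversion maps onto the output columns; it only shows the shape of the logic: convert the statistics if present, otherwise throw an IllegalStateException with the "table stats must be specified." message.

```scala
// Hypothetical, simplified stand-ins for Spark's statistics and table metadata
final case class CatalogStats(sizeInBytes: BigInt, rowCount: Option[BigInt])
final case class PlanStats(sizeInBytes: BigInt, rowCount: Option[BigInt])
final case class TableMetadata(name: String, stats: Option[CatalogStats])

// Convert the table statistics from the metadata if defined,
// otherwise fail with an IllegalStateException
def computeStats(meta: TableMetadata): PlanStats =
  meta.stats
    .map(s => PlanStats(s.sizeInBytes, s.rowCount))
    .getOrElse(throw new IllegalStateException("table stats must be specified."))

val withStats = TableMetadata("h1", Some(CatalogStats(BigInt(1024), Some(BigInt(10)))))
println(computeStats(withStats)) // PlanStats(1024,Some(10))

val withoutStats = TableMetadata("h1", None)
val err = scala.util.Try(computeStats(withoutStats)).failed.get
println(err.getMessage) // table stats must be specified.
```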