
HiveTableRelation

HiveTableRelation Leaf Logical Operator — Representing Hive Tables in Logical Plan

HiveTableRelation is a leaf logical operator that represents a Hive table in a logical query plan.

HiveTableRelation is created exclusively when the FindDataSourceTable logical resolution rule is requested to resolve UnresolvedCatalogRelations in a logical plan (for Hive tables).

HiveTableRelation is partitioned when it has at least one partition column.

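The metadata of a HiveTableRelation (in a catalog) has to meet the following requirements:

  • The database is defined

  • The partition schema is of the same type as the partition columns

  • The data schema is of the same type as the data columns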

HiveTableRelation has the output attributes made up of data followed by partition columns.
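The properties described so far (the partitioning check, the metadata requirements and the order of the output attributes) can be illustrated with a minimal, self-contained sketch. Attr, TableMeta and HiveTableRel are hypothetical stand-ins for Spark's AttributeReference, CatalogTable and HiveTableRelation, and the schema check is simplified to plain name/type equality, so treat this as a model of the behaviour rather than Spark's actual code.

```scala
// Hypothetical, simplified stand-ins for Spark's AttributeReference and CatalogTable.
final case class Attr(name: String, dataType: String)

final case class TableMeta(
    database: Option[String],
    table: String,
    dataSchema: Seq[(String, String)],       // (column name, type) pairs
    partitionSchema: Seq[(String, String)])

// Simplified model of the relation; not the real HiveTableRelation.
final case class HiveTableRel(
    tableMeta: TableMeta,
    dataCols: Seq[Attr],
    partitionCols: Seq[Attr]) {

  // Metadata requirements, checked when the instance is created.
  require(tableMeta.database.isDefined, "database must be defined")
  require(
    tableMeta.partitionSchema == partitionCols.map(a => a.name -> a.dataType),
    "partition schema must match the partition columns")
  require(
    tableMeta.dataSchema == dataCols.map(a => a.name -> a.dataType),
    "data schema must match the data columns")

  // Output attributes: data columns first, partition columns last.
  def output: Seq[Attr] = dataCols ++ partitionCols

  // Partitioned means there is at least one partition column.
  def isPartitioned: Boolean = partitionCols.nonEmpty
}

object HiveTableRelDemo {
  // Made-up table used only for illustration.
  val meta: TableMeta = TableMeta(
    database = Some("default"),
    table = "sales",
    dataSchema = Seq("id" -> "bigint", "amount" -> "double"),
    partitionSchema = Seq("year" -> "int"))

  val relation: HiveTableRel = HiveTableRel(
    meta,
    dataCols = Seq(Attr("id", "bigint"), Attr("amount", "double")),
    partitionCols = Seq(Attr("year", "int")))
}
```

In this model, HiveTableRelDemo.relation.isPartitioned is true and its output lists id and amount before year. Creating a HiveTableRel whose metadata has no database fails immediately with an IllegalArgumentException.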

Note

HiveTableRelation is removed from a logical plan when the HiveAnalysis logical rule is executed (and transforms an InsertIntoTable with a HiveTableRelation to an InsertIntoHiveTable).

HiveTableRelation is converted to a LogicalRelation when the RelationConversions rule is executed.

HiveTableRelation is planned as a HiveTableScanExec physical operator when the HiveTableScans execution planning strategy is executed.

Computing Statistics — computeStats Method

Note
computeStats is part of the LeafNode contract to compute statistics for the cost-based optimizer.

computeStats takes the table statistics from the table metadata if defined and converts them to Spark statistics (with output columns).

If the table statistics are not available, computeStats throws an IllegalStateException.
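The behaviour of computeStats can be modelled as an Option lookup that either converts the catalog statistics or fails. The sketch below is self-contained: TableStats, PlanStats and StatsRelation are hypothetical simplifications of Spark's statistics classes and of the relation itself, and the exception message is an assumption.

```scala
// Hypothetical stand-in for statistics stored in the table metadata.
final case class TableStats(sizeInBytes: BigInt, rowCount: Option[BigInt])

// Hypothetical stand-in for the statistics the optimizer works with,
// tied to the output columns of the relation.
final case class PlanStats(
    sizeInBytes: BigInt,
    rowCount: Option[BigInt],
    columns: Seq[String])

// Simplified relation that only carries what computeStats needs.
final case class StatsRelation(stats: Option[TableStats], output: Seq[String]) {

  // Convert the table statistics when they are defined;
  // otherwise fail with an IllegalStateException.
  def computeStats(): PlanStats =
    stats
      .map(s => PlanStats(s.sizeInBytes, s.rowCount, output))
      .getOrElse(throw new IllegalStateException("table stats must be specified."))
}
```

A StatsRelation built with Some(TableStats(...)) returns plan statistics that carry the same size and row count, while one built with None throws on the first call to computeStats. Any rule that asks such a relation for statistics therefore relies on the table statistics having been filled in beforehand.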

Creating HiveTableRelation Instance

HiveTableRelation takes the following when created:

  • Table metadata

  • Data columns (as a collection of AttributeReferences)

  • Partition columns (as a collection of AttributeReferences)
