RelationConversions

RelationConversions Logical PostHoc Evaluation Rule — Converting Hive Tables

Note
A Hive table is a table whose provider is hive in the table metadata.
Caution
FIXME Show example of a hive table, e.g. spark.table(…​)

RelationConversions is created exclusively when the Hive-specific logical query plan analyzer is created.

Executing Rule — apply Method

Note
apply is part of the Rule Contract to execute (apply) a rule on a TreeNode (e.g. LogicalPlan).

apply traverses the input logical plan looking for an InsertIntoTable logical operator with a HiveTableRelation, or a HiveTableRelation logical operator alone.

For an InsertIntoTable with a non-partitioned HiveTableRelation (that can be converted), apply converts the HiveTableRelation to a LogicalRelation.

For a HiveTableRelation logical operator alone apply…​FIXME

Creating RelationConversions Instance

RelationConversions takes the following when created:

  • SQLConf

  • HiveSessionCatalog

Does Table Use Parquet or ORC SerDe? — isConvertible Internal Method

isConvertible is positive when the input HiveTableRelation is a parquet or ORC table (and corresponding SQL properties are enabled).

Internally, isConvertible takes the Hive SerDe of the table (from table metadata) if available or assumes no SerDe.

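isConvertible is positive when either of the following holds:

  • The Hive SerDe is parquet and the spark.sql.hive.convertMetastoreParquet configuration property is enabled

  • The Hive SerDe is orc and the spark.sql.hive.convertMetastoreOrc configuration property is enabled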

Note
isConvertible is used when RelationConversions is executed.
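
The check described in this section can be modeled in a few lines. What follows is a minimal plain-Python sketch, not Spark code: the function name is_convertible and the dictionary-based configuration are hypothetical, and it assumes the check is a case-insensitive substring match on the SerDe class name. The two property names are the ones Spark uses for the Parquet and ORC conversions. Spark's own default values are deliberately not encoded here, so a property that is absent from the dictionary counts as disabled.

```python
from typing import Mapping, Optional

# Configuration property names (Spark's metastore conversion switches).
CONVERT_PARQUET = "spark.sql.hive.convertMetastoreParquet"
CONVERT_ORC = "spark.sql.hive.convertMetastoreOrc"


def is_convertible(serde: Optional[str], conf: Mapping[str, bool]) -> bool:
    """Hypothetical model of the isConvertible check (not Spark's API)."""
    # A table without a SerDe is treated as having an empty SerDe name,
    # so neither condition can match.
    name = (serde or "").lower()
    parquet_ok = "parquet" in name and conf.get(CONVERT_PARQUET, False)
    orc_ok = "orc" in name and conf.get(CONVERT_ORC, False)
    return bool(parquet_ok or orc_ok)
```

For example, a table with the org.apache.hadoop.hive.ql.io.orc.OrcSerde SerDe passes this check only when the ORC property is enabled in the dictionary; enabling the Parquet property alone has no effect on it.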

Converting HiveTableRelation to LogicalRelation — convert Internal Method

convert takes the SerDe of (the storage of) the input HiveTableRelation and converts the HiveTableRelation to a LogicalRelation as follows:

  1. For parquet serde, convert adds the mergeSchema option with the value of the spark.sql.hive.convertMetastoreParquet.mergeSchema configuration property (disabled by default) and requests HiveMetastoreCatalog to convertToLogicalRelation (with ParquetFileFormat as fileFormatClass).

  2. For non-parquet serde, convert assumes ORC format.

      • When the spark.sql.orc.impl configuration property is native (default), convert requests HiveMetastoreCatalog to convertToLogicalRelation (with org.apache.spark.sql.execution.datasources.orc.OrcFileFormat as fileFormatClass).

      • Otherwise, convert requests HiveMetastoreCatalog to convertToLogicalRelation (with org.apache.spark.sql.hive.orc.OrcFileFormat as fileFormatClass).
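
The choice of file format described above can be summarized as a small decision function. This is a plain-Python sketch under stated assumptions, not Spark code: choose_file_format is a hypothetical name, the configuration is modeled as a dictionary of strings, and only the extra option that convert adds (mergeSchema) is returned. The defaults used for missing keys mirror the ones stated in the text.

```python
from typing import Dict, Mapping, Tuple

PARQUET_FORMAT = "org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat"
NATIVE_ORC_FORMAT = "org.apache.spark.sql.execution.datasources.orc.OrcFileFormat"
HIVE_ORC_FORMAT = "org.apache.spark.sql.hive.orc.OrcFileFormat"

MERGE_SCHEMA_KEY = "spark.sql.hive.convertMetastoreParquet.mergeSchema"
ORC_IMPL_KEY = "spark.sql.orc.impl"


def choose_file_format(serde: str, conf: Mapping[str, str]) -> Tuple[str, Dict[str, str]]:
    """Hypothetical model: return (fileFormatClass, extra options)."""
    if "parquet" in serde.lower():
        # Parquet: pass mergeSchema along, disabled unless configured.
        merge_schema = conf.get(MERGE_SCHEMA_KEY, "false")
        return PARQUET_FORMAT, {"mergeSchema": merge_schema}
    # Anything else is assumed to be ORC; pick the implementation.
    if conf.get(ORC_IMPL_KEY, "native") == "native":
        return NATIVE_ORC_FORMAT, {}
    return HIVE_ORC_FORMAT, {}
```

Note that the non-parquet branch never inspects the SerDe again, which matches the "assumes ORC format" wording: the caller is expected to have checked convertibility first.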

Note
convert uses HiveSessionCatalog to access the HiveMetastoreCatalog.
Note

convert is used when RelationConversions logical evaluation rule does the following transformations:

  • Transforms an InsertIntoTable with a HiveTableRelation for a Hive table (i.e. with hive provider) that is not partitioned and uses parquet or orc data storage format

  • Transforms a HiveTableRelation with a Hive table (i.e. with hive provider) that uses parquet or orc data storage format
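
The two transformations listed above can be illustrated with a toy model of a single plan node. This is a plain-Python sketch and not Spark code: the dataclasses only borrow the operator names from the text, their fields are hypothetical simplifications, the convertibility check is passed in as a function, and the sketch handles one node rather than traversing a whole plan.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class HiveTableRelation:
    table: str
    provider: str              # "hive" for Hive tables
    serde: str
    partitioned: bool = False


@dataclass(frozen=True)
class LogicalRelation:
    table: str


@dataclass(frozen=True)
class InsertIntoTable:
    table: Union[HiveTableRelation, LogicalRelation]
    query: str                 # placeholder for the child query plan


Plan = Union[HiveTableRelation, LogicalRelation, InsertIntoTable]


def apply_rule(plan: Plan, is_convertible: Callable[[HiveTableRelation], bool]) -> Plan:
    """Hypothetical single-node model of the rule's two cases."""
    def convertible(r: HiveTableRelation) -> bool:
        return r.provider == "hive" and is_convertible(r)

    # Case 1: insert into a non-partitioned, convertible Hive table.
    if isinstance(plan, InsertIntoTable) and isinstance(plan.table, HiveTableRelation):
        target = plan.table
        if not target.partitioned and convertible(target):
            return InsertIntoTable(LogicalRelation(target.table), plan.query)
        return plan
    # Case 2: a convertible Hive table relation on its own.
    if isinstance(plan, HiveTableRelation) and convertible(plan):
        return LogicalRelation(plan.table)
    return plan
```

In this model an insert into a partitioned Hive table is returned unchanged, while reading the same table as a standalone relation is still converted, which is the asymmetry between the two bullet points.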
