ResolveAliases

ResolveAliases Logical Resolution Rule

ResolveAliases is a logical resolution rule that the logical query plan analyzer uses to FIXME in an entire logical query plan.

Technically, ResolveAliases is just a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan].

ResolveAliases is part of the Resolution fixed-point batch of rules.

Note
ResolveAliases is a Scala object inside the Analyzer class.
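
To make the notion of a fixed-point batch of rules more concrete, here is a minimal sketch in plain Python (not Spark code). It reapplies a list of rules until the plan stops changing. The names run_batch, max_iterations and strip_one_noop, as well as the tuple-based plan encoding, are assumptions made for illustration only; Catalyst's own rule executor is written in Scala and does more than this.

```python
from typing import Callable, List

Plan = tuple  # toy plan: nested tuples such as ("Noop", ("Relation", "t"))
Rule = Callable[[Plan], Plan]


def run_batch(plan: Plan, rules: List[Rule], max_iterations: int = 100) -> Plan:
    """Apply all rules in order, repeating until no rule changes the plan."""
    for _ in range(max_iterations):
        new_plan = plan
        for rule in rules:
            new_plan = rule(new_plan)
        if new_plan == plan:  # fixed point reached: nothing changed in this pass
            return plan
        plan = new_plan
    return plan  # iteration limit hit before reaching a fixed point


def strip_one_noop(plan: Plan) -> Plan:
    # Hypothetical rule: removes a single ("Noop", child) wrapper per pass.
    if plan[0] == "Noop":
        return plan[1]
    return plan


nested = ("Noop", ("Noop", ("Relation", "t")))
once = run_batch(nested, [strip_one_noop], max_iterations=1)  # a single pass
fixed_point = run_batch(nested, [strip_one_noop])             # repeat until stable
```

With max_iterations=1 the sketch behaves like a batch that is executed once, which leaves one wrapper in place; the fixed-point run keeps going until the rule has nothing left to do.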

Applying ResolveAliases to Logical Plan — apply Method

Note
apply is part of the Rule Contract to apply a rule to a logical plan.

apply…​FIXME

assignAliases Internal Method

assignAliases…​FIXME

Note
assignAliases is used when…​FIXME

RelationConversions

RelationConversions Logical PostHoc Evaluation Rule — Converting Hive Tables

Note
A Hive table is a table whose provider (in the table metadata) is hive.
Caution
FIXME Show example of a hive table, e.g. spark.table(…​)

RelationConversions is created exclusively when the Hive-specific logical query plan analyzer is created.

Executing Rule — apply Method

Note
apply is part of the Rule Contract to execute (apply) a rule on a TreeNode (e.g. LogicalPlan).

apply traverses the input logical plan looking for an InsertIntoTable logical operator with a HiveTableRelation, or a HiveTableRelation logical operator alone.

For an InsertIntoTable with a non-partitioned HiveTableRelation (that can be converted), apply converts the HiveTableRelation to a LogicalRelation.

For a HiveTableRelation logical operator alone apply…​FIXME

Creating RelationConversions Instance

RelationConversions takes the following when created:

Does Table Use Parquet or ORC SerDe? — isConvertible Internal Method

isConvertible is positive when the input HiveTableRelation is a parquet or ORC table (and corresponding SQL properties are enabled).

Internally, isConvertible takes the Hive SerDe of the table (from table metadata) if available or assumes no SerDe.

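isConvertible is turned on when either condition holds:

  • The Hive SerDe is parquet and the corresponding SQL property is enabled

  • The Hive SerDe is ORC and the corresponding SQL property is enabled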

Note
isConvertible is used when RelationConversions is executed.
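
The check described above can be sketched in plain Python (not Spark code). The function name is_convertible and the dictionary-based configuration are assumptions of this sketch. The property names spark.sql.hive.convertMetastoreParquet and spark.sql.hive.convertMetastoreOrc are not given in the text above; they are, to my understanding, the Spark properties that enable the two conversions, so treat them as an assumption too.

```python
from typing import Mapping, Optional


def is_convertible(serde: Optional[str], conf: Mapping[str, bool]) -> bool:
    """Toy model: serde is the table's Hive SerDe class name, or None if unset."""
    serde_name = (serde or "").lower()  # "assumes no SerDe" when unavailable
    # Property names below are assumptions (not stated in the text above).
    parquet_ok = "parquet" in serde_name and conf.get(
        "spark.sql.hive.convertMetastoreParquet", False)
    orc_ok = "orc" in serde_name and conf.get(
        "spark.sql.hive.convertMetastoreOrc", False)
    return bool(parquet_ok or orc_ok)


PARQUET_SERDE = "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
ORC_SERDE = "org.apache.hadoop.hive.ql.io.orc.OrcSerde"

both_enabled = {
    "spark.sql.hive.convertMetastoreParquet": True,
    "spark.sql.hive.convertMetastoreOrc": True,
}
```

In this sketch a table without a SerDe is never convertible, and a parquet table stops being convertible as soon as its property is switched off.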

Converting HiveTableRelation to LogicalRelation — convert Internal Method

convert takes the SerDe of (the storage of) the input HiveTableRelation and converts the HiveTableRelation to a LogicalRelation as follows:

  1. For a parquet SerDe, convert adds the mergeSchema option, set to the value of the spark.sql.hive.convertMetastoreParquet.mergeSchema configuration property (disabled by default), and requests HiveMetastoreCatalog to convertToLogicalRelation (with ParquetFileFormat as fileFormatClass).

  2. For a non-parquet SerDe, convert assumes the ORC format.

     • When the spark.sql.orc.impl configuration property is native (default), convert requests HiveMetastoreCatalog to convertToLogicalRelation (with org.apache.spark.sql.execution.datasources.orc.OrcFileFormat as fileFormatClass).

     • Otherwise, convert requests HiveMetastoreCatalog to convertToLogicalRelation (with org.apache.spark.sql.hive.orc.OrcFileFormat as fileFormatClass).

Note
convert uses HiveSessionCatalog to access the HiveMetastoreCatalog.
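
The branching that convert performs can be summarized with a small Python sketch (not Spark code). The function choose_file_format and its return shape are hypothetical; only the configuration property names and the ORC class names come from the description above. The fully qualified ParquetFileFormat name is filled in from my knowledge of Spark, and leaving the ORC options empty is an assumption of this sketch.

```python
from typing import Dict, Mapping, Optional, Tuple

# Assumption: fully qualified name of the ParquetFileFormat mentioned above.
PARQUET_FORMAT = "org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat"
NATIVE_ORC_FORMAT = "org.apache.spark.sql.execution.datasources.orc.OrcFileFormat"
HIVE_ORC_FORMAT = "org.apache.spark.sql.hive.orc.OrcFileFormat"


def choose_file_format(
    serde: Optional[str], conf: Mapping[str, str]
) -> Tuple[str, Dict[str, str]]:
    """Return (fileFormatClass, options) for a table that is already known to be convertible."""
    if "parquet" in (serde or "").lower():
        # mergeSchema mirrors the configuration property (disabled by default).
        merge = conf.get("spark.sql.hive.convertMetastoreParquet.mergeSchema", "false")
        return PARQUET_FORMAT, {"mergeSchema": merge}
    # Any non-parquet SerDe is assumed to be ORC.
    if conf.get("spark.sql.orc.impl", "native") == "native":
        return NATIVE_ORC_FORMAT, {}
    return HIVE_ORC_FORMAT, {}


parquet_choice = choose_file_format("ParquetHiveSerDe", {})
orc_native_choice = choose_file_format("OrcSerde", {})
orc_hive_choice = choose_file_format("OrcSerde", {"spark.sql.orc.impl": "hive"})
```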
Note
convert is used when the RelationConversions logical evaluation rule does the following transformations:

  • Transforms an InsertIntoTable with a HiveTableRelation with a Hive table (i.e. with hive provider) that is not partitioned and uses parquet or orc data storage format

  • Transforms a HiveTableRelation with a Hive table (i.e. with hive provider) that uses parquet or orc data storage format
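
The two transformations listed above can be modeled on a toy plan in plain Python (not Spark code). All classes here are simplified stand-ins that only borrow the operator names; the helper convertible skips the configuration checks, and the rule function relation_conversions is a hypothetical sketch rather than the actual implementation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class HiveTableRelation:
    table: str
    serde: str
    partition_cols: Tuple[str, ...] = ()


@dataclass(frozen=True)
class LogicalRelation:
    table: str


@dataclass(frozen=True)
class InsertIntoTable:
    table: object  # the relation being written to
    query: object  # the plan that produces the rows


def convertible(rel: HiveTableRelation) -> bool:
    serde = rel.serde.lower()
    return "parquet" in serde or "orc" in serde  # configuration checks omitted


def relation_conversions(plan):
    if isinstance(plan, InsertIntoTable):
        target = plan.table
        # Only a non-partitioned, convertible insert target is converted.
        if (isinstance(target, HiveTableRelation)
                and not target.partition_cols and convertible(target)):
            target = LogicalRelation(target.table)
        return InsertIntoTable(target, relation_conversions(plan.query))
    if isinstance(plan, HiveTableRelation) and convertible(plan):
        return LogicalRelation(plan.table)  # standalone relation
    return plan


pq = HiveTableRelation("t", "ParquetHiveSerDe")
partitioned = HiveTableRelation("p", "ParquetHiveSerDe", ("dt",))
text = HiveTableRelation("s", "LazySimpleSerDe")
```

In this sketch a partitioned insert target stays a HiveTableRelation, while the relations read by the query are still converted.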

PreWriteCheck

PreWriteCheck Extended Analysis Check

PreWriteCheck is an extended analysis check that verifies correctness of a logical query plan with regard to InsertIntoTable unary logical operator (right before analysis can be considered complete).

PreWriteCheck is part of the extended analysis check rules of the logical Analyzer in BaseSessionStateBuilder and HiveSessionStateBuilder.

PreWriteCheck is simply a function of LogicalPlan that…​FIXME

Executing Function — apply Method

Note
apply is part of Scala’s scala.Function1 contract to create a function of one parameter.

apply traverses the input logical query plan and finds InsertIntoTable unary logical operators.
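
As a rough illustration of a check that is "a function of LogicalPlan", here is a Python sketch (not Spark code) that walks a toy plan and hands every InsertIntoTable operator to a validation callback. The Node class, foreach, pre_write_check and the validate callback are all hypothetical; which validations the real check performs is not described above, so none are modeled here.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)


def foreach(plan: Node, f: Callable[[Node], None]) -> None:
    """Visit the plan top-down, calling f on every operator."""
    f(plan)
    for child in plan.children:
        foreach(child, f)


def pre_write_check(plan: Node, validate: Callable[[Node], None]) -> None:
    """Call validate on every InsertIntoTable operator; validate raises on failure."""
    def visit(node: Node) -> None:
        if node.name == "InsertIntoTable":
            validate(node)
    foreach(plan, visit)


seen: List[str] = []
plan = Node("Project", [Node("InsertIntoTable", [Node("Relation")])])
pre_write_check(plan, lambda node: seen.append(node.name))
```

The function returns nothing: like an analysis check, it either passes silently or fails by raising from the callback.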

PreprocessTableCreation

PreprocessTableCreation PostHoc Logical Resolution Rule

PreprocessTableCreation is a posthoc logical resolution rule that resolves a logical query plan with CreateTable logical operators.

PreprocessTableCreation is part of the Post-Hoc Resolution once-executed batch of the Hive-specific and the default logical analyzers.

PreprocessTableCreation is simply a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan].

PreprocessTableCreation takes a SparkSession when created.

Executing Rule — apply Method

Note
apply is part of the Rule Contract to execute (apply) a rule on a TreeNode (e.g. LogicalPlan).

apply…​FIXME

LookupFunctions

LookupFunctions Logical Rule — Checking Whether UnresolvedFunctions Are Resolvable

LookupFunctions is a logical rule that the logical query plan analyzer uses to make sure that UnresolvedFunction expressions can be resolved in an entire logical query plan.

LookupFunctions is similar to the ResolveFunctions logical resolution rule, but it is ResolveFunctions that actually resolves UnresolvedFunction expressions, while LookupFunctions is just a sanity check that a future resolution would succeed if tried.

Technically, LookupFunctions is just a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan].

Note
LookupFunctions does not, however, transform a logical plan.

LookupFunctions is part of the Simple Sanity Check one-off batch of rules.

Note
LookupFunctions is a Scala object inside the Analyzer class.

Applying LookupFunctions to Logical Plan — apply Method

Note
apply is part of the Rule Contract to apply a rule to a logical plan.

apply finds all UnresolvedFunction expressions (in every logical operator in the input logical plan) and requests the SessionCatalog to check if their functions exist.

apply does nothing if a function exists; otherwise, it reports a NoSuchFunctionException (that fails logical analysis).
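
The behaviour of apply can be sketched in plain Python (not Spark code). The plan encoding (a list of operators, each with a list of expressions), the function_exists callback standing in for the SessionCatalog, and the local NoSuchFunctionException class are all assumptions made for this illustration.

```python
from typing import Callable, List, Tuple

Expression = Tuple[str, str]              # (expression kind, name)
Operator = Tuple[str, List[Expression]]   # (operator name, expressions)


class NoSuchFunctionException(Exception):
    """Local stand-in for the exception named in the text."""


def lookup_functions(
    plan: List[Operator], function_exists: Callable[[str], bool]
) -> List[Operator]:
    """Fail if any UnresolvedFunction is unknown; the plan itself is returned unchanged."""
    for _operator, expressions in plan:
        for kind, name in expressions:
            if kind == "UnresolvedFunction" and not function_exists(name):
                raise NoSuchFunctionException(name)
    return plan


catalog = {"upper"}  # hypothetical set of registered function names
plan = [
    ("Project", [("UnresolvedFunction", "upper"), ("Attribute", "name")]),
    ("Filter", [("UnresolvedFunction", "no_such_fn")]),
]
```

Calling lookup_functions(plan[:1], catalog.__contains__) returns the same operators untouched, while the full plan fails on no_such_fn.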

HiveAnalysis

HiveAnalysis PostHoc Logical Resolution Rule

Technically, HiveAnalysis is a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan].

Applying HiveAnalysis Rule to Logical Plan (Executing HiveAnalysis) — apply Method

Note
apply is part of the Rule Contract to apply a rule to a logical plan.

apply…​FIXME

FindDataSourceTable

FindDataSourceTable Logical Evaluation Rule for Resolving UnresolvedCatalogRelations

FindDataSourceTable is a Catalyst rule that the default and Hive-specific logical query plan analyzers use for resolving UnresolvedCatalogRelations in a logical plan for the following cases:

  • InsertIntoTables with UnresolvedCatalogRelation (for datasource and hive tables)

  • “Standalone” UnresolvedCatalogRelations

Note
The UnresolvedCatalogRelation leaf logical operator is a placeholder that the ResolveRelations logical evaluation rule adds to a logical plan while resolving UnresolvedRelations leaf logical operators.

FindDataSourceTable is part of the additional rules in the Resolution fixed-point batch of rules.

Applying FindDataSourceTable Rule to Logical Plan (and Resolving UnresolvedCatalogRelations in Logical Plan) — apply Method

Note
apply is part of the Rule Contract to apply a rule to a logical plan.

apply…​FIXME

readHiveTable Internal Method

readHiveTable simply creates a HiveTableRelation for the input CatalogTable.

Note
readHiveTable is used when FindDataSourceTable is requested to resolve UnresolvedCatalogRelations in a logical plan.

readDataSourceTable Internal Method

readDataSourceTable…​FIXME

Note
readDataSourceTable is used exclusively when FindDataSourceTable logical evaluation rule is executed (for data source tables).
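
How the two helper methods divide the work can be sketched in plain Python (not Spark code). The dictionary-based table metadata and the tuple results are hypothetical. That data source tables end up as a LogicalRelation, and that a table without a provider is left unresolved, are assumptions of this sketch, since the text above does not spell them out.

```python
from typing import Mapping, Optional, Tuple

Relation = Tuple[str, str]  # (operator name, table name)


def read_hive_table(table_meta: Mapping[str, Optional[str]]) -> Relation:
    # Simply wraps the table metadata in a HiveTableRelation.
    return ("HiveTableRelation", str(table_meta["name"]))


def read_data_source_table(table_meta: Mapping[str, Optional[str]]) -> Relation:
    # Assumption: data source tables are resolved to a LogicalRelation.
    return ("LogicalRelation", str(table_meta["name"]))


def find_data_source_table(table_meta: Mapping[str, Optional[str]]) -> Relation:
    provider = table_meta.get("provider")
    if provider is None:
        # Assumption: no provider, so the placeholder is left as is.
        return ("UnresolvedCatalogRelation", str(table_meta["name"]))
    if provider.lower() == "hive":  # a Hive table has the hive provider
        return read_hive_table(table_meta)
    return read_data_source_table(table_meta)


hive_result = find_data_source_table({"name": "h", "provider": "hive"})
parquet_result = find_data_source_table({"name": "p", "provider": "parquet"})
```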

ExtractWindowExpressions

ExtractWindowExpressions Logical Resolution Rule

ExtractWindowExpressions is a logical resolution rule that transforms a logical query plan and replaces (extracts) WindowExpression expressions with Window logical operators.

ExtractWindowExpressions is part of the Resolution fixed-point batch in the standard batches of the Analyzer.

ExtractWindowExpressions is simply a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan].

Note
ExtractWindowExpressions is a Scala object inside Analyzer class (so you have to create an instance of the Analyzer class to access it or simply use SessionState).

Executing Rule — apply Method

Note
apply is part of the Rule Contract to execute (apply) a rule on a TreeNode (e.g. LogicalPlan).

hasWindowFunction Internal Method

hasWindowFunction comes in two variants. The variant that takes a projectList executes the other hasWindowFunction on every NamedExpression in the projectList.

hasWindowFunction is positive (true) when the input expr named expression is a WindowExpression expression. Otherwise, hasWindowFunction is negative (false).

Note
hasWindowFunction is used when ExtractWindowExpressions logical resolution rule is requested to extract and execute.
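
The two variants of hasWindowFunction can be sketched in plain Python (not Spark code). The Expr class and both function names are hypothetical. Searching the children of an expression for a nested WindowExpression, rather than testing only the top-level expression, is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Expr:
    kind: str
    children: List["Expr"] = field(default_factory=list)


def expr_has_window_function(expr: Expr) -> bool:
    """True when expr is a WindowExpression or (assumption) contains one."""
    if expr.kind == "WindowExpression":
        return True
    return any(expr_has_window_function(child) for child in expr.children)


def has_window_function(project_list: List[Expr]) -> bool:
    """Runs the single-expression variant on every expression in the list."""
    return any(expr_has_window_function(expr) for expr in project_list)


plain = Expr("Alias", [Expr("Attribute")])
windowed = Expr("Alias", [Expr("WindowExpression", [Expr("Rank")])])
```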

extract Internal Method

extract…​FIXME

Note
extract is used exclusively when ExtractWindowExpressions logical resolution rule is executed.

Adding Project and Window Logical Operators to Logical Plan — addWindow Internal Method

addWindow adds a Project logical operator with one or more Window logical operators (for every WindowExpression in the input named expressions) to the input logical plan.

Internally, addWindow…​FIXME

Note
addWindow is used exclusively when ExtractWindowExpressions logical resolution rule is executed.
