关注 spark技术分享,
撸spark源码 玩spark最佳实践

GetCurrentDatabase

admin阅读(1394)

GetCurrentDatabase Logical Optimization

GetCurrentDatabase is a base logical optimization that gives the current database for current_database SQL function.

GetCurrentDatabase is part of the Finish Analysis once-executed batch in the standard batches of the Catalyst Optimizer.

GetCurrentDatabase is simply a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan].

Note

GetCurrentDatabase corresponds to SQL’s current_database() function.

You can access the current database in Scala using

Executing Rule — apply Method

Note
apply is part of the Rule Contract to execute (apply) a rule on a TreeNode (e.g. LogicalPlan).

apply…​FIXME

EliminateView

admin阅读(1532)

EliminateView Logical Optimization

EliminateView is part of the Finish Analysis once-executed batch in the standard batches of the Catalyst Optimizer.

EliminateView is simply a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan].

Executing Rule — apply Method

Note
apply is part of the Rule Contract to execute (apply) a rule on a TreeNode (e.g. LogicalPlan).

apply simply removes (eliminates) View unary logical operators from the input logical plan and replaces them with their child logical operator.

apply throws an AssertionError when the output schema of the View operator does not match the output schema of the child logical operator.

Note
The assertion should not really happen since AliasViewChild logical analysis rule is executed earlier and takes care of not allowing for such difference in the output schema (by throwing an AnalysisException earlier).

EliminateSubqueryAliases

admin阅读(1676)

EliminateSubqueryAliases Logical Optimization

EliminateSubqueryAliases is part of the Finish Analysis once-executed batch in the standard batches of the Catalyst Optimizer.

EliminateSubqueryAliases is simply a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan].

Executing Rule — apply Method

Note
apply is part of the Rule Contract to execute (apply) a rule on a TreeNode (e.g. LogicalPlan).

apply simply removes (eliminates) SubqueryAlias unary logical operators from the input logical plan.

EliminateSerialization

admin阅读(799)

EliminateSerialization Logical Optimization

EliminateSerialization is a base logical optimization that optimizes logical plans with DeserializeToObject (after SerializeFromObject or TypedFilter), AppendColumns (after SerializeFromObject), TypedFilter (after SerializeFromObject) logical operators.

EliminateSerialization is part of the Operator Optimization before Inferring Filters fixed-point batch in the standard batches of the Catalyst Optimizer.

EliminateSerialization is simply a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan].

Examples include:

Example — map followed by filter Logical Plan

Example — map followed by another map Logical Plan

Example — groupByKey followed by agg Logical Plan

Executing Rule — apply Method

Note
apply is part of the Rule Contract to execute (apply) a rule on a TreeNode (e.g. LogicalPlan).

apply…​FIXME

DecimalAggregates

admin阅读(1572)

DecimalAggregates Logical Optimization

DecimalAggregates is a base logical optimization that transforms Sum and Average aggregate functions on fixed-precision DecimalType values to use UnscaledValue (unscaled Long) values in WindowExpression and AggregateExpression expressions.

DecimalAggregates is part of the Decimal Optimizations fixed-point batch in the standard batches of the Catalyst Optimizer.

DecimalAggregates is simply a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan].

Tip

Import DecimalAggregates and apply the rule directly on your structured queries to learn how the rule works.

Example: sum Aggregate Function on Decimal with Precision Smaller Than 9

Example: avg Aggregate Function on Decimal with Precision Smaller Than 12

Executing Rule — apply Method

Note
apply is part of the Rule Contract to execute (apply) a rule on a TreeNode (e.g. LogicalPlan).

apply…​FIXME

CostBasedJoinReorder

admin阅读(1345)

CostBasedJoinReorder Logical Optimization — Join Reordering in Cost-Based Optimization

ReorderJoin is part of the Join Reorder once-executed batch in the standard batches of the Catalyst Optimizer.

ReorderJoin is simply a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan].

CostBasedJoinReorder applies the join optimizations on a logical plan with 2 or more consecutive inner or cross joins (possibly separated by Project operators) when spark.sql.cbo.enabled and spark.sql.cbo.joinReorder.enabled configuration properties are both enabled.

CostBasedJoinReorder uses row count statistic that is computed using ANALYZE TABLE COMPUTE STATISTICS SQL command with no NOSCAN option.

Caution

FIXME Examples of other join queries

  • Cross join with join condition

  • Project with attributes only and Inner join with join condition

  • Project with attributes only and Cross join with join condition

Tip

Enable DEBUG logging level for org.apache.spark.sql.catalyst.optimizer.JoinReorderDP logger to see the join reordering duration.

Add the following line to conf/log4j.properties:

Refer to Logging.

Executing Rule — apply Method

Note
apply is part of the Rule Contract to execute (apply) a rule on a TreeNode (e.g. LogicalPlan).

apply traverses the input logical plan down and tries to reorder the following logical operators:

  • Join for CROSS or INNER joins with a join condition

  • Project with the above Join child operator and the project list of Attribute leaf expressions only

Reordering Logical Plan with Join Operators — reorder Internal Method

reorder…​FIXME

Note
reorder is used exclusively when CostBasedJoinReorder is applied to a logical plan.

replaceWithOrderedJoin Internal Method

replaceWithOrderedJoin…​FIXME

Note
replaceWithOrderedJoin is used recursively and when CostBasedJoinReorder is reordering…​FIXME

Extracting Consecutive Join Operators — extractInnerJoins Internal Method

extractInnerJoins finds consecutive Join logical operators (inner or cross) with join conditions or Project logical operators with Join logical operator and the project list of Attribute leaf expressions only.

For Project operators extractInnerJoins calls itself recursively with the Join operator inside.

In the end, extractInnerJoins gives the collection of logical plans under the consecutive Join logical operators (possibly separated by Project operators only) and their join conditions (for which And expressions have been split).

Note
extractInnerJoins is used recursively when CostBasedJoinReorder is reordering a logical plan.

ConstantFolding

admin阅读(974)

ConstantFolding Logical Optimization

ConstantFolding is part of the Operator Optimization before Inferring Filters fixed-point batch in the standard batches of the Catalyst Optimizer.

ConstantFolding is simply a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan].

Executing Rule — apply Method

Note
apply is part of the Rule Contract to execute (apply) a rule on a TreeNode (e.g. LogicalPlan).

apply…​FIXME

ComputeCurrentTime

admin阅读(1409)

ComputeCurrentTime Logical Optimization

ComputeCurrentTime is part of the Finish Analysis once-executed batch in the standard batches of the Catalyst Optimizer.

ComputeCurrentTime is simply a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan].

Executing Rule — apply Method

Note
apply is part of the Rule Contract to execute (apply) a rule on a TreeNode (e.g. LogicalPlan).

apply…​FIXME

CombineUnions

admin阅读(1545)

CombineUnions Logical Optimization

CombineUnions is a base logical optimization that FIXME.

CombineUnions is part of the Union once-executed batch in the standard batches of the Catalyst Optimizer.

CombineUnions is simply a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan].

Executing Rule — apply Method

Note
apply is part of the Rule Contract to execute (apply) a rule on a TreeNode (e.g. LogicalPlan).

apply…​FIXME

CombineTypedFilters

admin阅读(2255)

CombineTypedFilters Logical Optimization

CombineTypedFilters is a base logical optimization that combines two back to back (typed) filters into one that ultimately ends up as a single method call.

CombineTypedFilters is part of the Object Expressions Optimization fixed-point batch in the standard batches of the Catalyst Optimizer.

CombineTypedFilters is simply a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan].

Executing Rule — apply Method

Note
apply is part of the Rule Contract to execute (apply) a rule on a TreeNode (e.g. LogicalPlan).

apply…​FIXME

关注公众号:spark技术分享

联系我们联系我们