关注 spark技术分享,
撸spark源码 玩spark最佳实践

RewriteCorrelatedScalarSubquery

admin阅读(1527)

RewriteCorrelatedScalarSubquery Logical Optimization

RewriteCorrelatedScalarSubquery is a base logical optimization that transforms logical plans with the following operators:

  1. FIXME

RewriteCorrelatedScalarSubquery is part of the Operator Optimization before Inferring Filters fixed-point batch in the standard batches of the Catalyst Optimizer.

RewriteCorrelatedScalarSubquery is simply a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan].

evalExpr Internal Method

evalExpr…​FIXME

Note
evalExpr is used exclusively when RewriteCorrelatedScalarSubquery is…​FIXME

evalAggOnZeroTups Internal Method

evalAggOnZeroTups…​FIXME

Note
evalAggOnZeroTups is used exclusively when RewriteCorrelatedScalarSubquery is…​FIXME

evalSubqueryOnZeroTups Internal Method

evalSubqueryOnZeroTups…​FIXME

Note
evalSubqueryOnZeroTups is used exclusively when RewriteCorrelatedScalarSubquery is requsted to constructLeftJoins.

constructLeftJoins Internal Method

constructLeftJoins…​FIXME

Note
constructLeftJoins is used exclusively when RewriteCorrelatedScalarSubquery logical optimization is executed (i.e. applied to Aggregate, Project or Filter logical operators with correlated scalar subqueries)

Executing Rule — apply Method

Note
apply is part of the Rule Contract to execute (apply) a rule on a TreeNode (e.g. LogicalPlan).

apply transforms the input logical plan as follows:

  1. For Aggregate operators, apply…​FIXME

  2. For Project operators, apply…​FIXME

  3. For Filter operators, apply…​FIXME

Extracting ScalarSubquery Expressions with Children — extractCorrelatedScalarSubqueries Internal Method

extractCorrelatedScalarSubqueries finds all ScalarSubquery expressions with at least one child in the input expression and adds them to the input subqueries collection.

extractCorrelatedScalarSubqueries traverses the input expression down (the expression tree) and, every time a ScalarSubquery with at least one child is found, returns the head of the output attributes of the subquery plan.

In the end, extractCorrelatedScalarSubqueries returns the rewritten expression.

Note
extractCorrelatedScalarSubqueries uses scala.collection.mutable.ArrayBuffer and mutates an instance inside (i.e. adds ScalarSubquery expressions) that makes for two output values, i.e. the rewritten expression and the ScalarSubquery expressions.
Note
extractCorrelatedScalarSubqueries is used exclusively when RewriteCorrelatedScalarSubquery is executed (i.e. applied to a logical plan).

ReplaceExpressions

admin阅读(1388)

ReplaceExpressions Logical Optimization

ReplaceExpressions is part of the Finish Analysis once-executed batch in the standard batches of the Catalyst Optimizer.

ReplaceExpressions is simply a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan].

Executing Rule — apply Method

Note
apply is part of the Rule Contract to execute (apply) a rule on a TreeNode (e.g. LogicalPlan).

apply traverses all Catalyst expressions (in the input LogicalPlan) and replaces a RuntimeReplaceable expression into its single child.

PushPredicateThroughJoin

admin阅读(1355)

PushPredicateThroughJoin Logical Optimization

PushPredicateThroughJoin is a base logical optimization that FIXME.

PushPredicateThroughJoin is part of the Operator Optimization before Inferring Filters fixed-point batch in the standard batches of the Catalyst Optimizer.

PushPredicateThroughJoin is simply a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan].

Executing Rule — apply Method

Note
apply is part of the Rule Contract to execute (apply) a rule on a TreeNode (e.g. LogicalPlan).

apply…​FIXME

PullupCorrelatedPredicates

admin阅读(1679)

PullupCorrelatedPredicates Logical Optimization

PullupCorrelatedPredicates is a base logical optimization that transforms logical plans with the following operators:

  1. Filter operators with an Aggregate child operator

  2. UnaryNode operators

PullupCorrelatedPredicates is part of the Pullup Correlated Expressions once-executed batch in the standard batches of the Catalyst Optimizer.

PullupCorrelatedPredicates is simply a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan].

PullupCorrelatedPredicates uses PredicateHelper for…​FIXME

pullOutCorrelatedPredicates Internal Method

pullOutCorrelatedPredicates…​FIXME

Note
pullOutCorrelatedPredicates is used exclusively when PullupCorrelatedPredicates is requested to rewriteSubQueries.

rewriteSubQueries Internal Method

rewriteSubQueries…​FIXME

Note
rewriteSubQueries is used exclusively when PullupCorrelatedPredicates is executed (i.e. applied to a logical plan).

Executing Rule — apply Method

Note
apply is part of the Rule Contract to execute (apply) a rule on a TreeNode (e.g. LogicalPlan).

apply transforms the input logical plan as follows:

  1. For Filter operators with an Aggregate child operator, apply rewriteSubQueries with the Filter and the Aggregate and its child as the outer plans

  2. For UnaryNode operators, apply rewriteSubQueries with the operator and its children as the outer plans

PropagateEmptyRelation

admin阅读(1607)

PropagateEmptyRelation Logical Optimization

PropagateEmptyRelation is part of the LocalRelation fixed-point batch in the standard batches of the Catalyst Optimizer.

PropagateEmptyRelation is simply a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan].

Explode

Join

Executing Rule — apply Method

Note
apply is part of the Rule Contract to execute (apply) a rule on a TreeNode (e.g. LogicalPlan).

apply…​FIXME

OptimizeSubqueries

admin阅读(1619)

OptimizeSubqueries Logical Optimization

OptimizeSubqueries is a base logical optimization that FIXME.

OptimizeSubqueries is part of the Subquery once-executed batch in the standard batches of the Catalyst Optimizer.

OptimizeSubqueries is simply a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan].

Executing Rule — apply Method

Note
apply is part of the Rule Contract to execute (apply) a rule on a TreeNode (e.g. LogicalPlan).

apply…​FIXME

使用 json定义spark sql schema 代码例子

admin阅读(2526)

OptimizeIn

admin阅读(1483)

OptimizeIn Logical Optimization

  1. Replaces an In expression that has an empty list and the value expression not nullable to false

  2. Eliminates duplicates of Literal expressions in an In predicate expression that is inSetConvertible

  3. Replaces an In predicate expression that is inSetConvertible with InSet expressions when the number of literal expressions in the list expression is greater than spark.sql.optimizer.inSetConversionThreshold internal configuration property (default: 10)

OptimizeIn is part of the Operator Optimization before Inferring Filters fixed-point batch in the standard batches of the Catalyst Optimizer.

OptimizeIn is simply a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan].

Executing Rule — apply Method

Note
apply is part of the Rule Contract to execute (apply) a rule on a TreeNode (e.g. LogicalPlan).

apply…​FIXME

NullPropagation

admin阅读(1523)

NullPropagation Logical Optimization — Nullability (NULL Value) Propagation

NullPropagation is a base logical optimization that FIXME.

NullPropagation is part of the Operator Optimization before Inferring Filters fixed-point batch in the standard batches of the Catalyst Optimizer.

NullPropagation is simply a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan].

Example: Count Aggregate Operator with Nullable Expressions Only

NullPropagation optimization rewrites Count aggregate expressions that include expressions that are all nullable to Cast(Literal(0L)).

Example: Count Aggregate Operator with Non-Nullable Non-Distinct Expressions

NullPropagation optimization rewrites any non-nullable non-distinct Count aggregate expressions to Literal(1).

Note

Count aggregate expression represents count function internally.

Note

current_timestamp() function is non-nullable expression.

Example

Executing Rule — apply Method

Note
apply is part of the Rule Contract to execute (apply) a rule on a TreeNode (e.g. LogicalPlan).

apply…​FIXME

LimitPushDown

admin阅读(1720)

LimitPushDown Logical Optimization

LimitPushDown is a base logical optimization that transforms the following logical plans:

  • LocalLimit with Union

  • LocalLimit with Join

LimitPushDown is part of the Operator Optimization before Inferring Filters fixed-point batch in the standard batches of the Catalyst Optimizer.

LimitPushDown is simply a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan].

Executing Rule — apply Method

Note
apply is part of the Rule Contract to execute (apply) a rule on a TreeNode (e.g. LogicalPlan).

apply…​FIXME

Creating LimitPushDown Instance

LimitPushDown takes the following when created:

LimitPushDown initializes the internal registries and counters.

Note
LimitPushDown is created when

关注公众号:spark技术分享

联系我们联系我们