RewritePredicateSubquery Logical Optimization
RewritePredicateSubquery
is a base logical optimization that transforms Filter operators with Exists and In (with ListQuery) expressions to Join operators as follows:
-
Filter
operators withExists
andIn
withListQuery
expressions give left-semi joins -
Filter
operators withNot
withExists
andIn
withListQuery
expressions give left-anti joins
Note
|
Prefer EXISTS (over Not with In with ListQuery subquery expression) if performance matters since they say “that will almost certainly be planned as a Broadcast Nested Loop join”.
|
RewritePredicateSubquery
is part of the RewriteSubquery once-executed batch in the standard batches of the Catalyst Optimizer.
RewritePredicateSubquery
is simply a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan]
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 |
// FIXME Examples of RewritePredicateSubquery // 1. Filters with Exists and In (with ListQuery) expressions // 2. NOTs // Based on RewriteSubquerySuite // FIXME Contribute back to RewriteSubquerySuite import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan import org.apache.spark.sql.catalyst.rules.RuleExecutor object Optimize extends RuleExecutor[LogicalPlan] { import org.apache.spark.sql.catalyst.optimizer._ val batches = Seq( Batch("Column Pruning", FixedPoint(100), ColumnPruning), Batch("Rewrite Subquery", Once, RewritePredicateSubquery, ColumnPruning, CollapseProject, RemoveRedundantProject)) } val q = ... val optimized = Optimize.execute(q.analyze) |
RewritePredicateSubquery
is part of the RewriteSubquery once-executed batch in the standard batches of the Catalyst Optimizer.
rewriteExistentialExpr
Internal Method
1 2 3 4 5 6 7 |
rewriteExistentialExpr( exprs: Seq[Expression], plan: LogicalPlan): (Option[Expression], LogicalPlan) |
rewriteExistentialExpr
…FIXME
Note
|
rewriteExistentialExpr is used when…FIXME
|
dedupJoin
Internal Method
1 2 3 4 5 |
dedupJoin(joinPlan: LogicalPlan): LogicalPlan |
dedupJoin
…FIXME
Note
|
dedupJoin is used when…FIXME
|
getValueExpression
Internal Method
1 2 3 4 5 |
getValueExpression(e: Expression): Seq[Expression] |
getValueExpression
…FIXME
Note
|
getValueExpression is used when…FIXME
|
Executing Rule — apply
Method
1 2 3 4 5 |
apply(plan: LogicalPlan): LogicalPlan |
Note
|
apply is part of the Rule Contract to execute (apply) a rule on a TreeNode (e.g. LogicalPlan).
|
apply
transforms Filter unary operators in the input logical plan.
apply
splits conjunctive predicates in the condition expression (i.e. expressions separated by And
expression) and then partitions them into two collections of expressions with and without In or Exists subquery expressions.
apply
creates a Filter operator for condition (sub)expressions without subqueries (combined with And
expression) if available or takes the child operator (of the input Filter
unary operator).
In the end, apply
creates a new logical plan with Join operators for Exists and In expressions (and their negations) as follows:
-
For Exists predicate expressions,
apply
rewriteExistentialExpr and creates a Join operator with LeftSemi join type. In the end,apply
dedupJoin -
For
Not
expressions with a Exists predicate expression,apply
rewriteExistentialExpr and creates a Join operator with LeftAnti join type. In the end,apply
dedupJoin -
For In predicate expressions with a ListQuery subquery expression,
apply
getValueExpression followed by rewriteExistentialExpr and creates a Join operator with LeftSemi join type. In the end,apply
dedupJoin -
For
Not
expressions with a In predicate expression with a ListQuery subquery expression,apply
getValueExpression, rewriteExistentialExpr followed by splitting conjunctive predicates and creates a Join operator with LeftAnti join type. In the end,apply
dedupJoin -
For other predicate expressions,
apply
rewriteExistentialExpr and creates a Project unary operator with a Filter operator