OptimizeIn Logical Optimization
OptimizeIn
is a base logical optimization that transforms logical plans with In predicate expressions as follows:
-
Replaces an
In
expression that has an empty list and the value expression not nullable tofalse
-
Eliminates duplicates of Literal expressions in an In predicate expression that is inSetConvertible
-
Replaces an
In
predicate expression that is inSetConvertible with InSet expressions when the number of literal expressions in the list expression is greater than spark.sql.optimizer.inSetConversionThreshold internal configuration property (default:10
)
OptimizeIn
is part of the Operator Optimization before Inferring Filters fixed-point batch in the standard batches of the Catalyst Optimizer.
OptimizeIn
is simply a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan]
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 |
// Use Catalyst DSL to define a logical plan // HACK: Disable symbolToColumn implicit conversion // It is imported automatically in spark-shell (and makes demos impossible) // implicit def symbolToColumn(s: Symbol): org.apache.spark.sql.ColumnName trait ThatWasABadIdea implicit def symbolToColumn(ack: ThatWasABadIdea) = ack import org.apache.spark.sql.catalyst.dsl.expressions._ import org.apache.spark.sql.catalyst.dsl.plans._ import org.apache.spark.sql.catalyst.plans.logical.LocalRelation val rel = LocalRelation('a.int, 'b.int, 'c.int) import org.apache.spark.sql.catalyst.expressions.{In, Literal} val plan = rel .where(In('a, Seq[Literal](1, 2, 3))) .analyze scala> println(plan.numberedTreeString) 00 Filter a#6 IN (1,2,3) 01 +- LocalRelation <empty>, [a#6, b#7, c#8] // In --> InSet spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", 0) import org.apache.spark.sql.catalyst.optimizer.OptimizeIn val optimizedPlan = OptimizeIn(plan) scala> println(optimizedPlan.numberedTreeString) 00 Filter a#6 INSET (1,2,3) 01 +- LocalRelation <empty>, [a#6, b#7, c#8] |
Executing Rule — apply
Method
1 2 3 4 5 |
apply(plan: LogicalPlan): LogicalPlan |
Note
|
apply is part of the Rule Contract to execute (apply) a rule on a TreeNode (e.g. LogicalPlan).
|
apply
…FIXME