SparkPlanner — Spark Query Planner
SparkPlanner
is a concrete Catalyst Query Planner that converts a logical plan to one or more physical plans using execution planning strategies with support for extra strategies (by means of ExperimentalMethods) and extraPlanningStrategies.
Note
|
SparkPlanner is expected to plan (aka generate) at least one physical plan per logical plan.
|
SparkPlanner
is available as planner of a SessionState
.
1 2 3 4 5 6 7 |
val spark: SparkSession = ... scala> :type spark.sessionState.planner org.apache.spark.sql.execution.SparkPlanner |
SparkStrategy | Description |
---|---|
|
|
Extension point for extra planning strategies |
|
Note
|
SparkPlanner extends SparkStrategies abstract class.
|
Creating SparkPlanner Instance
SparkPlanner
takes the following when created:
Note
|
|
Extension Point for Extra Planning Strategies — extraPlanningStrategies
Method
1 2 3 4 5 |
extraPlanningStrategies: Seq[Strategy] = Nil |
extraPlanningStrategies
is an extension point to register extra planning strategies with the query planner.
Note
|
extraPlanningStrategies are executed after extraStrategies.
|
Note
|
|
Collecting PlanLater Physical Operators — collectPlaceholders
Method
1 2 3 4 5 |
collectPlaceholders(plan: SparkPlan): Seq[(SparkPlan, LogicalPlan)] |
collectPlaceholders
collects all PlanLater physical operators in the plan
physical plan.
Note
|
collectPlaceholders is part of QueryPlanner Contract.
|
Pruning “Bad” Physical Plans — prunePlans
Method
1 2 3 4 5 |
prunePlans(plans: Iterator[SparkPlan]): Iterator[SparkPlan] |
prunePlans
gives the input plans
physical plans back (i.e. with no changes).
Note
|
prunePlans is part of QueryPlanner Contract to remove somehow “bad” plans.
|
Creating Physical Operator (Possibly Under FilterExec and ProjectExec Operators) — pruneFilterProject
Method
1 2 3 4 5 6 7 8 9 |
pruneFilterProject( projectList: Seq[NamedExpression], filterPredicates: Seq[Expression], prunePushedDownFilters: Seq[Expression] => Seq[Expression], scanBuilder: Seq[Attribute] => SparkPlan): SparkPlan |
Note
|
pruneFilterProject is almost like DataSourceStrategy.pruneFilterProjectRaw.
|
pruneFilterProject
branches off per whether it is possible to use a column pruning only (to get the right projection) and the input projectList
columns of this projection are enough to evaluate all input filterPredicates
filter conditions.
If so, pruneFilterProject
does the following:
-
Applies the input
scanBuilder
function to the inputprojectList
columns that creates a new physical operator -
If there are Catalyst predicate expressions in the input
prunePushedDownFilters
that cannot be pushed down,pruneFilterProject
creates a FilterExec unary physical operator (with the unhandled predicate expressions) -
Otherwise,
pruneFilterProject
simply returns the physical operator
Note
|
In this case no extra ProjectExec unary physical operator is created. |
If not (i.e. it is neither possible to use a column pruning only nor evaluate filter conditions), pruneFilterProject
does the following:
-
Applies the input
scanBuilder
function to the projection and filtering columns that creates a new physical operator -
Creates a FilterExec unary physical operator (with the unhandled predicate expressions if available)
-
Creates a ProjectExec unary physical operator with the optional
FilterExec
operator (with the scan physical operator) or simply the scan physical operator alone
Note
|
|