
SparkPlanner — Spark Query Planner

SparkPlanner is a concrete Catalyst Query Planner that converts a logical plan to one or more physical plans using execution planning strategies. In addition to the built-in strategies, it supports extra strategies (registered through ExperimentalMethods) and the extraPlanningStrategies extension point.

Note
SparkPlanner is expected to plan (aka generate) at least one physical plan per logical plan.

SparkPlanner is available as planner of a SessionState.
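
To make the planning contract concrete, here is a minimal sketch in plain Scala of a planner that tries its strategies in order and fails when none of them yields a physical plan. All the types (TinyLogical, TinyPhysical, TinyStrategy, MiniPlanner) are hypothetical stand-ins for illustration, not Spark's own classes.

```scala
// Simplified stand-ins for illustration; these are NOT Spark's classes.
final case class TinyLogical(name: String)
final case class TinyPhysical(name: String)

// A strategy returns zero or more candidate physical plans for a logical plan.
trait TinyStrategy {
  def apply(plan: TinyLogical): Seq[TinyPhysical]
}

class MiniPlanner(val strategies: Seq[TinyStrategy]) {
  // Candidates are produced lazily, strategy by strategy, in registration order.
  def plan(logical: TinyLogical): Iterator[TinyPhysical] = {
    val candidates = strategies.iterator.flatMap(s => s(logical))
    // At least one physical plan is expected per logical plan.
    require(candidates.hasNext, s"No plan for ${logical.name}")
    candidates
  }
}

object MiniPlannerDemo {
  // Matches only logical plans named "scan"; returns Nil otherwise.
  val onlyScans: TinyStrategy = new TinyStrategy {
    def apply(plan: TinyLogical): Seq[TinyPhysical] =
      if (plan.name == "scan") Seq(TinyPhysical("ScanExec")) else Nil
  }
  // Matches everything, so it acts as a fallback when listed last.
  val catchAll: TinyStrategy = new TinyStrategy {
    def apply(plan: TinyLogical): Seq[TinyPhysical] =
      Seq(TinyPhysical("Generic(" + plan.name + ")"))
  }
  val planner = new MiniPlanner(Seq(onlyScans, catchAll))
}
```

A strategy that does not apply returns an empty sequence, which lets the planner fall through to the next strategy in the list.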

Table 1. SparkPlanner’s Execution Planning Strategies (in execution order)

  1. ExperimentalMethods’s extraStrategies
  2. extraPlanningStrategies (extension point for extra planning strategies)
  3. DataSourceV2Strategy
  4. FileSourceStrategy
  5. DataSourceStrategy
  6. SpecialLimits
  7. Aggregation
  8. JoinSelection
  9. InMemoryScans
  10. BasicOperators

Note
SparkPlanner extends SparkStrategies abstract class.

Creating SparkPlanner Instance

SparkPlanner takes the following when created:

Note

SparkPlanner is created in:

Extension Point for Extra Planning Strategies — extraPlanningStrategies Method

extraPlanningStrategies is an extension point to register extra planning strategies with the query planner.

Note
extraPlanningStrategies are executed after extraStrategies.
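
The ordering can be observed from user code. The sketch below assumes Spark 2.x or 3.x on the classpath and that no SparkSession exists yet. ViaExperimental and ViaExtensions are hypothetical no-op strategies. One is registered through ExperimentalMethods and the other through SparkSessionExtensions.injectPlannerStrategy, which the session state builder is assumed to feed into extraPlanningStrategies.

```scala
import org.apache.spark.sql.{SparkSession, Strategy}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.SparkPlan

// Hypothetical strategies that never match: returning Nil lets the
// planner fall through to the next strategy.
object ViaExperimental extends Strategy {
  override def apply(plan: LogicalPlan): Seq[SparkPlan] = Nil
}

object ViaExtensions extends Strategy {
  override def apply(plan: LogicalPlan): Seq[SparkPlan] = Nil
}

object ExtraStrategyDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("extra-strategy-demo")
      // Injected strategies are picked up by the session state builder.
      .withExtensions(_.injectPlannerStrategy(_ => ViaExtensions))
      .getOrCreate()

    // Registered through ExperimentalMethods.
    spark.experimental.extraStrategies = Seq(ViaExperimental)

    // Print the strategies in the order the planner tries them.
    spark.sessionState.planner.strategies
      .foreach(s => println(s.getClass.getName))

    spark.stop()
  }
}
```

If the assumptions hold, ViaExperimental should be listed before ViaExtensions, followed by the built-in strategies.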
Note

extraPlanningStrategies is used when SparkPlanner is requested for planning strategies.

extraPlanningStrategies is overridden in the SessionState builders — BaseSessionStateBuilder and HiveSessionStateBuilder.

Collecting PlanLater Physical Operators — collectPlaceholders Method

collectPlaceholders collects all PlanLater physical operators in the input physical plan.

Note
collectPlaceholders is part of QueryPlanner Contract.
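
As a rough illustration of what collecting placeholders means, the following plain-Scala sketch walks a toy physical plan tree and pairs every placeholder with the logical plan it stands for. The node types are hypothetical stand-ins, not Spark's SparkPlan or PlanLater, and the logical plan is reduced to a string.

```scala
// Simplified stand-in for a physical plan tree; not Spark's SparkPlan.
sealed trait PhysNode { def children: Seq[PhysNode] }

final case class ScanNode(table: String) extends PhysNode {
  def children: Seq[PhysNode] = Nil
}
final case class FilterNode(child: PhysNode) extends PhysNode {
  def children: Seq[PhysNode] = Seq(child)
}
final case class JoinNode(left: PhysNode, right: PhysNode) extends PhysNode {
  def children: Seq[PhysNode] = Seq(left, right)
}
// Placeholder for a logical subtree that has yet to be planned.
final case class PlanLaterNode(logical: String) extends PhysNode {
  def children: Seq[PhysNode] = Nil
}

object Placeholders {
  // Pre-order traversal pairing each placeholder with its logical plan.
  def collect(plan: PhysNode): Seq[(PhysNode, String)] = {
    val here: Seq[(PhysNode, String)] = plan match {
      case p @ PlanLaterNode(logical) => Seq(p -> logical)
      case _                          => Nil
    }
    here ++ plan.children.flatMap(collect)
  }
}
```

For a tree such as JoinNode(FilterNode(PlanLaterNode("left")), PlanLaterNode("right")), the sketch returns the two placeholders in left-to-right order.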

Pruning “Bad” Physical Plans — prunePlans Method

prunePlans returns the input physical plans unchanged.

Note
prunePlans is part of QueryPlanner Contract, where its purpose is to remove “bad” plans.

Creating Physical Operator (Possibly Under FilterExec and ProjectExec Operators) — pruneFilterProject Method

Note
pruneFilterProject is almost like DataSourceStrategy.pruneFilterProjectRaw.

pruneFilterProject branches on whether both of the following hold: column pruning alone is enough to produce the requested projection, and the columns of the input projectList are enough to evaluate all the input filterPredicates filter conditions.

If so, pruneFilterProject does the following:

  1. Applies the input scanBuilder function to the input projectList columns that creates a new physical operator

  2. If the input prunePushedDownFilters function leaves any Catalyst predicate expressions that cannot be pushed down, pruneFilterProject creates a FilterExec unary physical operator (with the unhandled predicate expressions)

  3. Otherwise, pruneFilterProject simply returns the physical operator

Note
In this case no extra ProjectExec unary physical operator is created.

If not (i.e. column pruning alone is not enough, or the projected columns are not enough to evaluate the filter conditions), pruneFilterProject does the following:

  1. Applies the input scanBuilder function to the projection and filtering columns that creates a new physical operator

  2. Creates a FilterExec unary physical operator (with the unhandled predicate expressions if available)

  3. Creates a ProjectExec unary physical operator with the optional FilterExec operator (with the scan physical operator) or simply the scan physical operator alone
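
The two branches above can be sketched in plain Scala. This is a simplified model under stated assumptions, not Spark's implementation: columns are plain names, Proj, Col, Computed, Pred, Scan, Filter and Project are hypothetical stand-ins, and "column pruning only" is modelled as "every projection is a plain column reference". The scan columns in the second branch are sorted only to keep the result deterministic.

```scala
// Simplified stand-ins; not Spark's classes. A column is just a name here.
sealed trait Proj { def name: String; def references: Set[String] }
// A plain column reference.
final case class Col(name: String) extends Proj {
  def references: Set[String] = Set(name)
}
// A computed output column, e.g. `a + b AS total`.
final case class Computed(name: String, references: Set[String]) extends Proj
// A filter condition together with the columns it reads.
final case class Pred(sql: String, references: Set[String])

sealed trait Exec
final case class Scan(columns: Seq[String]) extends Exec
final case class Filter(conds: Seq[Pred], child: Exec) extends Exec
final case class Project(projectList: Seq[Proj], child: Exec) extends Exec

object PruneModel {
  def pruneFilterProject(
      projectList: Seq[Proj],
      filterPredicates: Seq[Pred],
      prunePushedDownFilters: Seq[Pred] => Seq[Pred],
      scanBuilder: Seq[String] => Exec): Exec = {
    val projectSet = projectList.flatMap(_.references).toSet
    val filterSet = filterPredicates.flatMap(_.references).toSet
    // Predicates that still have to be evaluated on top of the scan.
    val unhandled = prunePushedDownFilters(filterPredicates)
    def withFilter(scan: Exec): Exec =
      if (unhandled.isEmpty) scan else Filter(unhandled, scan)

    // Assumption of this model: pruning suffices when nothing is computed.
    val pruningOnly = projectList.forall(_.isInstanceOf[Col])

    if (pruningOnly && filterSet.subsetOf(projectSet)) {
      // First branch: scan exactly the projected columns, no Project on top.
      withFilter(scanBuilder(projectList.map(_.name)))
    } else {
      // Second branch: scan projection and filter columns, then project.
      val scan = scanBuilder((projectSet ++ filterSet).toSeq.sorted)
      Project(projectList, withFilter(scan))
    }
  }
}
```

In this model, projecting Col("a") with a predicate on column c falls into the second branch, because c has to be scanned for the filter and then projected away.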

Note

pruneFilterProject is used when:
