关注 spark技术分享,
撸spark源码 玩spark最佳实践

CostBasedJoinReorder

CostBasedJoinReorder Logical Optimization — Join Reordering in Cost-Based Optimization

ReorderJoin is part of the Join Reorder once-executed batch in the standard batches of the Catalyst Optimizer.

ReorderJoin is simply a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan].

CostBasedJoinReorder applies the join optimizations on a logical plan with 2 or more consecutive inner or cross joins (possibly separated by Project operators) when spark.sql.cbo.enabled and spark.sql.cbo.joinReorder.enabled configuration properties are both enabled.

CostBasedJoinReorder uses row count statistic that is computed using ANALYZE TABLE COMPUTE STATISTICS SQL command with no NOSCAN option.

Caution

FIXME Examples of other join queries

  • Cross join with join condition

  • Project with attributes only and Inner join with join condition

  • Project with attributes only and Cross join with join condition

Tip

Enable DEBUG logging level for org.apache.spark.sql.catalyst.optimizer.JoinReorderDP logger to see the join reordering duration.

Add the following line to conf/log4j.properties:

Refer to Logging.

Executing Rule — apply Method

Note
apply is part of the Rule Contract to execute (apply) a rule on a TreeNode (e.g. LogicalPlan).

apply traverses the input logical plan down and tries to reorder the following logical operators:

  • Join for CROSS or INNER joins with a join condition

  • Project with the above Join child operator and the project list of Attribute leaf expressions only

Reordering Logical Plan with Join Operators — reorder Internal Method

reorder…​FIXME

Note
reorder is used exclusively when CostBasedJoinReorder is applied to a logical plan.

replaceWithOrderedJoin Internal Method

replaceWithOrderedJoin…​FIXME

Note
replaceWithOrderedJoin is used recursively and when CostBasedJoinReorder is reordering…​FIXME

Extracting Consecutive Join Operators — extractInnerJoins Internal Method

extractInnerJoins finds consecutive Join logical operators (inner or cross) with join conditions or Project logical operators with Join logical operator and the project list of Attribute leaf expressions only.

For Project operators extractInnerJoins calls itself recursively with the Join operator inside.

In the end, extractInnerJoins gives the collection of logical plans under the consecutive Join logical operators (possibly separated by Project operators only) and their join conditions (for which And expressions have been split).

Note
extractInnerJoins is used recursively when CostBasedJoinReorder is reordering a logical plan.
赞(0) 打赏
未经允许不得转载:spark技术分享 » CostBasedJoinReorder
分享到: 更多 (0)

关注公众号:spark技术分享

联系我们联系我们

觉得文章有用就打赏一下文章作者

支付宝扫一扫打赏

微信扫一扫打赏