PushDownOperatorsToDataSource Logical Optimization

PushDownOperatorsToDataSource is a logical optimization that pushes operators down to the underlying data sources (i.e. DataSourceV2Relations). It runs before planning so that a data source can report statistics more accurately.

Technically, PushDownOperatorsToDataSource is a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan].

PushDownOperatorsToDataSource is part of the "Push down operators to data source scan" rule batch of the SparkOptimizer. The batch is executed once.

Executing Rule — apply Method

Note
apply is part of the Rule Contract to execute (apply) a rule on a TreeNode (e.g. LogicalPlan).

apply…​FIXME

pushDownRequiredColumns Internal Method

pushDownRequiredColumns branches on the type of the input logical operator (which is expected to have at least one child node):

  1. For a Project unary logical operator, pushDownRequiredColumns takes the references of the project expressions as the required columns (attributes) and executes itself recursively on the child logical operator.

    Note that the input requiredByParent attributes are not included in the required columns.

  2. For a Filter unary logical operator, pushDownRequiredColumns adds the references of the filter condition to the input requiredByParent attributes and executes itself recursively on the child logical operator.

  3. For a DataSourceV2Relation leaf logical operator, pushDownRequiredColumns…FIXME

  4. For any other logical operator, pushDownRequiredColumns simply executes itself recursively on the child nodes (logical operators) using TreeNode.mapChildren.

Note
pushDownRequiredColumns is used exclusively when PushDownOperatorsToDataSource logical optimization is requested to execute.
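
The branching above can be traced with a small, self-contained sketch. It does not use Spark's classes: Plan, Relation, Project, Filter, Union and the trace function are hypothetical stand-ins invented for illustration, and columns are modelled as plain strings. Unlike the real method, which returns a logical plan, trace only reports which required columns reach each relation, since what happens at DataSourceV2Relation is not described here.

```scala
// Toy model of a logical plan tree (NOT Spark's Catalyst classes).
object RequiredColumnsSketch {
  sealed trait Plan { def children: Seq[Plan] }

  // Leaf operator standing in for DataSourceV2Relation.
  final case class Relation(name: String) extends Plan {
    def children: Seq[Plan] = Nil
  }
  // `references` stands in for the columns referenced by the project expressions.
  final case class Project(references: Set[String], child: Plan) extends Plan {
    def children: Seq[Plan] = Seq(child)
  }
  // `references` stands in for the columns referenced by the filter condition.
  final case class Filter(references: Set[String], child: Plan) extends Plan {
    def children: Seq[Plan] = Seq(child)
  }
  // An arbitrary "other" operator with two children.
  final case class Union(left: Plan, right: Plan) extends Plan {
    def children: Seq[Plan] = Seq(left, right)
  }

  // Returns, for every relation, the set of columns required of it.
  def trace(plan: Plan, requiredByParent: Set[String]): Seq[(String, Set[String])] =
    plan match {
      // Project: only its own references count; requiredByParent is dropped.
      case Project(references, child) => trace(child, references)
      // Filter: the condition's references are added to requiredByParent.
      case Filter(references, child) => trace(child, requiredByParent ++ references)
      // Relation: record what arrived (the real behaviour is not covered here).
      case Relation(name) => Seq(name -> requiredByParent)
      // Any other operator: recurse into all children with the same columns.
      case other => other.children.flatMap(trace(_, requiredByParent))
    }

  def main(args: Array[String]): Unit = {
    val plan = Project(Set("a"), Filter(Set("b"), Relation("t")))
    // "z" never reaches the relation because Project discards requiredByParent.
    println(trace(plan, Set("z")))
  }
}
```

In this model, the relation "t" is asked for columns a and b only, which mirrors the note that a Project ignores the requiredByParent attributes.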

Destructuring Logical Operator — FilterAndProject.unapply Method

unapply is part of the FilterAndProject extractor object and destructures the input logical operator into a tuple with…FIXME

unapply works with (matches) the following logical operators:

  1. For a Filter with a DataSourceV2Relation leaf logical operator, unapply…​FIXME

  2. For a Filter with a Project over a DataSourceV2Relation leaf logical operator, unapply…​FIXME

  3. For any other logical operator, unapply returns None (i.e. does not match)

Note
unapply is used exclusively when PushDownOperatorsToDataSource logical optimization is requested to execute.
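
For readers unfamiliar with Scala extractor objects, the three matching cases can be sketched as follows. This is a toy model, not Spark's code: Plan, Relation, Project and Filter are hypothetical stand-ins, conditions and fields are plain strings, and the shape of the returned tuple (optional projected fields, filter condition, relation) is purely an illustrative assumption, because the actual tuple is not specified above.

```scala
// Toy model of an extractor object (NOT Spark's FilterAndProject).
object ExtractorSketch {
  sealed trait Plan
  final case class Relation(name: String) extends Plan
  final case class Project(fields: Seq[String], child: Plan) extends Plan
  final case class Filter(condition: String, child: Plan) extends Plan

  object FilterAndProject {
    // Assumed result shape: (projected fields if any, filter condition, relation).
    type Result = (Option[Seq[String]], String, Relation)

    def unapply(plan: Plan): Option[Result] = plan match {
      // Case 1: Filter directly over a relation.
      case Filter(condition, r: Relation) => Some((None, condition, r))
      // Case 2: Filter over a Project over a relation.
      case Filter(condition, Project(fields, r: Relation)) => Some((Some(fields), condition, r))
      // Case 3: anything else does not match.
      case _ => None
    }
  }

  // Using the extractor in a pattern match calls unapply behind the scenes.
  def describe(plan: Plan): String = plan match {
    case FilterAndProject(fields, condition, relation) =>
      s"filter [$condition] over ${relation.name}, projected: ${fields.isDefined}"
    case _ => "no match"
  }

  def main(args: Array[String]): Unit = {
    println(describe(Filter("a > 1", Relation("t"))))
    println(describe(Project(Seq("a"), Relation("t"))))
  }
}
```

Returning None from unapply is what makes the pattern fall through to the next case, which is why the "does not match" wording applies to the third branch.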