PushDownOperatorsToDataSource Logical Optimization
PushDownOperatorsToDataSource is a logical optimization that pushes down operators to underlying data sources (i.e. DataSourceV2Relations) (before planning so that data source can report statistics more accurately).
Technically, PushDownOperatorsToDataSource is a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan].
PushDownOperatorsToDataSource is part of the Push down operators to data source scan once-executed rule batch of the SparkOptimizer.
Executing Rule — apply Method
|
1 2 3 4 5 |
apply(plan: LogicalPlan): LogicalPlan |
|
Note
|
apply is part of the Rule Contract to execute (apply) a rule on a TreeNode (e.g. LogicalPlan).
|
apply…FIXME
pushDownRequiredColumns Internal Method
|
1 2 3 4 5 |
pushDownRequiredColumns(plan: LogicalPlan, requiredByParent: AttributeSet): LogicalPlan |
pushDownRequiredColumns branches off per the input logical operator (that is supposed to have at least one child node):
-
For Project unary logical operator,
pushDownRequiredColumnstakes the references of the project expressions as the required columns (attributes) and executes itself recursively on the child logical operatorNote that the input
requiredByParentattributes are not considered in the required columns. -
For Filter unary logical operator,
pushDownRequiredColumnsadds the references of the filter condition to the inputrequiredByParentattributes and executes itself recursively on the child logical operator -
For DataSourceV2Relation unary logical operator,
pushDownRequiredColumns…FIXME -
For other logical operators,
pushDownRequiredColumnssimply executes itself (using TreeNode.mapChildren) recursively on the child nodes (logical operators)
|
Note
|
pushDownRequiredColumns is used exclusively when PushDownOperatorsToDataSource logical optimization is requested to execute.
|
Destructuring Logical Operator — FilterAndProject.unapply Method
|
1 2 3 4 5 |
unapply(plan: LogicalPlan): Option[(Seq[NamedExpression], Expression, DataSourceV2Relation)] |
unapply is part of FilterAndProject extractor object to destructure the input logical operator into a tuple with…FIXME
unapply works with (matches) the following logical operators:
-
For a Filter with a DataSourceV2Relation leaf logical operator,
unapply…FIXME -
For a Filter with a Project over a DataSourceV2Relation leaf logical operator,
unapply…FIXME -
For others,
unapplyreturnsNone(i.e. does nothing / does not match)
|
Note
|
unapply is used exclusively when PushDownOperatorsToDataSource logical optimization is requested to execute.
|
spark技术分享