RowDataSourceScanExec Leaf Physical Operator

RowDataSourceScanExec is a DataSourceScanExec (and so indirectly a leaf physical operator) for scanning data from a BaseRelation.

RowDataSourceScanExec is created when the DataSourceStrategy execution planning strategy plans a LogicalRelation whose relation is one of the following scan types:

  • CatalystScan, PrunedFilteredScan, PrunedScan (indirectly when DataSourceStrategy is requested to pruneFilterProjectRaw)

  • TableScan

RowDataSourceScanExec marks the filters that are included in handledFilters with * (star) in the metadata that is used for its simple text representation.

Generating Java Source Code for Produce Path in Whole-Stage Code Generation — doProduce Method

Note
doProduce is part of the CodegenSupport Contract to generate the Java source code for the produce path in Whole-Stage Code Generation.

doProduce…​FIXME

Creating RowDataSourceScanExec Instance

RowDataSourceScanExec takes the following when created:

Note
The input filter predicates and the handled filter predicates are used exclusively for the metadata property, which is part of the DataSourceScanExec Contract to describe a scan for a simple text representation (in a query plan tree).

metadata Property

Note
metadata is part of the DataSourceScanExec Contract to describe a scan for a simple text representation (in a query plan tree).

metadata marks the filter predicates that are included in the handled filter predicates with * (star).

Note
A * (star) in front of a filter predicate denotes a filter that is pushed down to a relation (aka data source).
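The star-marking rule can be illustrated with a minimal, Spark-free sketch. FilterPredicate, markFilters and MarkFiltersSketch are hypothetical names introduced only for this illustration (they are not Spark classes), and the sketch assumes that a filter counts as handled simply when it is a member of the handled set.

```scala
object MarkFiltersSketch {
  // Hypothetical stand-in for a filter predicate; only its text matters here.
  final case class FilterPredicate(description: String) {
    override def toString: String = description
  }

  // Prefix every filter that is also in handledFilters with "*";
  // all other filters keep their plain text.
  def markFilters(
      filters: Seq[FilterPredicate],
      handledFilters: Set[FilterPredicate]): Seq[String] =
    filters.map { f =>
      if (handledFilters.contains(f)) s"*$f" else s"$f"
    }

  def main(args: Array[String]): Unit = {
    val isNotNull = FilterPredicate("IsNotNull(id)")
    val greaterThan = FilterPredicate("GreaterThan(id,5)")
    // Only greaterThan is treated as handled in this example.
    val marked = markFilters(Seq(isNotNull, greaterThan), Set(greaterThan))
    println(marked.mkString(", ")) // prints: IsNotNull(id), *GreaterThan(id,5)
  }
}
```

The order of the input filters is preserved, so the marked and unmarked predicates appear side by side in the position they were given.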

In the end, metadata creates the following mapping:

  1. ReadSchema with the output converted to catalog representation

  2. PushedFilters with the marked and unmarked filter predicates
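The two-entry mapping above can be modelled without Spark on the classpath. In the following sketch, Column, catalogString, scanMetadata and ScanMetadataSketch are hypothetical names made up for illustration. The struct<name:type,...> rendering of the schema and the bracketed, comma-separated rendering of the filters are assumptions about the text format, not something the description above states.

```scala
object ScanMetadataSketch {
  // Hypothetical stand-in for an output attribute: a column name and a type name.
  final case class Column(name: String, typeName: String)

  // Assumed catalog-style rendering of a schema, e.g. struct<id:int,name:string>.
  def catalogString(columns: Seq[Column]): String =
    columns.map(c => s"${c.name}:${c.typeName}").mkString("struct<", ",", ">")

  // Build the two metadata entries from the output columns and the
  // filter predicates that were already rendered (with or without "*").
  def scanMetadata(
      output: Seq[Column],
      markedFilters: Seq[String]): Map[String, String] =
    Map(
      "ReadSchema" -> catalogString(output),
      "PushedFilters" -> markedFilters.mkString("[", ", ", "]"))

  def main(args: Array[String]): Unit = {
    val metadata = scanMetadata(
      output = Seq(Column("id", "int"), Column("name", "string")),
      markedFilters = Seq("IsNotNull(id)", "*GreaterThan(id,5)"))
    println(s"ReadSchema: ${metadata("ReadSchema")}")
    println(s"PushedFilters: ${metadata("PushedFilters")}")
  }
}
```

Keeping both values as plain strings matches the purpose of metadata: the entries are only printed in the query plan tree and are not used to execute the scan.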
