DataSourceScanExec Contract — Leaf Physical Operators to Scan Over BaseRelation

DataSourceScanExec is the contract of leaf physical operators that represent scans over BaseRelation.

Note
There are two DataSourceScanExec implementations: FileSourceScanExec, which scans data in a HadoopFsRelation, and RowDataSourceScanExec, which scans a generic BaseRelation.

DataSourceScanExec supports Java code generation (aka codegen).

Table 1. (Subset of) DataSourceScanExec Contract

| Property | Description |
|---|---|
| metadata | Metadata (as a collection of key-value pairs) that describes the scan when requested for the simple text representation. |
| relation | BaseRelation that is used in the node name and…FIXME |
| tableIdentifier | Optional TableIdentifier |

Note
The prefix for variable names for DataSourceScanExec operators in a generated Java source code is scan.

The default node name prefix is an empty string (that is used in the simple node description).

DataSourceScanExec builds the node name from the BaseRelation and the TableIdentifier (when defined).
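The node name format can be sketched as follows. This is a minimal sketch, assuming the node name is the word Scan followed by the relation and the unquoted table identifier (empty when undefined). That layout is recalled from Spark 2.x sources and is not stated in this text. NodeNameSketch, scanNodeName and the plain String stand-ins for BaseRelation and TableIdentifier are hypothetical.

```scala
// Stand-alone model; it does not depend on Spark.
object NodeNameSketch {
  // Assumed format: "Scan <relation> <tableIdentifier>".
  // Plain strings stand in for BaseRelation and TableIdentifier (hypothetical).
  def scanNodeName(relation: String, tableIdentifier: Option[String]): String =
    s"Scan $relation ${tableIdentifier.getOrElse("")}"
}
```

With these stand-ins, NodeNameSketch.scanNodeName("JDBCRelation(people)", Some("default.people")) gives Scan JDBCRelation(people) default.people, and an undefined identifier leaves only a trailing space after the relation.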

Table 2. DataSourceScanExecs

| DataSourceScanExec | Description |
|---|---|
| FileSourceScanExec | Scan over data in HadoopFsRelation |
| RowDataSourceScanExec | Scan over a generic BaseRelation |

Simple (Basic) Text Node Description (in Query Plan Tree) — simpleString Method

Note
simpleString is part of QueryPlan Contract to give the simple text description of a TreeNode in a query plan tree.

simpleString creates a text representation of every key-value entry in the metadata…​FIXME

Internally, simpleString sorts the metadata and concatenates every key with its value (separated by a colon and a space). While doing so, simpleString redacts sensitive information in every value and abbreviates the value to the first 100 characters.

simpleString uses truncatedString of Spark Core's Utils to build the text.

In the end, simpleString returns a text representation that is made up of the nodeNamePrefix, the nodeName, the output (schema attributes) and the metadata, in that order.
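The steps above can be modelled without Spark. The following is a rough sketch, not Spark's code: SimpleStringSketch and its helpers are hypothetical, redact is reduced to a regular-expression replacement, abbreviate is assumed to keep at most 100 characters including a trailing "...", and plain mkString stands in for Utils.truncatedString (which is assumed to also cap the number of elements shown).

```scala
import scala.util.matching.Regex

// Stand-alone model of the simpleString steps; all names are hypothetical.
object SimpleStringSketch {
  // Stand-in for redact: masks every match of the given pattern.
  def redact(text: String, sensitive: Regex): String =
    sensitive.replaceAllIn(text, "*********(redacted)")

  // Keeps at most `max` characters, ending with "..." when the value was cut.
  def abbreviate(value: String, max: Int): String =
    if (value.length <= max) value else value.take(max - 3) + "..."

  def simpleString(
      nodeNamePrefix: String,
      nodeName: String,
      output: Seq[String],
      metadata: Map[String, String],
      sensitive: Regex): String = {
    // Sort the entries, then redact and abbreviate every value.
    val entries = metadata.toSeq.sorted.map { case (key, value) =>
      s"$key: ${abbreviate(redact(value, sensitive), 100)}"
    }
    // mkString replaces Utils.truncatedString in this sketch (assumption).
    val metadataStr = entries.mkString(" ", ", ", "")
    s"$nodeNamePrefix$nodeName${output.mkString("[", ",", "]")}$metadataStr"
  }
}
```

Calling SimpleStringSketch.simpleString("", "Scan demo", Seq("id#0", "name#1"), Map("Format" -> "Parquet", "Batched" -> "true"), "secret".r) returns Scan demo[id#0,name#1] Batched: true, Format: Parquet, which shows the sorted metadata entries following the output attributes.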

verboseString Method

Note
verboseString is part of QueryPlan Contract to…​FIXME.

verboseString simply takes the verboseString of the parent QueryPlan and returns it with sensitive information redacted.

Text Representation of All Nodes in Tree — treeString Method

Note
treeString is part of TreeNode Contract to…​FIXME.

treeString simply takes the text representation of all nodes in the query plan tree (from the parent TreeNode) and returns it with sensitive information redacted.
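Both verboseString and treeString follow the same shape: ask the parent for its text and pass the result through redact. A minimal sketch of that override pattern, assuming a regular-expression based redaction. ParentNode, RedactingScan, the tree prefix and the sensitive pattern are all hypothetical and only illustrate the delegation, not Spark's classes.

```scala
import scala.util.matching.Regex

// Hypothetical parent that produces the un-redacted text representations.
class ParentNode(description: String) {
  def verboseString: String = description
  def treeString: String = s"*- $description\n"
}

// Hypothetical scan operator: same text as the parent, passed through redact.
class RedactingScan(description: String, sensitive: Regex)
    extends ParentNode(description) {

  // Assumed behaviour: every match of the pattern is masked.
  private def redact(text: String): String =
    sensitive.replaceAllIn(text, "*********(redacted)")

  override def verboseString: String = redact(super.verboseString)
  override def treeString: String = redact(super.treeString)
}
```

Keeping redaction in one private helper means every text representation applies the same rule, so a value masked in one representation cannot leak through another.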

Redacting Sensitive Information — redact Internal Method

redact…​FIXME

Note
redact is used when DataSourceScanExec is requested for the simple, verbose and tree text representations.