DataSourceV2ScanExec Leaf Physical Operator
DataSourceV2ScanExec is a leaf physical operator to represent DataSourceV2Relation logical operators at execution time.
Note: A DataSourceV2Relation logical operator is created when…FIXME
DataSourceV2ScanExec is a ColumnarBatchScan that supports vectorized batch decoding (when created for a DataSourceReader that supports it, i.e. the DataSourceReader is a SupportsScanColumnarBatch with the enableBatchRead flag enabled).
DataSourceV2ScanExec is also a DataSourceReaderHolder.
DataSourceV2ScanExec is created exclusively when the DataSourceV2Strategy execution planning strategy is executed and finds a DataSourceV2Relation logical operator in a logical query plan.
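For a quick, hedged illustration, the operator shows up in the physical plan of a DataSourceV2-based read. The data source class name below is a placeholder and has to be replaced with an actual DataSourceV2 implementation available on the classpath.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("DataSourceV2ScanExec demo")
  .getOrCreate()

// "com.example.MyDataSourceV2" is a placeholder; use a real DataSourceV2
// implementation available on the classpath.
val q = spark.read
  .format("com.example.MyDataSourceV2")
  .load()

// The logical plan contains a DataSourceV2Relation leaf that
// DataSourceV2Strategy plans as a DataSourceV2ScanExec physical operator.
q.explain()
```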
DataSourceV2ScanExec gives the single inputRDD as the only input RDD of internal rows (when the WholeStageCodegenExec physical operator is executed).
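A hedged, member-level sketch of that behavior (mirroring the description above rather than the verbatim Spark source):

```scala
// Hedged sketch: expose the single inputRDD as the only input RDD of
// internal rows consumed by whole-stage code generation.
override def inputRDDs(): Seq[RDD[InternalRow]] = {
  Seq(inputRDD)
}
```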
DataSourceV2ScanExec uses the following internal registry:

| Name | Description |
|---|---|
| readerFactories | Collection of DataReaderFactory objects of UnsafeRows. Used when…FIXME |
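A hedged sketch of how such a collection could be obtained from a DataSourceReader that scans UnsafeRows; the SupportsScanUnsafeRow API name and its factory method are assumptions based on the Spark 2.3 DataSourceV2 reader interfaces, not verbatim Spark source.

```scala
// Hedged sketch (Spark 2.3 API names assumed, not verbatim Spark source):
// collect the DataReaderFactory[UnsafeRow] objects from a DataSourceReader
// that supports scanning UnsafeRows.
import scala.collection.JavaConverters._
import org.apache.spark.sql.catalyst.expressions.UnsafeRow
import org.apache.spark.sql.sources.v2.reader.{DataReaderFactory, SupportsScanUnsafeRow}

def unsafeRowReaderFactories(reader: SupportsScanUnsafeRow): Seq[DataReaderFactory[UnsafeRow]] =
  reader.createUnsafeRowReaderFactories().asScala
```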
Executing Physical Operator (Generating RDD[InternalRow]) — doExecute Method
```scala
doExecute(): RDD[InternalRow]
```
Note: doExecute is part of SparkPlan Contract to generate the runtime representation of a structured query as a distributed computation over internal binary rows on Apache Spark (i.e. RDD[InternalRow]).
doExecute…FIXME
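Although the details are still a FIXME, a hedged sketch of the control flow, based on the supportsBatch and inputRDD properties described on this page (and not the verbatim Spark source), could look as follows:

```scala
// Hedged sketch of doExecute's control flow (not the verbatim Spark source).
protected override def doExecute(): RDD[InternalRow] = {
  if (supportsBatch) {
    // Vectorized path: whole-stage code generation decodes ColumnarBatches
    // into internal rows.
    WholeStageCodegenExec(this).execute()
  } else {
    // Row-based path: hand out the (lazily created) input RDD of internal rows.
    inputRDD
  }
}
```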
supportsBatch Property
```scala
supportsBatch: Boolean
```
Note: supportsBatch is part of ColumnarBatchScan Contract to control whether the physical operator supports vectorized decoding or not.
supportsBatch is enabled (i.e. true) only when the DataSourceReader is a SupportsScanColumnarBatch with the enableBatchRead flag enabled.
Note: enableBatchRead flag is enabled by default.
supportsBatch is disabled (i.e. false) otherwise.
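A minimal sketch of that check, assuming the Spark 2.3 SupportsScanColumnarBatch API:

```scala
// Minimal sketch, assuming the Spark 2.3 SupportsScanColumnarBatch API:
// supportsBatch holds only for a DataSourceReader that is a
// SupportsScanColumnarBatch with enableBatchRead enabled (true by default).
import org.apache.spark.sql.sources.v2.reader.{DataSourceReader, SupportsScanColumnarBatch}

def supportsBatch(reader: DataSourceReader): Boolean = reader match {
  case r: SupportsScanColumnarBatch => r.enableBatchRead()
  case _                            => false
}
```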
Creating DataSourceV2ScanExec Instance
DataSourceV2ScanExec takes the following when created:

* Output schema (as a collection of AttributeReference expressions)
* DataSourceReader
DataSourceV2ScanExec initializes the internal registries and counters.
Creating Input RDD of Internal Rows — inputRDD Internal Property
```scala
inputRDD: RDD[InternalRow]
```
Note: inputRDD is a Scala lazy value which is computed once when accessed and cached afterwards.
inputRDD…FIXME
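For illustration, the following standalone snippet (with names made up for the example) shows the lazy val semantics the note above refers to:

```scala
// Standalone illustration of Scala lazy val semantics: the body runs once on
// first access and the result is cached afterwards, which is how inputRDD is
// evaluated. Names here are made up for the example.
object LazyValDemo extends App {
  lazy val inputData: Seq[Int] = {
    println("computing inputData...") // printed only on the first access
    Seq(1, 2, 3)
  }

  println(inputData.sum) // triggers the computation, then prints 6
  println(inputData.sum) // reuses the cached value, prints 6 only
}
```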
Note: inputRDD is used when DataSourceV2ScanExec physical operator is requested for the input RDDs and to execute.