DataSourceV2ScanExec Leaf Physical Operator
DataSourceV2ScanExec is a leaf physical operator to represent DataSourceV2Relation logical operators at execution time.
Note
A DataSourceV2Relation logical operator is created when…FIXME
DataSourceV2ScanExec is a ColumnarBatchScan that supports vectorized batch decoding (when created for a DataSourceReader that supports it, i.e. the DataSourceReader is a SupportsScanColumnarBatch with the enableBatchRead flag enabled).
DataSourceV2ScanExec is also a DataSourceReaderHolder.
DataSourceV2ScanExec is created exclusively when the DataSourceV2Strategy execution planning strategy is executed and finds a DataSourceV2Relation logical operator in a logical query plan.
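The planning step described above can be pictured with a small, Spark-free model. This is only a sketch under stated assumptions: V2RelationStub, OtherPlanStub, V2ScanExecStub and DataSourceV2StrategySketch are hypothetical stand-ins invented for illustration, not the real Spark classes, and the real strategy may handle more cases than shown here.

```scala
// Hypothetical stand-ins for logical operators (NOT the real Spark classes).
sealed trait LogicalPlanStub
final case class V2RelationStub(output: Seq[String], reader: String) extends LogicalPlanStub
final case class OtherPlanStub(name: String) extends LogicalPlanStub

// Hypothetical stand-in for the physical scan operator.
sealed trait PhysicalPlanStub
final case class V2ScanExecStub(output: Seq[String], reader: String) extends PhysicalPlanStub

object DataSourceV2StrategySketch {
  // A planning strategy yields candidate physical plans,
  // or an empty collection when it does not apply to the logical operator.
  def apply(plan: LogicalPlanStub): Seq[PhysicalPlanStub] = plan match {
    case V2RelationStub(output, reader) => Seq(V2ScanExecStub(output, reader))
    case _                              => Nil
  }
}
```

In this model, only the relation stub is turned into a scan operator; every other logical operator is left for other strategies to plan.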
DataSourceV2ScanExec gives the single input RDD as the only input RDD of internal rows (when the WholeStageCodegenExec physical operator is executed).
Name | Description |
---|---|
Collection of DataReaderFactory objects of UnsafeRows. Used when…FIXME |
Executing Physical Operator (Generating RDD[InternalRow]): doExecute Method
doExecute(): RDD[InternalRow]
Note
doExecute is part of the SparkPlan Contract to generate the runtime representation of a structured query as a distributed computation over internal binary rows on Apache Spark (i.e. RDD[InternalRow]).
doExecute …FIXME
supportsBatch Property
supportsBatch: Boolean
Note
supportsBatch is part of the ColumnarBatchScan Contract to control whether the physical operator supports vectorized decoding or not.
supportsBatch is enabled (i.e. true) only when the DataSourceReader is a SupportsScanColumnarBatch with the enableBatchRead flag enabled.
Note
The enableBatchRead flag is enabled by default.
supportsBatch is disabled (i.e. false) otherwise.
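The rule above (enabled only for a batch-capable reader with the flag on, disabled otherwise) can be sketched as a type-and-flag pattern match. This is a minimal illustration, not Spark's source: ReaderStub, ColumnarBatchReaderStub and SupportsBatchSketch are hypothetical names that stand in for DataSourceReader and SupportsScanColumnarBatch so that the example runs without Spark on the classpath.

```scala
// Hypothetical stand-in for a plain reader (NOT the real DataSourceReader).
trait ReaderStub

// Hypothetical stand-in for a reader that can scan columnar batches.
trait ColumnarBatchReaderStub extends ReaderStub {
  // Modelled as enabled by default, as described for the enableBatchRead flag.
  def enableBatchRead: Boolean = true
}

object SupportsBatchSketch {
  // true only for a batch-capable reader whose flag is on; false in every other case.
  def supportsBatch(reader: ReaderStub): Boolean = reader match {
    case r: ColumnarBatchReaderStub if r.enableBatchRead => true
    case _                                               => false
  }
}
```

A reader that mixes in the batch-capable stub but overrides the flag to false is treated the same as a reader that is not batch-capable at all.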
Creating DataSourceV2ScanExec Instance
DataSourceV2ScanExec takes the following when created:
DataSourceV2ScanExec initializes the internal registries and counters.
Creating Input RDD of Internal Rows: inputRDD Internal Property
inputRDD: RDD[InternalRow]
Note
inputRDD is a Scala lazy value, which is computed once when first accessed and cached afterwards.
inputRDD …FIXME
Note
inputRDD is used when the DataSourceV2ScanExec physical operator is requested for the input RDDs and to execute.
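The compute-once behaviour of a Scala lazy value, and the way two callers can share it, can be seen in a small Spark-free sketch. ScanSketch, its computations counter, and the Seq[Int] that stands in for RDD[InternalRow] are all assumptions made for illustration; the real bodies of inputRDD and doExecute are not shown in this document and are not reproduced here.

```scala
// Hypothetical stand-in for the operator; Seq[Int] plays the role of RDD[InternalRow].
class ScanSketch {
  // Counts how many times the lazy value's initializer actually runs.
  var computations: Int = 0

  // Evaluated on first access only; later accesses reuse the cached result.
  lazy val inputRDD: Seq[Int] = {
    computations += 1
    Seq(1, 2, 3)
  }

  // Both callers below go through the same lazy value.
  def inputRDDs(): Seq[Seq[Int]] = Seq(inputRDD)
  def execute(): Seq[Int] = inputRDD
}
```

Creating a ScanSketch does not run the initializer. The first call to either inputRDDs() or execute() runs it once, and every later call reuses the cached value.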