## ExternalRDDScanExec Leaf Physical Operator

ExternalRDDScanExec is a leaf physical operator for…FIXME
## ExecutedCommandExec Leaf Physical Operator

ExecutedCommandExec is a leaf physical operator for executing logical commands with side effects.

ExecutedCommandExec runs a command and caches the result in the sideEffectResult internal attribute.
ExecutedCommandExec's methods:

| Method | Description |
|---|---|
| doExecute, executeCollect, executeTake, executeToIterator | Executes the logical command (with the result cached as sideEffectResult) |
### sideEffectResult Internal Lazy Attribute

```scala
sideEffectResult: Seq[InternalRow]
```
sideEffectResult requests the RunnableCommand to run (which produces a Seq[Row]) and converts the result to Catalyst types using a Catalyst converter function for the schema.
Note: sideEffectResult is used when ExecutedCommandExec is requested for executeCollect, executeToIterator, executeTake and doExecute.
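The run-and-convert step can be sketched as follows (a minimal sketch, assuming the RunnableCommand is held in a cmd field and the operator's schema is available; the actual implementation lives in ExecutedCommandExec):

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.CatalystTypeConverters

lazy val sideEffectResult: Seq[InternalRow] = {
  // Build a converter from external Rows to Catalyst values for this schema
  val converter = CatalystTypeConverters.createToCatalystConverter(schema)
  // Run the command exactly once and convert every Row it produces
  cmd.run(sqlContext.sparkSession).map(converter(_).asInstanceOf[InternalRow])
}
```

Being a lazy value, the command runs at most once, no matter how many of the accessor methods below are called.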
## DebugExec Unary Physical Operator

DebugExec is a unary physical operator that…FIXME
## DataWritingCommandExec Physical Operator

DataWritingCommandExec is a physical operator that is the execution environment for a DataWritingCommand logical command at execution time.

DataWritingCommandExec is created exclusively when the BasicOperators execution planning strategy is requested to plan a DataWritingCommand logical command.

When requested for performance metrics, DataWritingCommandExec simply requests the DataWritingCommand for them.
DataWritingCommandExec's internal properties:

| Name | Description |
|---|---|
| sideEffectResult | Collection of InternalRows (used when DataWritingCommandExec is requested to executeCollect, executeToIterator, executeTake and doExecute) |
### executeCollect Method

```scala
executeCollect(): Array[InternalRow]
```
Note: executeCollect is part of the SparkPlan Contract to execute the physical operator and collect results.
executeCollect…FIXME
### executeToIterator Method

```scala
executeToIterator: Iterator[InternalRow]
```
Note: executeToIterator is part of the SparkPlan Contract to…FIXME
executeToIterator…FIXME
### executeTake Method

```scala
executeTake(limit: Int): Array[InternalRow]
```
Note: executeTake is part of the SparkPlan Contract to take the first n UnsafeRows.
executeTake…FIXME
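A minimal sketch of the three accessors, assuming each simply reads the cached sideEffectResult (as the Notes above suggest):

```scala
// The command has already run; all three just read the cached rows
override def executeCollect(): Array[InternalRow] = sideEffectResult.toArray

override def executeToIterator: Iterator[InternalRow] = sideEffectResult.toIterator

override def executeTake(limit: Int): Array[InternalRow] =
  sideEffectResult.take(limit).toArray
```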
### doExecute Method

```scala
doExecute(): RDD[InternalRow]
```
Note: doExecute is part of the SparkPlan Contract to generate the runtime representation of a structured query as a distributed computation over internal binary rows on Apache Spark (i.e. RDD[InternalRow]).
doExecute simply requests the SQLContext for the SparkContext, which is then requested to distribute (parallelize) the sideEffectResult (over 1 partition).
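In other words (a minimal sketch of the behaviour just described):

```scala
protected override def doExecute(): RDD[InternalRow] = {
  // The command has already run and cached its rows in sideEffectResult,
  // so the "distributed computation" is just those rows in one partition.
  sqlContext.sparkContext.parallelize(sideEffectResult, 1)
}
```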
## DataSourceV2ScanExec Leaf Physical Operator

DataSourceV2ScanExec is a leaf physical operator that represents a DataSourceV2Relation logical operator at execution time.
Note: A DataSourceV2Relation logical operator is created when…FIXME
DataSourceV2ScanExec is a ColumnarBatchScan that supports vectorized batch decoding (when created for a DataSourceReader that supports it, i.e. the DataSourceReader is a SupportsScanColumnarBatch with the enableBatchRead flag enabled).

DataSourceV2ScanExec is also a DataSourceReaderHolder.
DataSourceV2ScanExec is created exclusively when the DataSourceV2Strategy execution planning strategy is executed and finds a DataSourceV2Relation logical operator in a logical query plan.
DataSourceV2ScanExec gives the single inputRDD as the only input RDD of internal rows (when the WholeStageCodegenExec physical operator is executed).
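As part of the CodegenSupport contract, that could be expressed as follows (a minimal sketch, assuming the inputRDD internal property described below):

```scala
override def inputRDDs(): Seq[RDD[InternalRow]] = {
  // Whole-stage code generation consumes exactly one input RDD here
  Seq(inputRDD)
}
```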
DataSourceV2ScanExec's internal properties:

| Name | Description |
|---|---|
| readerFactories | Collection of DataReaderFactory objects of UnsafeRows. Used when…FIXME |
### doExecute Method

```scala
doExecute(): RDD[InternalRow]
```
Note: doExecute is part of the SparkPlan Contract to generate the runtime representation of a structured query as a distributed computation over internal binary rows on Apache Spark (i.e. RDD[InternalRow]).
doExecute…FIXME
### supportsBatch Property

```scala
supportsBatch: Boolean
```
Note: supportsBatch is part of the ColumnarBatchScan Contract to control whether the physical operator supports vectorized decoding or not.
supportsBatch is enabled (i.e. true) only when the DataSourceReader is a SupportsScanColumnarBatch with the enableBatchRead flag enabled.

Note: The enableBatchRead flag is enabled by default.

supportsBatch is disabled (i.e. false) otherwise.
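In code, the check could look as follows (a minimal sketch, assuming the Data Source API V2 types and a reader field holding the DataSourceReader):

```scala
override val supportsBatch: Boolean = reader match {
  // Vectorized decoding only for readers that opt in to columnar batches
  case r: SupportsScanColumnarBatch if r.enableBatchRead() => true
  case _ => false
}
```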
DataSourceV2ScanExec takes the following when created:

* Output schema attributes
* DataSourceReader

DataSourceV2ScanExec initializes the internal registries and counters.
### inputRDD Internal Property

```scala
inputRDD: RDD[InternalRow]
```
Note: inputRDD is a Scala lazy value which is computed once when accessed and cached afterwards.
inputRDD…FIXME
Note: inputRDD is used when the DataSourceV2ScanExec physical operator is requested for the input RDDs and to execute.
## CoalesceExec Unary Physical Operator

CoalesceExec is a unary physical operator (i.e. with one child physical operator) to…FIXME…with numPartitions number of partitions and a child spark plan.
CoalesceExec represents the Repartition logical operator at execution time (when shuffle is disabled; see the BasicOperators execution planning strategy). When executed, CoalesceExec executes the input child and calls coalesce on the result RDD (with shuffle disabled).
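A minimal sketch of that execution behaviour:

```scala
protected override def doExecute(): RDD[InternalRow] = {
  // Narrow dependency only: shrink the number of partitions without a shuffle
  child.execute().coalesce(numPartitions, shuffle = false)
}
```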
Please note that, since physical operators present themselves without the Exec suffix, CoalesceExec is the Coalesce in the Physical Plan section in the following example:
```text
scala> df.rdd.getNumPartitions
res6: Int = 8

scala> df.coalesce(1).rdd.getNumPartitions
res7: Int = 1

scala> df.coalesce(1).explain(extended = true)
== Parsed Logical Plan ==
Repartition 1, false
+- LocalRelation [value#1]

== Analyzed Logical Plan ==
value: int
Repartition 1, false
+- LocalRelation [value#1]

== Optimized Logical Plan ==
Repartition 1, false
+- LocalRelation [value#1]

== Physical Plan ==
Coalesce 1
+- LocalTableScan [value#1]
```
The output collection of Attributes matches the child's (since CoalesceExec is about changing the number of partitions, not the internal representation).
outputPartitioning returns a SinglePartition when the input numPartitions is 1, and an UnknownPartitioning partitioning scheme otherwise.
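A minimal sketch of that outputPartitioning logic:

```scala
override def outputPartitioning: Partitioning = {
  // A single partition is fully determined; anything else is unknown
  if (numPartitions == 1) SinglePartition
  else UnknownPartitioning(numPartitions)
}
```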
## BroadcastNestedLoopJoinExec Binary Physical Operator

BroadcastNestedLoopJoinExec is a binary physical operator (with two child left and right physical operators) that is created (and converted to) when the JoinSelection physical plan strategy finds a Join logical operator that meets any of the following cases:

* canBuildRight join type and the right physical operator broadcastable
* canBuildLeft join type and the left physical operator broadcastable
* non-InnerLike join type
Note: BroadcastNestedLoopJoinExec is the default physical operator when no other operators have matched the selection requirements.
Note: canBuildRight join types are CROSS, INNER, LEFT ANTI, LEFT OUTER, LEFT SEMI and Existence. canBuildLeft join types are CROSS, INNER and RIGHT OUTER.
```scala
val nums = spark.range(2)
val letters = ('a' to 'c').map(_.toString).toDF("letter")
val q = nums.crossJoin(letters)

scala> q.explain
== Physical Plan ==
BroadcastNestedLoopJoin BuildRight, Cross
:- *Range (0, 2, step=1, splits=Some(8))
+- BroadcastExchange IdentityBroadcastMode
   +- LocalTableScan [letter#69]
```
BroadcastNestedLoopJoinExec's performance metrics:

| Key | Name (in web UI) | Description |
|---|---|---|
| numOutputRows | number of output rows | Number of output rows |
BroadcastNestedLoopJoinExec's required child output distributions (per BuildSide):

| BuildSide | Left Child | Right Child |
|---|---|---|
| BuildLeft | BroadcastDistribution (uses IdentityBroadcastMode broadcast mode) | UnspecifiedDistribution |
| BuildRight | UnspecifiedDistribution | BroadcastDistribution (uses IdentityBroadcastMode broadcast mode) |
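The table above could be expressed in code as follows (a minimal sketch):

```scala
override def requiredChildDistribution: Seq[Distribution] = buildSide match {
  // Broadcast the build side as-is; leave the streamed side unconstrained
  case BuildLeft =>
    BroadcastDistribution(IdentityBroadcastMode) :: UnspecifiedDistribution :: Nil
  case BuildRight =>
    UnspecifiedDistribution :: BroadcastDistribution(IdentityBroadcastMode) :: Nil
}
```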
BroadcastNestedLoopJoinExec takes the following when created:

* Left physical operator
* Right physical operator
* Optional join condition expressions