ExternalRDDScanExec Leaf Physical Operator
ExternalRDDScanExec is a leaf physical operator for…FIXME
ExecutedCommandExec is a leaf physical operator for executing logical commands with side effects.
ExecutedCommandExec runs a command and caches the result in the sideEffectResult internal attribute.
sideEffectResult Internal Lazy Attribute

```scala
sideEffectResult: Seq[InternalRow]
```
sideEffectResult requests RunnableCommand to run (that produces a Seq[Row]) and converts the result to Catalyst types using a Catalyst converter function for the schema.
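The computation can be sketched as follows (a minimal sketch following the description above; cmd and sqlContext are assumed to be the operator's RunnableCommand and SQLContext):

```scala
// Sketch: run the command once and convert its Rows to Catalyst's InternalRows
// (assumes cmd is the RunnableCommand and sqlContext the operator's SQLContext)
protected[sql] lazy val sideEffectResult: Seq[InternalRow] = {
  // Converter from external Scala types to Catalyst types for this schema
  val converter = CatalystTypeConverters.createToCatalystConverter(schema)
  cmd.run(sqlContext.sparkSession).map(converter(_).asInstanceOf[InternalRow])
}
```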
Note: sideEffectResult is used when ExecutedCommandExec is requested for executeCollect, executeToIterator, executeTake and doExecute.
DebugExec is a unary physical operator that…FIXME
DataWritingCommandExec is a physical operator that acts as the execution environment for a DataWritingCommand logical command at execution time.
DataWritingCommandExec is created exclusively when BasicOperators execution planning strategy is requested to plan a DataWritingCommand logical command.
When requested for performance metrics, DataWritingCommandExec simply requests the DataWritingCommand for them.
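In code form, the delegation can be sketched as follows (assuming cmd is the DataWritingCommand the operator was created with):

```scala
// Sketch: performance metrics come straight from the logical command
override lazy val metrics: Map[String, SQLMetric] = cmd.metrics
```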
| Name | Description |
|---|---|
| sideEffectResult | Collection of InternalRows. Used when…FIXME |
executeCollect Method

```scala
executeCollect(): Array[InternalRow]
```

Note: executeCollect is part of the SparkPlan Contract to execute the physical operator and collect results.
executeCollect…FIXME
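Given that the result of the command is cached in sideEffectResult, executeCollect can plausibly be sketched as:

```scala
// Sketch (an assumption based on the cached sideEffectResult described above)
override def executeCollect(): Array[InternalRow] = sideEffectResult.toArray
```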
executeToIterator Method

```scala
executeToIterator: Iterator[InternalRow]
```

Note: executeToIterator is part of the SparkPlan Contract to…FIXME.
executeToIterator…FIXME
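A plausible sketch, again assuming the cached sideEffectResult:

```scala
// Sketch: expose the cached rows as an iterator
override def executeToIterator: Iterator[InternalRow] = sideEffectResult.toIterator
```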
executeTake Method

```scala
executeTake(limit: Int): Array[InternalRow]
```

Note: executeTake is part of the SparkPlan Contract to take the first n UnsafeRows.
executeTake…FIXME
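A plausible sketch, assuming the cached sideEffectResult:

```scala
// Sketch: take the first limit rows of the cached result
override def executeTake(limit: Int): Array[InternalRow] =
  sideEffectResult.take(limit).toArray
```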
doExecute Method

```scala
doExecute(): RDD[InternalRow]
```

Note: doExecute is part of the SparkPlan Contract to generate the runtime representation of a structured query as a distributed computation over internal binary rows on Apache Spark (i.e. RDD[InternalRow]).
doExecute simply requests the SQLContext for the SparkContext that is then requested to distribute (parallelize) the sideEffectResult (over 1 partition).
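In code form (a sketch mirroring the sentence above):

```scala
// Sketch: distribute the cached side-effect result over a single partition
protected override def doExecute(): RDD[InternalRow] =
  sqlContext.sparkContext.parallelize(sideEffectResult, 1)
```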
DataSourceV2ScanExec is a leaf physical operator to represent DataSourceV2Relation logical operators at execution time.
Note: A DataSourceV2Relation logical operator is created when…FIXME
DataSourceV2ScanExec is a ColumnarBatchScan that supports vectorized batch decoding (when created for a DataSourceReader that supports it, i.e. the DataSourceReader is a SupportsScanColumnarBatch with the enableBatchRead flag enabled).
DataSourceV2ScanExec is also a DataSourceReaderHolder.
DataSourceV2ScanExec is created exclusively when DataSourceV2Strategy execution planning strategy is executed and finds a DataSourceV2Relation logical operator in a logical query plan.
DataSourceV2ScanExec gives its single inputRDD as the only input RDD of internal rows (when the WholeStageCodegenExec physical operator is executed).
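A sketch of that behavior (assuming the inputRDD internal property described below):

```scala
// Sketch: the one and only input RDD for whole-stage code generation
override def inputRDDs(): Seq[RDD[InternalRow]] = Seq(inputRDD)
```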
| Name | Description |
|---|---|
| readerFactories | Collection of DataReaderFactory objects of UnsafeRows. Used when…FIXME |
doExecute Method

```scala
doExecute(): RDD[InternalRow]
```

Note: doExecute is part of the SparkPlan Contract to generate the runtime representation of a structured query as a distributed computation over internal binary rows on Apache Spark (i.e. RDD[InternalRow]).
doExecute…FIXME
supportsBatch Property

```scala
supportsBatch: Boolean
```

Note: supportsBatch is part of the ColumnarBatchScan Contract to control whether the physical operator supports vectorized decoding or not.

supportsBatch is enabled (i.e. true) only when the DataSourceReader is a SupportsScanColumnarBatch with the enableBatchRead flag enabled, and disabled (i.e. false) otherwise.

Note: The enableBatchRead flag is enabled by default.
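As a sketch (assuming reader is the DataSourceReader the operator was created for):

```scala
// Sketch: vectorized decoding only when the reader opts into columnar batches
override def supportsBatch: Boolean = reader match {
  case r: SupportsScanColumnarBatch if r.enableBatchRead() => true
  case _ => false
}
```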
DataSourceV2ScanExec takes the following when created:

* Output schema attributes (fullOutput)
* DataSourceReader
DataSourceV2ScanExec initializes the internal registries and counters.
inputRDD Internal Property

```scala
inputRDD: RDD[InternalRow]
```

Note: inputRDD is a Scala lazy value which is computed once when accessed and cached afterwards.
inputRDD…FIXME
Note: inputRDD is used when the DataSourceV2ScanExec physical operator is requested for the input RDDs and to execute.
CoalesceExec is a unary physical operator (i.e. with one child physical operator) to…FIXME. CoalesceExec takes a numPartitions number of partitions and a child spark plan when created.
CoalesceExec represents the Repartition logical operator at execution time (when shuffle is disabled; see BasicOperators execution planning strategy). When executed, it executes its child physical operator and calls coalesce on the result RDD (with shuffle disabled), as the sketch below shows.
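A sketch of that execution path (mirroring the prose above):

```scala
// Sketch: execute the child and shrink the number of partitions without a shuffle
protected override def doExecute(): RDD[InternalRow] =
  child.execute().coalesce(numPartitions, shuffle = false)
```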
Note that since physical operators present themselves without the Exec suffix, CoalesceExec appears as Coalesce in the Physical Plan section of the following example:
```
scala> df.rdd.getNumPartitions
res6: Int = 8

scala> df.coalesce(1).rdd.getNumPartitions
res7: Int = 1

scala> df.coalesce(1).explain(extended = true)
== Parsed Logical Plan ==
Repartition 1, false
+- LocalRelation [value#1]

== Analyzed Logical Plan ==
value: int
Repartition 1, false
+- LocalRelation [value#1]

== Optimized Logical Plan ==
Repartition 1, false
+- LocalRelation [value#1]

== Physical Plan ==
Coalesce 1
+- LocalTableScan [value#1]
```
The output collection of Attributes matches the child's (since CoalesceExec is about changing the number of partitions, not the internal representation).
outputPartitioning returns a SinglePartition when the input numPartitions is 1, and an UnknownPartitioning partitioning scheme otherwise.
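As a sketch:

```scala
// Sketch: one partition collapses to SinglePartition; otherwise the
// partitioning scheme is unknown
override def outputPartitioning: Partitioning =
  if (numPartitions == 1) SinglePartition
  else UnknownPartitioning(numPartitions)
```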
BroadcastNestedLoopJoinExec is a binary physical operator (with two child left and right physical operators) that is created (and converted to) when the JoinSelection physical plan strategy finds a Join logical operator that meets one of the following cases:
* canBuildRight join type and right physical operator broadcastable
* canBuildLeft join type and left broadcastable
* non-InnerLike join type
Note: BroadcastNestedLoopJoinExec is the default physical operator when no other operators have matched the selection requirements.
Note: canBuildRight join types are: CROSS, INNER, LEFT ANTI, LEFT OUTER, LEFT SEMI and Existence. canBuildLeft join types are: CROSS, INNER and RIGHT OUTER.
```
val nums = spark.range(2)
val letters = ('a' to 'c').map(_.toString).toDF("letter")
val q = nums.crossJoin(letters)

scala> q.explain
== Physical Plan ==
BroadcastNestedLoopJoin BuildRight, Cross
:- *Range (0, 2, step=1, splits=Some(8))
+- BroadcastExchange IdentityBroadcastMode
   +- LocalTableScan [letter#69]
```
| Key | Name (in web UI) | Description |
|---|---|---|
| numOutputRows | number of output rows | Number of output rows |
| BuildSide | Left Child | Right Child |
|---|---|---|
| BuildLeft | BroadcastDistribution (uses IdentityBroadcastMode broadcast mode) | UnspecifiedDistribution |
| BuildRight | UnspecifiedDistribution | BroadcastDistribution (uses IdentityBroadcastMode broadcast mode) |
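A sketch of the required child distributions per build side (an illustration of the table above; names follow Apache Spark's BroadcastNestedLoopJoinExec):

```scala
// Sketch: the build side must be broadcast as-is (IdentityBroadcastMode),
// while the streamed side has no distribution requirement
override def requiredChildDistribution: Seq[Distribution] = buildSide match {
  case BuildLeft =>
    BroadcastDistribution(IdentityBroadcastMode) :: UnspecifiedDistribution :: Nil
  case BuildRight =>
    UnspecifiedDistribution :: BroadcastDistribution(IdentityBroadcastMode) :: Nil
}
```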
BroadcastNestedLoopJoinExec takes the following when created:

* Left physical operator
* Right physical operator
* BuildSide
* Join type
* Optional join condition expressions