SortExec Unary Physical Operator
SortExec is a unary physical operator that is created when:
- BasicOperators execution planning strategy is requested to plan a Sort logical operator
- FileFormatWriter helper object is requested to write the result of a structured query
- EnsureRequirements physical query optimization is executed (and enforces partition requirements for data distribution and ordering of a physical operator)
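The third case can be observed with a join. The following is a minimal sketch, not taken from the original text: it assumes a local SparkSession with adaptive query execution and broadcast joins disabled, so that a sort-merge join is expected to be planned and its children are expected to be sorted per partition. The dataset contents and the names `left`, `right`, `joined` and `sorts` are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.SortExec

// Assumption: a local session (in spark-shell, `spark` already exists)
val spark = SparkSession.builder().master("local[*]").appName("SortExecDemo").getOrCreate()
import spark.implicits._

// Assumption: AQE is off so that executedPlan exposes the final operators directly
spark.conf.set("spark.sql.adaptive.enabled", "false")
// Disable broadcast joins so that a sort-merge join is expected instead
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

val left = Seq((0, "zero"), (1, "one")).toDF("id", "name")
val right = Seq((0, "a"), (1, "b")).toDF("id", "tag")
val joined = left.join(right, "id")

// Collect the SortExec operators that ended up in the executed plan
val sorts = joined.queryExecution.executedPlan.collect { case s: SortExec => s }
sorts.foreach(s => println(s"global=${s.global} sortOrder=${s.sortOrder}"))
```

No sort appears in the query itself, so any SortExec found here was added during physical planning rather than by a Sort logical operator.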
SortExec supports Java code generation (aka codegen).
    val q = Seq((0, "zero"), (1, "one")).toDF("id", "name").sort('id)
    val qe = q.queryExecution

    val logicalPlan = qe.analyzed
    scala> println(logicalPlan.numberedTreeString)
    00 Sort [id#72 ASC NULLS FIRST], true
    01 +- Project [_1#69 AS id#72, _2#70 AS name#73]
    02    +- LocalRelation [_1#69, _2#70]

    // BasicOperators does the conversion of Sort logical operator to SortExec
    val sparkPlan = qe.sparkPlan
    scala> println(sparkPlan.numberedTreeString)
    00 Sort [id#72 ASC NULLS FIRST], true, 0
    01 +- LocalTableScan [id#72, name#73]

    // SortExec supports Whole-Stage Code Generation
    val executedPlan = qe.executedPlan
    scala> println(executedPlan.numberedTreeString)
    00 *(1) Sort [id#72 ASC NULLS FIRST], true, 0
    01 +- Exchange rangepartitioning(id#72 ASC NULLS FIRST, 200)
    02    +- LocalTableScan [id#72, name#73]

    import org.apache.spark.sql.execution.SortExec
    val sortExec = executedPlan.collect { case se: SortExec => se }.head
    assert(sortExec.isInstanceOf[SortExec])
When requested for the output attributes, SortExec simply gives whatever the child operator uses.
SortExec uses the sorting order expressions as the output data ordering.
When requested for the output data partitioning, SortExec simply gives whatever the child operator uses.
When requested for the required child distribution, SortExec gives the OrderedDistribution (with the sorting order expressions for the ordering) when the global flag is enabled (true) or the UnspecifiedDistribution otherwise.
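The four properties above can be checked on a live plan. This is a sketch under assumptions: a local SparkSession with adaptive query execution disabled, and `firstSort`, `globalSort` and `localSort` are hypothetical names introduced here. It compares a global sort (`sort`) with a per-partition sort (`sortWithinPartitions`).

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.catalyst.plans.physical.{OrderedDistribution, UnspecifiedDistribution}
import org.apache.spark.sql.execution.SortExec

// Assumption: a local session (in spark-shell, `spark` already exists)
val spark = SparkSession.builder().master("local[*]").appName("SortExecDemo").getOrCreate()
import spark.implicits._
// Assumption: AQE is off so that executedPlan exposes the final operators directly
spark.conf.set("spark.sql.adaptive.enabled", "false")

val df = Seq((0, "zero"), (1, "one")).toDF("id", "name")

// Hypothetical helper: the first SortExec found in the executed plan of a query
def firstSort(ds: Dataset[_]): SortExec =
  ds.queryExecution.executedPlan.collect { case s: SortExec => s }.head

val globalSort = firstSort(df.sort("id"))                 // global flag enabled
val localSort = firstSort(df.sortWithinPartitions("id"))  // global flag disabled

println(globalSort.output)                    // same attributes as the child
println(globalSort.outputOrdering)            // the sorting order expressions
println(globalSort.outputPartitioning)        // whatever the child reports
println(globalSort.requiredChildDistribution) // OrderedDistribution for a global sort
println(localSort.requiredChildDistribution)  // UnspecifiedDistribution otherwise
```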
SortExec operator uses the spark.sql.sort.enableRadixSort internal configuration property (enabled by default) to control…FIXME
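The property can be read and changed per session through the runtime configuration. The sketch below only shows how to inspect and toggle it; it assumes a local SparkSession, and the names `key`, `original`, `disabled` and `restored` are illustrative. It makes no claim about what the property controls.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session (in spark-shell, `spark` already exists)
val spark = SparkSession.builder().master("local[*]").appName("SortExecDemo").getOrCreate()

val key = "spark.sql.sort.enableRadixSort"

// Current value as a string (the default when nothing has been set)
val original = spark.conf.get(key)
println(s"$key = $original")

// Turn the property off for subsequent queries in this session
spark.conf.set(key, "false")
val disabled = spark.conf.get(key)

// Restore the previous value so later queries are unaffected
spark.conf.set(key, original)
val restored = spark.conf.get(key)
```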
Key | Name (in web UI) | Description |
---|---|---|
peakMemory | peak memory | |
sortTime | sort time | |
spillSize | spill size | |
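The metrics can also be read programmatically once a query has run. This is a sketch assuming a local SparkSession with adaptive query execution disabled; the metric keys (`sortTime`, `peakMemory`, `spillSize`) are given as I understand them from the Spark sources, and the values printed depend on the data and the environment.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.SortExec

// Assumption: a local session (in spark-shell, `spark` already exists)
val spark = SparkSession.builder().master("local[*]").appName("SortExecDemo").getOrCreate()
import spark.implicits._
// Assumption: AQE is off so that executedPlan exposes the final operators directly
spark.conf.set("spark.sql.adaptive.enabled", "false")

val q = Seq((2, "two"), (0, "zero"), (1, "one")).toDF("id", "name").sort("id")

// Run the query so that the metrics get populated
q.collect()

val sortExec = q.queryExecution.executedPlan.collect { case s: SortExec => s }.head

// metrics maps each key to a SQLMetric accumulator
sortExec.metrics.foreach { case (key, metric) =>
  println(s"$key -> ${metric.value}")
}
```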
Generating Java Source Code for Produce Path in Whole-Stage Code Generation: doProduce Method

    doProduce(ctx: CodegenContext): String

Note: doProduce is part of the CodegenSupport Contract to generate the Java source code for the produce path in Whole-Stage Code Generation.
doProduce …FIXME
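The Java source that whole-stage code generation produces for a query with a sort can be inspected with the `debug` package object. This sketch does not describe what doProduce does internally; it only dumps the generated code. It assumes a local SparkSession with adaptive query execution disabled, and `generated` is an illustrative name.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.SortExec
import org.apache.spark.sql.execution.debug._

// Assumption: a local session (in spark-shell, `spark` already exists)
val spark = SparkSession.builder().master("local[*]").appName("SortExecDemo").getOrCreate()
import spark.implicits._
// Assumption: AQE is off so that executedPlan exposes the final operators directly
spark.conf.set("spark.sql.adaptive.enabled", "false")

val q = Seq((0, "zero"), (1, "one")).toDF("id", "name").sort("id")
val executedPlan = q.queryExecution.executedPlan

val sortExec = executedPlan.collect { case s: SortExec => s }.head
println(s"supportCodegen = ${sortExec.supportCodegen}")

// codegenString returns the generated Java source of every WholeStageCodegen subtree
val generated: String = codegenString(executedPlan)
println(generated.take(2000)) // print only the beginning; the full text is long
```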
Creating SortExec Instance
SortExec takes the following when created:
- Sorting order expressions (Seq[SortOrder])
- Child physical plan
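A SortExec can be created by hand for experimentation. This is a sketch under assumptions: the constructor shape used here (sorting order, the global flag, the child plan) reflects the Spark sources as I understand them, and the names `child`, `order` and `manualSort` are illustrative. The child is the physical plan of a simple `range` query.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.expressions.{Ascending, SortOrder}
import org.apache.spark.sql.execution.SortExec

// Assumption: a local session (in spark-shell, `spark` already exists)
val spark = SparkSession.builder().master("local[*]").appName("SortExecDemo").getOrCreate()

// Child physical plan: the (not yet prepared) physical plan of a range query
val child = spark.range(5).queryExecution.sparkPlan

// Sorting order expressions: sort by the first output attribute, ascending
val order = Seq(SortOrder(child.output.head, Ascending))

// Assumption: global = false requests a per-partition sort only
val manualSort = SortExec(order, global = false, child = child)
println(manualSort.treeString)
```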