SortExec Unary Physical Operator
SortExec is a unary physical operator that is created when:
- BasicOperators execution planning strategy is requested to plan a Sort logical operator
- FileFormatWriter helper object is requested to write the result of a structured query
- EnsureRequirements physical query optimization is executed (and enforces partition requirements for data distribution and ordering of a physical operator)
SortExec supports Java code generation (aka codegen).
    val q = Seq((0, "zero"), (1, "one")).toDF("id", "name").sort('id)
    val qe = q.queryExecution

    val logicalPlan = qe.analyzed
    scala> println(logicalPlan.numberedTreeString)
    00 Sort [id#72 ASC NULLS FIRST], true
    01 +- Project [_1#69 AS id#72, _2#70 AS name#73]
    02    +- LocalRelation [_1#69, _2#70]

    // BasicOperators does the conversion of Sort logical operator to SortExec
    val sparkPlan = qe.sparkPlan
    scala> println(sparkPlan.numberedTreeString)
    00 Sort [id#72 ASC NULLS FIRST], true, 0
    01 +- LocalTableScan [id#72, name#73]

    // SortExec supports Whole-Stage Code Generation
    val executedPlan = qe.executedPlan
    scala> println(executedPlan.numberedTreeString)
    00 *(1) Sort [id#72 ASC NULLS FIRST], true, 0
    01 +- Exchange rangepartitioning(id#72 ASC NULLS FIRST, 200)
    02    +- LocalTableScan [id#72, name#73]

    import org.apache.spark.sql.execution.SortExec
    val sortExec = executedPlan.collect { case se: SortExec => se }.head
    assert(sortExec.isInstanceOf[SortExec])
When requested for the output attributes, SortExec gives the output attributes of the child operator.
SortExec uses the sorting order expressions as the output data ordering.
When requested for the output data partitioning, SortExec gives the output partitioning of the child operator.
When requested for the required child distribution, SortExec gives an OrderedDistribution (with the sorting order expressions for the ordering) when the global flag is enabled (true), or the UnspecifiedDistribution otherwise.
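The properties above can be checked in spark-shell. The following is a minimal sketch, assuming a local SparkSession and that `sort` and `sortWithinPartitions` give a SortExec with the global flag enabled and disabled, respectively. `firstSort` is a hypothetical helper introduced only for this example.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.execution.SortExec

// Assumption: a local session (in spark-shell this simply reuses the existing one)
val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq((0, "zero"), (1, "one")).toDF("id", "name")

// Hypothetical helper: the first SortExec in the plan created by the planner
// (sparkPlan is the physical plan before EnsureRequirements and codegen rules)
def firstSort(ds: Dataset[_]): SortExec =
  ds.queryExecution.sparkPlan.collect { case s: SortExec => s }.head

val globalSort = firstSort(df.sort($"id"))                  // global flag enabled
val localSort  = firstSort(df.sortWithinPartitions($"id"))  // global flag disabled

// Output attributes and output partitioning come from the child operator
println(globalSort.output == globalSort.child.output)
println(globalSort.outputPartitioning == globalSort.child.outputPartitioning)

// Output ordering is the sorting order expressions
println(globalSort.outputOrdering == globalSort.sortOrder)

// Required child distribution depends on the global flag
println(globalSort.requiredChildDistribution)
println(localSort.requiredChildDistribution)
```

The example uses `sparkPlan` rather than `executedPlan` so that the SortExec is reachable with `collect` even when the executed plan is wrapped by another operator (as may happen with adaptive query execution in newer Spark versions).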
SortExec operator uses the spark.sql.sort.enableRadixSort internal configuration property (enabled by default) to control…FIXME
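A short sketch of how the property could be inspected and changed at runtime, assuming a local SparkSession. It only shows reading and setting the value through the runtime configuration and does not demonstrate the effect on sorting.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session (in spark-shell this simply reuses the existing one)
val spark = SparkSession.builder.master("local[*]").getOrCreate()

val key = "spark.sql.sort.enableRadixSort"

// Current value (the property is enabled by default)
println(spark.conf.get(key))

// Disable the property for the current session
spark.conf.set(key, "false")
println(spark.conf.get(key))

// Remove the override so the default applies again
spark.conf.unset(key)
println(spark.conf.get(key))
```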
| Key | Name (in web UI) | Description |
|---|---|---|
| peakMemory | peak memory | |
| sortTime | sort time | |
| spillSize | spill size | |
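The performance metrics of a SortExec can be listed through the `metrics` registry that every physical operator exposes. A minimal sketch, assuming a local SparkSession; the exact set of keys may vary between Spark versions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.SortExec

// Assumption: a local session (in spark-shell this simply reuses the existing one)
val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

val q = Seq((0, "zero"), (1, "one")).toDF("id", "name").sort($"id")

// Take the SortExec from the plan created by the planner
val sortExec = q.queryExecution.sparkPlan.collect { case s: SortExec => s }.head

// metrics maps a metric key to its SQLMetric
val metricKeys = sortExec.metrics.keys.toSeq.sorted
metricKeys.foreach(println)
```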
Generating Java Source Code for Produce Path in Whole-Stage Code Generation — doProduce Method
    doProduce(ctx: CodegenContext): String

Note: doProduce is part of the CodegenSupport Contract to generate the Java source code for the produce path in Whole-Stage Code Generation.
doProduce…FIXME
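The Java source code that Whole-Stage Code Generation produces for a sorted query (which includes whatever SortExec contributes) can be printed with the `debug` package. A minimal sketch, assuming a local SparkSession; disabling adaptive query execution is an assumption made here so that the executed plan already contains the codegen stages before the query runs on newer Spark versions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.debug.codegenString

// Assumption: a local session (in spark-shell this simply reuses the existing one)
val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

// Assumption: turn adaptive query execution off so that codegen stages
// are present in executedPlan before the query is executed
spark.conf.set("spark.sql.adaptive.enabled", "false")

val q = Seq((0, "zero"), (1, "one")).toDF("id", "name").sort($"id")

// codegenString gives the generated source of all WholeStageCodegen subtrees
val source: String = codegenString(q.queryExecution.executedPlan)
println(source)
```

The output is long and differs across Spark versions, so it is not reproduced here.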
Creating SortExec Instance
SortExec takes the following when created:
- Sorting order expressions (Seq[SortOrder])
- Child physical plan
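A SortExec can also be created by hand, which is sometimes handy for exploration. The following is a sketch only, assuming a local SparkSession and the `SortExec(sortOrder, global, child)` constructor of recent Spark versions, where the global flag described earlier is passed alongside the sorting order expressions and the child physical plan.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.expressions.{Ascending, SortOrder}
import org.apache.spark.sql.execution.SortExec

// Assumption: a local session (in spark-shell this simply reuses the existing one)
val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

// Child physical plan: the planned (unsorted) local dataset
val child = Seq((1, "one"), (0, "zero")).toDF("id", "name").queryExecution.sparkPlan

// Sorting order expressions: sort by the first output attribute (id), ascending
val sortOrder = Seq(SortOrder(child.output.head, Ascending))

// global = false asks for sorting within partitions only
val sortExec = SortExec(sortOrder, global = false, child = child)
println(sortExec.treeString)
```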