Sort Unary Logical Operator
Sort is a unary logical operator that represents the following in a logical plan:
- ORDER BY, SORT BY, SORT BY … DISTRIBUTE BY and CLUSTER BY clauses (when AstBuilder is requested to parse a query)
- Dataset.sortWithinPartitions, Dataset.sort and Dataset.randomSplit operators
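The mapping from SQL clauses to Sort can be explored with a minimal sketch like the one below. It assumes Spark's Catalyst classes are on the classpath and uses CatalystSqlParser to parse the queries without a running SparkSession; the table name t1 and the helper globalFlagOf are hypothetical and exist only for illustration.

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.catalyst.plans.logical.Sort

// Hypothetical helper: parse a query and return the global flag
// of the first Sort operator found in the (unresolved) logical plan.
def globalFlagOf(sqlText: String): Boolean =
  CatalystSqlParser.parsePlan(sqlText).collect { case s: Sort => s.global }.head

// t1 is never resolved here, so it does not have to exist.
val orderByGlobal = globalFlagOf("SELECT * FROM t1 ORDER BY id")
val sortByGlobal = globalFlagOf("SELECT * FROM t1 SORT BY id")
val distributeSortGlobal = globalFlagOf("SELECT * FROM t1 DISTRIBUTE BY id SORT BY id")
val clusterByGlobal = globalFlagOf("SELECT * FROM t1 CLUSTER BY id")

println(s"ORDER BY: $orderByGlobal, SORT BY: $sortByGlobal, " +
  s"DISTRIBUTE BY + SORT BY: $distributeSortGlobal, CLUSTER BY: $clusterByGlobal")
```

Only ORDER BY should yield a global sort; the other clauses sort within partitions.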

    // Using the feature of ordinal literal
    val ids = Seq(1,3,2).toDF("id").sort(lit(1))
    val logicalPlan = ids.queryExecution.logical
    scala> println(logicalPlan.numberedTreeString)
    00 Sort [1 ASC NULLS FIRST], true
    01 +- AnalysisBarrier
    02       +- Project [value#22 AS id#24]
    03          +- LocalRelation [value#22]

    import org.apache.spark.sql.catalyst.plans.logical.Sort
    val sortOp = logicalPlan.collect { case s: Sort => s }.head
    scala> println(sortOp.numberedTreeString)
    00 Sort [1 ASC NULLS FIRST], true
    01 +- AnalysisBarrier
    02       +- Project [value#22 AS id#24]
    03          +- LocalRelation [value#22]


    val nums = Seq((0, "zero"), (1, "one")).toDF("id", "name")

    // Creates a Sort logical operator:
    // - descending sort direction for id column (specified explicitly)
    // - name column is wrapped with ascending sort direction
    val numsOrdered = nums.sort('id.desc, 'name)
    val logicalPlan = numsOrdered.queryExecution.logical
    scala> println(logicalPlan.numberedTreeString)
    00 'Sort ['id DESC NULLS LAST, 'name ASC NULLS FIRST], true
    01 +- Project [_1#11 AS id#14, _2#12 AS name#15]
    02    +- LocalRelation [_1#11, _2#12]

Sort takes the following when created:
- SortOrder ordering expressions
- global flag for global (true) or partition-only (false) sorting
- Child logical plan
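To see the three constructor arguments on real plans, the following sketch compares Dataset.sort with Dataset.sortWithinPartitions. It assumes a local SparkSession (the master URL and application name are arbitrary choices for this illustration) and reads the fields order, global and child of the Sort case class.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.Sort

// Assumption: a throwaway local session; in spark-shell this reuses the existing one.
val spark = SparkSession.builder().master("local[*]").appName("sort-demo").getOrCreate()
import spark.implicits._

val nums = Seq((0, "zero"), (1, "one")).toDF("id", "name")

// Dataset.sort requests a global sort.
val globalSort = nums.sort("id").queryExecution.logical
  .collect { case s: Sort => s }.head

// Dataset.sortWithinPartitions requests a partition-only sort.
val partitionSort = nums.sortWithinPartitions("id").queryExecution.logical
  .collect { case s: Sort => s }.head

println(s"sort:                  order=${globalSort.order}, global=${globalSort.global}")
println(s"sortWithinPartitions:  order=${partitionSort.order}, global=${partitionSort.global}")
println(s"child operator:        ${globalSort.child.nodeName}")
```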
The output schema of a Sort operator is the output of the child logical operator.
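A quick way to check this property is to build a Sort over a relation with known attributes and compare the two outputs. The sketch below uses the Catalyst DSL with a LocalRelation; the column names id and name are made up for the example. In spark-shell these DSL imports may clash with spark.implicits._, so a fresh Scala session with Spark on the classpath is assumed.

```scala
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.dsl.plans._
import org.apache.spark.sql.catalyst.plans.logical.LocalRelation

// An empty local relation with two (illustrative) attributes.
val relation = LocalRelation('id.int, 'name.string)

// Wrap the relation in a global Sort on the id column.
val sorted = relation.orderBy('id.asc)

// Sort adds and removes no columns: its output is the child's output.
val sameOutput = sorted.output == relation.output
println(s"Sort output equals child output: $sameOutput")
```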
Tip: Use orderBy or sortBy operators from the Catalyst DSL to create a Sort logical operator, e.g. for testing or Spark SQL internals exploration.
Note: Sorting is supported for columns of orderable type only (which is enforced at analysis when CheckAnalysis is requested to checkAnalysis).
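Whether a type counts as orderable can be probed directly. The sketch below calls RowOrdering.isOrderable, which is assumed here to be the predicate behind the analysis-time check (that was the case in the Spark 2.x and 3.x sources; other versions may differ). No SparkSession is needed.

```scala
import org.apache.spark.sql.catalyst.expressions.RowOrdering
import org.apache.spark.sql.types.{IntegerType, MapType, StringType}

// Atomic types such as integers are orderable.
val intOrderable = RowOrdering.isOrderable(IntegerType)

// Map types are a typical example of a non-orderable type.
val mapOrderable = RowOrdering.isOrderable(MapType(StringType, IntegerType))

println(s"int orderable: $intOrderable, map orderable: $mapOrderable")
```

Sorting a Dataset by a column of a non-orderable type is then expected to fail at analysis rather than at execution.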
Note: The Sort logical operator is planned as the SortExec unary physical operator when the BasicOperators execution planning strategy is executed.
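The planning step can be observed by looking for SortExec in the physical plan of a sorted Dataset. This is a minimal sketch assuming a local SparkSession; it inspects queryExecution.sparkPlan (the plan before execution preparations) rather than executedPlan, since adaptive query execution may wrap the latter in a single node that hides its children.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.SortExec

// Assumption: a throwaway local session; in spark-shell this reuses the existing one.
val spark = SparkSession.builder().master("local[*]").appName("sortexec-demo").getOrCreate()
import spark.implicits._

val sortedNums = Seq((0, "zero"), (1, "one")).toDF("id", "name").sort("id")

// Collect every SortExec that the planner produced for the query.
val sortExecs = sortedNums.queryExecution.sparkPlan.collect { case s: SortExec => s }

sortExecs.foreach(s => println(s"SortExec: sortOrder=${s.sortOrder}, global=${s.global}"))
```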
Catalyst DSL: orderBy and sortBy Operators

    orderBy(sortExprs: SortOrder*): LogicalPlan
    sortBy(sortExprs: SortOrder*): LogicalPlan


    import org.apache.spark.sql.catalyst.dsl.plans._
    val t1 = table("t1")

    import org.apache.spark.sql.catalyst.dsl.expressions._
    val globalSortById = t1.orderBy('id.asc_nullsLast)

    // Note true for the global flag
    scala> println(globalSortById.numberedTreeString)
    00 'Sort ['id ASC NULLS LAST], true
    01 +- 'UnresolvedRelation `t1`

    // Note false for the global flag
    val partitionOnlySortById = t1.sortBy('id.asc_nullsLast)
    scala> println(partitionOnlySortById.numberedTreeString)
    00 'Sort ['id ASC NULLS LAST], false
    01 +- 'UnresolvedRelation `t1`