Partitioning — Specification of Physical Operator’s Output Partitions
Partitioning
is the contract to hint the Spark Physical Optimizer for the number of partitions the output of a physical operator should be split across.
1 2 3 4 5 |
numPartitions: Int |
numPartitions
is used in:
-
EnsureRequirements
physical preparation rule to enforce partition requirements of a physical operator -
SortMergeJoinExec for
outputPartitioning
forFullOuter
join type -
Partitioning.allCompatible
Partitioning | compatibleWith | guarantees | numPartitions | satisfies |
---|---|---|---|---|
|
|
Exactly the same |
1 |
BroadcastDistribution with the same |
|
|
|
Input |
|
|
Any |
Any |
Number of partitions of the first |
Any |
|
|
|
Input |
|
|
Always negative |
Always negative |
Input |
|
|
Any |
Any |
1 |
Any |
|
Always negative |
Always negative |
Input |