Partitioning — Specification of Physical Operator’s Output Partitions
Partitioning is the contract to hint the Spark Physical Optimizer for the number of partitions the output of a physical operator should be split across.
|
1 2 3 4 5 |
numPartitions: Int |
numPartitions is used in:
-
EnsureRequirementsphysical preparation rule to enforce partition requirements of a physical operator -
SortMergeJoinExec for
outputPartitioningforFullOuterjoin type -
Partitioning.allCompatible
| Partitioning | compatibleWith | guarantees | numPartitions | satisfies |
|---|---|---|---|---|
|
|
|
Exactly the same |
1 |
BroadcastDistribution with the same |
|
|
|
|
Input |
|
|
|
Any |
Any |
Number of partitions of the first |
Any |
|
|
|
|
Input |
|
|
|
Always negative |
Always negative |
Input |
|
|
|
Any |
Any |
1 |
Any |
|
|
Always negative |
Always negative |
Input |
spark技术分享