Repartition Logical Operators — Repartition and RepartitionByExpression
Repartition and RepartitionByExpression (repartition operations in short) are unary logical operators that create a new RDD
that has exactly numPartitions partitions.
Note
|
RepartitionByExpression is also called distribute operator.
|
Repartition is the result of coalesce or repartition (with no partition expressions defined) operators.
RepartitionByExpression is the result of the following operators:
-
Dataset.repartition operator (with explicit partition expressions defined)
-
DISTRIBUTE BY SQL clause.
Repartition
and RepartitionByExpression
logical operators are described by:
Note
|
BasicOperators strategy resolves Repartition to ShuffleExchangeExec (with RoundRobinPartitioning partitioning scheme) or CoalesceExec physical operators per shuffle — enabled or not, respectively.
|
Note
|
BasicOperators strategy resolves RepartitionByExpression to ShuffleExchangeExec physical operator with HashPartitioning partitioning scheme.
|
Repartition Operation Optimizations
-
CollapseRepartition logical optimization collapses adjacent repartition operations.
-
Repartition operations allow FoldablePropagation and PushDownPredicate logical optimizations to “push through”.
-
PropagateEmptyRelation logical optimization may result in an empty LocalRelation for repartition operations.