HashClusteredDistribution
HashClusteredDistribution is a Distribution that creates a HashPartitioning for the hash expressions and a requested number of partitions.
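In practice, hash partitioning places each row in partition pmod(hash(expressions), numPartitions), so rows with equal values of the hash expressions always land in the same partition. The plain-Scala sketch below illustrates that idea only. It uses scala.util.hashing.MurmurHash3 as a stand-in for Spark's own Murmur3Hash expression, so the partition numbers it computes will not match Spark's, and partitionId is a hypothetical helper.

```scala
import scala.util.hashing.MurmurHash3

object HashPartitionSketch {
  // Hypothetical helper: maps the values of the hash expressions for one row
  // to a partition id in [0, numPartitions).
  // MurmurHash3.seqHash is only a stand-in for Spark's Murmur3Hash.
  def partitionId(keyValues: Seq[Any], numPartitions: Int): Int = {
    require(numPartitions > 0, "numPartitions must be positive")
    val hash = MurmurHash3.seqHash(keyValues)
    // Non-negative modulo (like a pmod), because the hash may be negative
    Math.floorMod(hash, numPartitions)
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq(("a", 1), ("b", 2), ("a", 1), ("c", 3))
    // Both ("a", 1) rows are assigned the same partition id
    rows.foreach { case (k, v) =>
      println(s"($k, $v) -> partition ${partitionId(Seq(k, v), 4)}")
    }
  }
}
```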
HashClusteredDistribution specifies None for the required number of partitions.
Note: None for the required number of partitions means that any number of partitions can be used (possibly the value of the spark.sql.shuffle.partitions configuration property, 200 by default).
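The effect of None can be pictured as an Option lookup with a fallback: whatever needs a concrete partition count takes the required number when one is given, and otherwise falls back to a default such as spark.sql.shuffle.partitions. The helper numPartitionsFor and its hard-coded default of 200 are assumptions made for this sketch, not part of Spark's API.

```scala
object RequiredPartitionsSketch {
  // Assumed default, mirroring spark.sql.shuffle.partitions out of the box
  val DefaultShufflePartitions = 200

  // Hypothetical helper: None means "any number", so use the default
  def numPartitionsFor(
      required: Option[Int],
      shufflePartitions: Int = DefaultShufflePartitions): Int =
    required.getOrElse(shufflePartitions)

  def main(args: Array[String]): Unit = {
    println(numPartitionsFor(None))      // falls back to 200
    println(numPartitionsFor(Some(8)))   // uses the required 8
    println(numPartitionsFor(None, 50))  // falls back to a configured 50
  }
}
```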
HashClusteredDistribution is created when the following physical operators are requested for a required child distribution:
- CoGroupExec
- ShuffledHashJoinExec
- SortMergeJoinExec
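For a join operator, the required child distribution is one HashClusteredDistribution per child, built from that child's join keys. Once both children are shuffled into the same number of partitions, rows with matching keys end up in partitions with the same index. The sketch below models only that shape: SortMergeJoinSketch is a hypothetical stand-in for the physical operator, and expressions are plain strings rather than Catalyst expressions.

```scala
object JoinDistributionSketch {
  // Simplified stand-ins for the Catalyst types (expressions as column names)
  sealed trait Distribution
  final case class HashClusteredDistribution(
      expressions: Seq[String],
      requiredNumPartitions: Option[Int] = None) extends Distribution

  // Hypothetical operator: one required distribution per child (left, right)
  final case class SortMergeJoinSketch(leftKeys: Seq[String], rightKeys: Seq[String]) {
    def requiredChildDistribution: Seq[Distribution] =
      HashClusteredDistribution(leftKeys) :: HashClusteredDistribution(rightKeys) :: Nil
  }

  def main(args: Array[String]): Unit = {
    val join = SortMergeJoinSketch(Seq("orders.customer_id"), Seq("customers.id"))
    join.requiredChildDistribution.foreach(println)
  }
}
```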
HashClusteredDistribution takes hash expressions when created.
HashClusteredDistribution requires that the hash expressions are not empty (i.e. not Nil).
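The non-empty requirement is a precondition checked at construction time, so an empty list of hash expressions fails immediately with an IllegalArgumentException. The simplified case class below is a stand-in for the real one (expressions are strings) and its error message is illustrative.

```scala
import scala.util.Try

object NonEmptyExpressionsSketch {
  // Simplified stand-in for HashClusteredDistribution (expressions as strings)
  final case class HashClusteredDistribution(expressions: Seq[String]) {
    // require throws IllegalArgumentException when the condition is false
    require(expressions.nonEmpty,
      "The expressions for hash of a HashClusteredDistribution should not be Nil.")
  }

  def main(args: Array[String]): Unit = {
    println(HashClusteredDistribution(Seq("id")))
    // A Failure wrapping the IllegalArgumentException from require
    println(Try(HashClusteredDistribution(Nil)))
  }
}
```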
HashClusteredDistribution is used when:
- EnsureRequirements is requested to add an ExchangeCoordinator for Adaptive Query Execution
- HashPartitioning is requested to check whether it satisfies a distribution (satisfies)
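A HashPartitioning satisfies a HashClusteredDistribution only when it hashes on exactly the same expressions, in the same order (Spark compares Catalyst expressions with semanticEquals). The sketch below models that rule with strings and plain equality standing in for expressions and semanticEquals; it also rejects a partitioning whose partition count differs from a required one, which is an assumption of this simplified model.

```scala
object SatisfiesSketch {
  sealed trait Distribution { def requiredNumPartitions: Option[Int] }
  final case class HashClusteredDistribution(
      expressions: Seq[String],
      requiredNumPartitions: Option[Int] = None) extends Distribution

  final case class HashPartitioning(expressions: Seq[String], numPartitions: Int) {
    // Satisfied only by the same hash expressions in the same order,
    // and by a matching partition count when one is required
    def satisfies(required: Distribution): Boolean = required match {
      case h: HashClusteredDistribution =>
        h.requiredNumPartitions.forall(_ == numPartitions) &&
          expressions.length == h.expressions.length &&
          // plain == stands in for Catalyst's semanticEquals
          expressions.zip(h.expressions).forall { case (l, r) => l == r }
    }
  }

  def main(args: Array[String]): Unit = {
    val p = HashPartitioning(Seq("a", "b"), 200)
    println(p.satisfies(HashClusteredDistribution(Seq("a", "b"))))          // true
    println(p.satisfies(HashClusteredDistribution(Seq("b", "a"))))          // false: order differs
    println(p.satisfies(HashClusteredDistribution(Seq("a"))))               // false: fewer expressions
    println(p.satisfies(HashClusteredDistribution(Seq("a", "b"), Some(8)))) // false: 200 != 8
  }
}
```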
createPartitioning Method
createPartitioning(numPartitions: Int): Partitioning
Note: createPartitioning is part of the Distribution Contract to create a Partitioning for a given number of partitions.
createPartitioning creates a HashPartitioning for the hash expressions and the input numPartitions.
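A simplified plain-Scala model of that contract follows. Distribution, Partitioning, HashPartitioning and HashClusteredDistribution here are stand-ins for the Catalyst types (expressions are plain strings), so this only illustrates how createPartitioning pairs the distribution's hash expressions with the numPartitions it is given.

```scala
object CreatePartitioningSketch {
  sealed trait Partitioning { def numPartitions: Int }
  final case class HashPartitioning(expressions: Seq[String], numPartitions: Int)
    extends Partitioning

  sealed trait Distribution {
    def requiredNumPartitions: Option[Int]
    def createPartitioning(numPartitions: Int): Partitioning
  }

  final case class HashClusteredDistribution(
      expressions: Seq[String],
      requiredNumPartitions: Option[Int] = None) extends Distribution {
    require(expressions.nonEmpty, "hash expressions must not be empty")

    // Same hash expressions, caller-supplied number of partitions
    override def createPartitioning(numPartitions: Int): Partitioning =
      HashPartitioning(expressions, numPartitions)
  }

  def main(args: Array[String]): Unit = {
    val distribution = HashClusteredDistribution(Seq("customer_id"))
    println(distribution.createPartitioning(200))
  }
}
```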