HashClusteredDistribution
HashClusteredDistribution is a Distribution that creates a HashPartitioning for the hash expressions and a requested number of partitions.
HashClusteredDistribution specifies None for the required number of partitions.
Note
None for the required number of partitions means that any number of partitions can be used (possibly the value of the spark.sql.shuffle.partitions configuration property, which defaults to 200 partitions).
HashClusteredDistribution is created when the following physical operators are requested for a required child distribution:
- CoGroupExec, ShuffledHashJoinExec, SortMergeJoinExec
HashClusteredDistribution takes hash expressions when created.
HashClusteredDistribution requires the hash expressions to be non-empty (i.e. not Nil).
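The construction rules above can be illustrated with a minimal sketch. It does not use Spark itself: `Expr` and `HashClusteredDistributionSketch` are hypothetical stand-ins for Catalyst's expression type and for HashClusteredDistribution, and the error message is an assumption. Only the non-empty check and the None for the required number of partitions come from the description above.

```scala
// Hypothetical stand-ins for Spark's Catalyst types (not the real classes).
final case class Expr(name: String)

final case class HashClusteredDistributionSketch(expressions: Seq[Expr]) {
  // Mirrors the requirement that the hash expressions are not empty.
  require(expressions.nonEmpty, "The hash expressions should not be Nil.")

  // None means that any number of partitions is acceptable.
  val requiredNumPartitions: Option[Int] = None
}

object ConstructionDemo {
  def main(args: Array[String]): Unit = {
    val dist = HashClusteredDistributionSketch(Seq(Expr("id")))
    println(dist.requiredNumPartitions) // prints None

    // An empty list of hash expressions is rejected at construction time.
    val rejected = scala.util.Try(HashClusteredDistributionSketch(Nil)).isFailure
    println(rejected) // prints true
  }
}
```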
HashClusteredDistribution is used when:
- EnsureRequirements is requested to add an ExchangeCoordinator for Adaptive Query Execution
- HashPartitioning is requested to satisfies
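As a rough illustration of the second case, the sketch below models how a hash partitioning might decide whether it satisfies a HashClusteredDistribution. The rule used here (same number of expressions, compared pairwise and in order) is an assumption and is not stated above; all types are hypothetical stand-ins, and plain equality stands in for whatever expression comparison Spark performs.

```scala
// Hypothetical stand-ins for Catalyst's Expression, HashPartitioning and
// HashClusteredDistribution; these are not the real Spark classes.
final case class Expr(name: String)
final case class HashClusteredDistributionSketch(expressions: Seq[Expr])

final case class HashPartitioningSketch(expressions: Seq[Expr], numPartitions: Int) {
  // Assumed rule: same number of expressions, and each pair matches in order.
  def satisfies(required: HashClusteredDistributionSketch): Boolean =
    expressions.length == required.expressions.length &&
      expressions.zip(required.expressions).forall { case (l, r) => l == r }
}

object SatisfiesDemo {
  def main(args: Array[String]): Unit = {
    val partitioning = HashPartitioningSketch(Seq(Expr("a"), Expr("b")), numPartitions = 8)

    // Same expressions in the same order: satisfied.
    println(partitioning.satisfies(HashClusteredDistributionSketch(Seq(Expr("a"), Expr("b")))))
    // Same expressions in a different order: not satisfied under the assumed rule.
    println(partitioning.satisfies(HashClusteredDistributionSketch(Seq(Expr("b"), Expr("a")))))
    // Fewer expressions: not satisfied.
    println(partitioning.satisfies(HashClusteredDistributionSketch(Seq(Expr("a")))))
  }
}
```

Under this assumed rule, a partitioning on (a, b) does not satisfy a distribution on (b, a) or on (a) alone, which is stricter than clustering on any subset of the keys.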
createPartitioning Method
createPartitioning(numPartitions: Int): Partitioning
Note
createPartitioning is part of the Distribution Contract to create a Partitioning for a given number of partitions.
createPartitioning creates a HashPartitioning for the hash expressions and the input numPartitions.
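A minimal sketch of this behaviour follows, again without depending on Spark. `Expr`, `HashPartitioningSketch` and `HashClusteredDistributionSketch` are hypothetical stand-ins for the real classes. The caller falling back to 200 partitions when no number is required is an assumption that models the spark.sql.shuffle.partitions default mentioned earlier.

```scala
// Hypothetical stand-ins for Catalyst's Expression and HashPartitioning.
final case class Expr(name: String)
final case class HashPartitioningSketch(expressions: Seq[Expr], numPartitions: Int)

final case class HashClusteredDistributionSketch(expressions: Seq[Expr]) {
  require(expressions.nonEmpty, "The hash expressions should not be Nil.")

  // None: the distribution itself does not dictate a partition count.
  val requiredNumPartitions: Option[Int] = None

  // Pairs the hash expressions with the partition count chosen by the caller.
  def createPartitioning(numPartitions: Int): HashPartitioningSketch =
    HashPartitioningSketch(expressions, numPartitions)
}

object CreatePartitioningDemo {
  // Assumed stand-in for the spark.sql.shuffle.partitions default.
  val defaultShufflePartitions = 200

  def main(args: Array[String]): Unit = {
    val dist = HashClusteredDistributionSketch(Seq(Expr("id")))

    // With no required count, the caller falls back to the default.
    val numPartitions = dist.requiredNumPartitions.getOrElse(defaultShufflePartitions)
    val partitioning = dist.createPartitioning(numPartitions)

    println(partitioning.numPartitions) // prints 200
    println(partitioning.expressions == dist.expressions) // prints true
  }
}
```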