HashClusteredDistribution

HashClusteredDistribution is a Distribution that creates a HashPartitioning for the hash expressions and a requested number of partitions.

HashClusteredDistribution specifies None for the required number of partitions.

Note
None for the required number of partitions means that any number of partitions can be used (possibly the value of the spark.sql.shuffle.partitions configuration property, which is 200 by default).

HashClusteredDistribution is created when the following physical operators are requested for a required child distribution:

HashClusteredDistribution takes hash expressions when created.

HashClusteredDistribution requires that the hash expressions be non-empty (i.e. not Nil).
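
The creation rules above can be illustrated with a small self-contained sketch. It is not Spark's actual class: HashClusteredDistributionSketch is a hypothetical stand-in that models hash expressions as plain strings, and the requiredNumPartitions field name is an assumption made for illustration.

```scala
import scala.util.Try

// Simplified stand-in, NOT Spark's real class: expressions are plain strings here.
final case class HashClusteredDistributionSketch(
    expressions: Seq[String],
    requiredNumPartitions: Option[Int] = None) {
  // Mirrors the requirement described above: the hash expressions must not be empty.
  require(expressions.nonEmpty, "hash expressions should not be empty (Nil)")
}

object HashClusteredDistributionSketchDemo {
  def main(args: Array[String]): Unit = {
    // Created with hash expressions only; the required number of partitions stays None.
    val dist = HashClusteredDistributionSketch(Seq("id", "name"))
    println(dist.requiredNumPartitions)

    // An empty expression list is rejected by require (IllegalArgumentException).
    val rejected = Try(HashClusteredDistributionSketch(Nil))
    println(rejected.isFailure)
  }
}
```

Checking the requirement in the constructor means an invalid distribution can never be built, so later code does not have to handle the empty case.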

HashClusteredDistribution is used when:

  • EnsureRequirements is requested to add an ExchangeCoordinator for Adaptive Query Execution

  • HashPartitioning is requested to check whether it satisfies a required distribution (satisfies)

createPartitioning Method

Note
createPartitioning is part of Distribution Contract to create a Partitioning for a given number of partitions.

createPartitioning creates a HashPartitioning for the hash expressions and the input numPartitions.
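
As a rough sketch of this behaviour, the following self-contained model pairs the hash expressions with the requested number of partitions. The types HashPartitioningModel and HashClusteredDistributionModel and the constant DefaultShufflePartitions are hypothetical stand-ins, not Spark classes. Falling back to 200 when no number of partitions is required is an assumption based on the default of spark.sql.shuffle.partitions mentioned earlier.

```scala
// Simplified stand-ins, NOT Spark's real classes: expressions are plain strings here.
final case class HashPartitioningModel(expressions: Seq[String], numPartitions: Int)

final case class HashClusteredDistributionModel(
    expressions: Seq[String],
    requiredNumPartitions: Option[Int] = None) {
  // Pairs the distribution's own hash expressions with the caller's numPartitions.
  def createPartitioning(numPartitions: Int): HashPartitioningModel =
    HashPartitioningModel(expressions, numPartitions)
}

object CreatePartitioningDemo {
  // Hypothetical constant standing in for spark.sql.shuffle.partitions (default 200).
  val DefaultShufflePartitions = 200

  def main(args: Array[String]): Unit = {
    val dist = HashClusteredDistributionModel(Seq("customer_id"))
    // With no required number of partitions, the caller is free to choose one.
    val numPartitions = dist.requiredNumPartitions.getOrElse(DefaultShufflePartitions)
    println(dist.createPartitioning(numPartitions))
  }
}
```

In this model the distribution only supplies the expressions. The number of partitions is always decided by whoever calls createPartitioning.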
