关注 spark技术分享,
撸spark源码 玩spark最佳实践

Distribution — Contract For Data Distribution Across Partitions

Distribution — Contract For Data Distribution Across Partitions

Distribution is the contract of…​FIXME

Note
Distribution is a Scala sealed contract which means that all possible distributions are all in the same compilation unit (file).
Table 1. Distribution Contract
Method Description

requiredNumPartitions

Gives the required number of partitions for a distribution.

Used exclusively when EnsureRequirements physical optimization is requested to enforce partition requirements of a physical operator (and a child operator’s output partitioning does not satisfy a required child distribution that leads to inserting a ShuffleExchangeExec operator to a physical plan).

Note
None for the required number of partitions indicates to use any number of partitions (possibly spark.sql.shuffle.partitions configuration property with the default of 200 partitions).

createPartitioning

Creates a Partitioning for a given number of partitions.

Used exclusively when EnsureRequirements physical optimization is requested to enforce partition requirements of a physical operator (and creates a ShuffleExchangeExec physical operator with a required Partitioning).

Table 2. Distributions
Distribution Description

AllTuples

BroadcastDistribution

ClusteredDistribution

HashClusteredDistribution

OrderedDistribution

UnspecifiedDistribution

赞(0) 打赏
未经允许不得转载:spark技术分享 » Distribution — Contract For Data Distribution Across Partitions
分享到: 更多 (0)

关注公众号:spark技术分享

联系我们联系我们

觉得文章有用就打赏一下文章作者

支付宝扫一扫打赏

微信扫一扫打赏