Partitioning — Specification of Physical Operator’s Output Partitions

Partitioning is the contract that describes, for the Spark physical optimizer, how many partitions the output of a physical operator is split across (numPartitions) and which distribution requirements that output satisfies.
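To make "split across a number of partitions" concrete, here is a minimal Python sketch. It is a toy model, not Spark code: the function names are made up for illustration, and `zlib.crc32` is only a stand-in for whatever hash function Spark applies internally. It shows two simplified ways of mapping rows to one of `num_partitions` partitions, by key hash and by row position.

```python
import zlib

def hash_partition(key: str, num_partitions: int) -> int:
    """Map a key to a partition id in [0, num_partitions)."""
    # crc32 is a deterministic stand-in hash (an assumption of this sketch).
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def round_robin_partition(row_index: int, num_partitions: int) -> int:
    """Map the i-th row to a partition id by cycling through the partitions."""
    return row_index % num_partitions

keys = ["a", "b", "a", "c", "b", "a"]
num_partitions = 4

# Hash-based: rows with equal keys always land in the same partition.
hashed = [hash_partition(k, num_partitions) for k in keys]

# Round-robin: rows are spread evenly, regardless of their keys.
round_robin = [round_robin_partition(i, num_partitions) for i in range(len(keys))]
```

In both cases the output ends up in at most `num_partitions` partitions, but only the hash-based scheme keeps equal keys together, which is why the two schemes meet different distribution requirements.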

numPartitions is used in:

Table 1. Partitioning Schemes (Partitionings) and Their Properties

| Partitioning (properties) | compatibleWith | guarantees | numPartitions | satisfies |
|---|---|---|---|---|
| BroadcastPartitioning | BroadcastPartitioning with the same BroadcastMode | Exactly the same BroadcastPartitioning | 1 | BroadcastDistribution with the same BroadcastMode |
| HashPartitioning (clustering expressions, numPartitions) | HashPartitioning (when their underlying expressions are semantically equal, i.e. deterministic and canonically equal) | HashPartitioning (when their underlying expressions are semantically equal, i.e. deterministic and canonically equal) | Input numPartitions | |
| PartitioningCollection (partitionings) | Any Partitioning that is compatible with one of the input partitionings | Any Partitioning that is guaranteed by any of the input partitionings | Number of partitions of the first Partitioning in the input partitionings | Any Distribution that is satisfied by any of the input partitionings |
| RangePartitioning (ordering collection of SortOrder, numPartitions) | RangePartitioning (when semantically equal, i.e. underlying expressions are deterministic and canonically equal) | RangePartitioning (when semantically equal, i.e. underlying expressions are deterministic and canonically equal) | Input numPartitions | |
| RoundRobinPartitioning (numPartitions) | Always negative | Always negative | Input numPartitions | UnspecifiedDistribution |
| SinglePartition | Any Partitioning with exactly one partition | Any Partitioning with exactly one partition | 1 | Any Distribution except BroadcastDistribution |
| UnknownPartitioning (numPartitions) | Always negative | Always negative | Input numPartitions | UnspecifiedDistribution |
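The numPartitions and satisfies rules listed in the table can be sketched as a small Python model. This is a simplified illustration, not Spark's Scala implementation: the class names mirror the table, but `mode` is a plain string standing in for BroadcastMode, and `OtherDistribution` is a hypothetical placeholder for "any distribution that is neither unspecified nor broadcast". HashPartitioning and RangePartitioning are left out because the table gives no satisfies entry for them.

```python
from dataclasses import dataclass
from typing import Tuple

# --- Distributions (requirements), reduced to what the table mentions ---
@dataclass(frozen=True)
class UnspecifiedDistribution:
    pass

@dataclass(frozen=True)
class BroadcastDistribution:
    mode: str  # stand-in for BroadcastMode (assumption of this sketch)

@dataclass(frozen=True)
class OtherDistribution:
    """Hypothetical placeholder for any other kind of distribution."""

# --- Partitionings ---
@dataclass(frozen=True)
class SinglePartition:
    @property
    def num_partitions(self) -> int:
        return 1

    def satisfies(self, required) -> bool:
        # Any distribution except BroadcastDistribution.
        return not isinstance(required, BroadcastDistribution)

@dataclass(frozen=True)
class RoundRobinPartitioning:
    num_partitions: int

    def satisfies(self, required) -> bool:
        return isinstance(required, UnspecifiedDistribution)

@dataclass(frozen=True)
class UnknownPartitioning:
    num_partitions: int

    def satisfies(self, required) -> bool:
        return isinstance(required, UnspecifiedDistribution)

@dataclass(frozen=True)
class BroadcastPartitioning:
    mode: str

    @property
    def num_partitions(self) -> int:
        return 1

    def satisfies(self, required) -> bool:
        # Only a BroadcastDistribution with the same mode.
        return isinstance(required, BroadcastDistribution) and required.mode == self.mode

@dataclass(frozen=True)
class PartitioningCollection:
    partitionings: Tuple

    @property
    def num_partitions(self) -> int:
        # Number of partitions of the first input partitioning.
        return self.partitionings[0].num_partitions

    def satisfies(self, required) -> bool:
        # Satisfied if any of the input partitionings satisfies it.
        return any(p.satisfies(required) for p in self.partitionings)

# Usage: a collection is as capable as its most capable member.
collection = PartitioningCollection((RoundRobinPartitioning(1), SinglePartition()))
print(collection.num_partitions)                  # 1
print(collection.satisfies(OtherDistribution()))  # True, via SinglePartition
```

The model shows why RoundRobinPartitioning and UnknownPartitioning are the weakest entries in the table: they promise nothing beyond the partition count, so they meet only a requirement that asks for nothing.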
