关注 spark技术分享,
撸spark源码 玩spark最佳实践

ShuffleManager — Pluggable Shuffle Systems

ShuffleManager — Pluggable Shuffle Systems

ShuffleManager is the pluggable mechanism for shuffle systems that track shuffle dependencies for ShuffleMapStage on the driver and executors.

Note
SortShuffleManager (short name: sort or tungsten-sort) is the one and only ShuffleManager in Spark 2.0.

spark.shuffle.manager Spark property sets up the default shuffle manager.

The driver and executor access their ShuffleManager instances using SparkEnv.

The driver registers shuffles with a shuffle manager, and executors (or tasks running locally in the driver) can ask to read and write data.

It is network-addressable, i.e. it is available on a host and port.

There can be many shuffle services running simultaneously and a driver registers with all of them when CoarseGrainedSchedulerBackend is used.

ShuffleManager Contract

Note
ShuffleManager is a private[spark] contract.
Table 1. ShuffleManager Contract
Method Description

registerShuffle

Executed when ShuffleDependency is created and registers itself.

getWriter

Used when a ShuffleMapTask runs (and requests a ShuffleWriter to write records for a partition).

getReader

Returns a ShuffleReader for a range of reduce partitions (to read key-value records for a ShuffleDependency dependency).

Used when:

  • CoGroupedRDD, ShuffledRDD, and SubtractedRDD are requested to compute a partition (for a ShuffleDependency dependency)

  • Spark SQL’s ShuffledRowRDD is requested to compute a partition

unregisterShuffle

Executed when ??? removes the metadata of a shuffle.

shuffleBlockResolver

Used when:

1. BlockManager requests a ShuffleBlockResolver capable of retrieving shuffle block data (for a ShuffleBlockId)

2. BlockManager requests a ShuffleBlockResolver for local shuffle block data as bytes.

stop

Used when SparkEnv stops.

Settings

Table 2. Spark Properties
Spark Property Default Value Description

spark.shuffle.manager

sort

ShuffleManager for a Spark application.

You can use a short name or the fully-qualified class name of a custom implementation.

The predefined aliases are sort and tungsten-sort with org.apache.spark.shuffle.sort.SortShuffleManager being the one and only ShuffleManager.

Further Reading or Watching

  1. (slides) Spark shuffle introduction by Raymond Liu (aka colorant).

赞(0) 打赏
未经允许不得转载:spark技术分享 » ShuffleManager — Pluggable Shuffle Systems
分享到: 更多 (0)

关注公众号:spark技术分享

联系我们联系我们

觉得文章有用就打赏一下文章作者

支付宝扫一扫打赏

微信扫一扫打赏