ExternalSorter

ExternalSorter is a Spillable of WritablePartitionedPairCollection, i.e. a spillable collection of pairs with keys of type K and combined values of type C.

When created, ExternalSorter expects three types to be defined, i.e. K, V, and C, for keys, values, and combiner (partially aggregated) values, respectively.
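The roles of the three types can be illustrated with a minimal sketch in plain Python (not Spark code). The names combine_by_key, create_combiner and merge_value are chosen for illustration only, loosely modeled on the functions of the same purpose in Spark's Aggregator. In this example K is a string, V is a number, and C is a (sum, count) pair.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)  # key type
V = TypeVar("V")                  # input value type
C = TypeVar("C")                  # combiner (partial aggregation) type


def combine_by_key(
    records: Iterable[Tuple[K, V]],
    create_combiner: Callable[[V], C],
    merge_value: Callable[[C, V], C],
) -> Dict[K, C]:
    """Fold (K, V) records into one C per key (illustrative only)."""
    combined: Dict[K, C] = {}
    for key, value in records:
        if key in combined:
            # Key seen before: merge the new value into its combiner.
            combined[key] = merge_value(combined[key], value)
        else:
            # First value for this key: start a new combiner.
            combined[key] = create_combiner(value)
    return combined


# V = int, C = (sum, count): enough to compute an average per key later.
result = combine_by_key(
    [("a", 1), ("b", 2), ("a", 3)],
    create_combiner=lambda v: (v, 1),
    merge_value=lambda c, v: (c[0] + v, c[1] + 1),
)
```

The point of the sketch is only that the input pairs are (K, V) while the output pairs are (K, C); when no aggregation is involved, C is simply the same type as V.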

Tip

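Enable the INFO or WARN logging level for the org.apache.spark.util.collection.ExternalSorter logger to see what happens in ExternalSorter.

Add the following line to conf/log4j.properties (shown here for the INFO level):

  log4j.logger.org.apache.spark.util.collection.ExternalSorter=INFO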

Refer to Logging.

stop Method

Caution
FIXME

writePartitionedFile Method

Caution
FIXME

Creating ExternalSorter Instance

ExternalSorter takes the following:

  1. TaskContext

  2. Optional Aggregator

  3. Optional Partitioner

  4. Optional Scala Ordering (of the keys)

  5. Optional Serializer

Note
When no Serializer is given, ExternalSorter uses SparkEnv to access the default Serializer.
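What the optional Partitioner and Ordering are for can be shown with a toy model in plain Python (not Spark code). The function partition_and_sort and its parameters are hypothetical. The ordering is modeled as a key function rather than a comparator, aggregation is left out, and a missing partitioner is assumed to mean a single partition.

```python
from collections import defaultdict


def partition_and_sort(records, partitioner=None, ordering=None):
    """Toy model: group (key, value) pairs by partition id, then
    optionally sort the pairs of each partition by key."""
    if partitioner is None:
        # Assumption: without a partitioner everything goes to partition 0.
        partitioner = lambda key: 0

    partitions = defaultdict(list)
    for key, value in records:
        partitions[partitioner(key)].append((key, value))

    result = []
    for partition_id in sorted(partitions):
        rows = partitions[partition_id]
        if ordering is not None:
            # Sort within the partition only; partitions stay separate.
            rows = sorted(rows, key=lambda pair: ordering(pair[0]))
        result.append((partition_id, rows))
    return result


records = [(3, "c"), (2, "b"), (1, "a"), (4, "d")]
by_parity = partition_and_sort(
    records,
    partitioner=lambda key: key % 2,  # even keys -> 0, odd keys -> 1
    ordering=lambda key: key,         # natural ordering of the keys
)
unpartitioned = partition_and_sort(records)
```

With both arguments left out, the records come back as one partition in insertion order, which is the toy counterpart of using ExternalSorter with neither a Partitioner nor an Ordering.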

spillMemoryIteratorToDisk Internal Method

Caution
FIXME

spill Method

Note
spill is part of the Spillable contract.

Caution
FIXME

maybeSpillCollection Internal Method

Caution
FIXME

insertAll Method

Caution
FIXME

Settings

Table 1. Spark Properties
spark.shuffle.file.buffer
Default value: 32k
Description: Size of the in-memory buffer for each shuffle file output stream. In KiB unless a unit is specified.

These buffers reduce the number of disk seeks and system calls made in creating intermediate shuffle files.

Used in ExternalSorter, BypassMergeSortShuffleWriter and ExternalAppendOnlyMap (for fileBufferSize) and in ShuffleExternalSorter (for fileBufferSizeBytes).

Note
spark.shuffle.file.buffer was previously known as spark.shuffle.file.buffer.kb.

spark.shuffle.spill.batchSize
Default value: 10000
Description: Number of objects per batch when reading from or writing to serializers.
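The effect of the two settings can be sketched in plain Python (not Spark code). CountingSink, write_batched and read_batched are hypothetical names, pickle stands in for a Spark serializer, and the length-prefixed framing is an assumption of this sketch, not Spark's on-disk format. The buffer_size argument plays the role of spark.shuffle.file.buffer, and batch_size the role of spark.shuffle.spill.batchSize.

```python
import io
import pickle


class CountingSink(io.RawIOBase):
    """Hypothetical stand-in for a shuffle file; counts low-level writes."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.data = bytearray()

    def writable(self):
        return True

    def write(self, b):
        self.calls += 1   # each call models one system call
        self.data += b
        return len(b)


def write_batched(records, sink, buffer_size, batch_size):
    """Serialize records in batches through an in-memory buffer."""
    out = io.BufferedWriter(sink, buffer_size=buffer_size)
    for start in range(0, len(records), batch_size):
        # Each batch is serialized on its own.
        payload = pickle.dumps(records[start:start + batch_size])
        out.write(len(payload).to_bytes(4, "big"))  # length prefix
        out.write(payload)
    out.flush()


def read_batched(data):
    """Read the records back, one serialized batch at a time."""
    pos = 0
    while pos < len(data):
        size = int.from_bytes(data[pos:pos + 4], "big")
        pos += 4
        yield from pickle.loads(bytes(data[pos:pos + size]))
        pos += size


records = [(i % 7, i) for i in range(1000)]
small, large = CountingSink(), CountingSink()
write_batched(records, small, buffer_size=64, batch_size=10)
write_batched(records, large, buffer_size=32 * 1024, batch_size=10)
```

Both sinks end up with the same bytes, but the one behind the larger buffer receives far fewer low-level writes (compare small.calls with large.calls), which is the reduction in system calls the description refers to.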
