关注 spark技术分享,
撸spark源码 玩spark最佳实践

ShuffleExternalSorter — Cache-Efficient Sorter

ShuffleExternalSorter — Cache-Efficient Sorter

ShuffleExternalSorter is a specialized cache-efficient sorter that sorts arrays of compressed record pointers and partition ids. By using only 8 bytes of space per record in the sorting array, ShuffleExternalSorter can fit more of the array into cache.

ShuffleExternalSorter is a MemoryConsumer.

Table 1. ShuffleExternalSorter’s Internal Properties
Name Initial Value Description

inMemSorter

(empty)

ShuffleInMemorySorter

Tip

Enable INFO or ERROR logging levels for org.apache.spark.shuffle.sort.ShuffleExternalSorter logger to see what happens in ShuffleExternalSorter.

Add the following line to conf/log4j.properties:

Refer to Logging.

getMemoryUsage Method

Caution
FIXME

closeAndGetSpills Method

Caution
FIXME

insertRecord Method

Caution
FIXME

freeMemory Method

Caution
FIXME

getPeakMemoryUsedBytes Method

Caution
FIXME

writeSortedFile Method

Caution
FIXME

cleanupResources Method

Caution
FIXME

Creating ShuffleExternalSorter Instance

ShuffleExternalSorter takes the following when created:

  1. memoryManager — TaskMemoryManager

  2. blockManager — BlockManager

  3. taskContext — TaskContext

  4. initialSize

  5. numPartitions

  6. SparkConf

  7. writeMetrics — ShuffleWriteMetrics

ShuffleExternalSorter initializes itself as a MemoryConsumer (with pageSize as the minimum of PackedRecordPointer.MAXIMUM_PAGE_SIZE_BYTES and pageSizeBytes, and Tungsten memory mode).

ShuffleExternalSorter uses spark.shuffle.file.buffer (for fileBufferSizeBytes) and spark.shuffle.spill.numElementsForceSpillThreshold (for numElementsForSpillThreshold) Spark properties.

ShuffleExternalSorter creates a ShuffleInMemorySorter (with spark.shuffle.sort.useRadixSort Spark property enabled by default).

ShuffleExternalSorter initializes the internal registries and counters.

Note
ShuffleExternalSorter is created when UnsafeShuffleWriter is open (which is when UnsafeShuffleWriter is created).

Freeing Execution Memory by Spilling To Disk — spill Method

Note
spill is part of MemoryConsumer contract to sort and spill the current records due to memory pressure.

spill frees execution memory, updates TaskMetrics, and in the end returns the spill size.

Note
spill returns 0 when ShuffleExternalSorter has no ShuffleInMemorySorter or the ShuffleInMemorySorter manages no records.

You should see the following INFO message in the logs:

spill writes sorted file (with isLastFile disabled).

spill frees memory and records the spill size.

spill resets the internal ShuffleInMemorySorter (that in turn frees up the underlying in-memory pointer array).

spill returns the spill size.

赞(0) 打赏
未经允许不得转载:spark技术分享 » ShuffleExternalSorter — Cache-Efficient Sorter
分享到: 更多 (0)

关注公众号:spark技术分享

联系我们联系我们

觉得文章有用就打赏一下文章作者

支付宝扫一扫打赏

微信扫一扫打赏