ShuffleExternalSorter — Cache-Efficient Sorter
ShuffleExternalSorter is a specialized cache-efficient sorter that sorts arrays of compressed record pointers and partition ids. By using only 8 bytes of space per record in the sorting array, ShuffleExternalSorter can fit more of the array into cache.
ShuffleExternalSorter is a MemoryConsumer.
| Name | Initial Value | Description |
|---|---|---|
|
(empty) |
|
|
Tip
|
Enable Add the following line to
Refer to Logging. |
getMemoryUsage Method
|
Caution
|
FIXME |
closeAndGetSpills Method
|
Caution
|
FIXME |
insertRecord Method
|
Caution
|
FIXME |
freeMemory Method
|
Caution
|
FIXME |
getPeakMemoryUsedBytes Method
|
Caution
|
FIXME |
writeSortedFile Method
|
Caution
|
FIXME |
cleanupResources Method
|
Caution
|
FIXME |
Creating ShuffleExternalSorter Instance
ShuffleExternalSorter takes the following when created:
-
memoryManager— TaskMemoryManager -
blockManager— BlockManager -
taskContext— TaskContext -
initialSize -
numPartitions -
writeMetrics— ShuffleWriteMetrics
ShuffleExternalSorter initializes itself as a MemoryConsumer (with pageSize as the minimum of PackedRecordPointer.MAXIMUM_PAGE_SIZE_BYTES and pageSizeBytes, and Tungsten memory mode).
ShuffleExternalSorter uses spark.shuffle.file.buffer (for fileBufferSizeBytes) and spark.shuffle.spill.numElementsForceSpillThreshold (for numElementsForSpillThreshold) Spark properties.
ShuffleExternalSorter creates a ShuffleInMemorySorter (with spark.shuffle.sort.useRadixSort Spark property enabled by default).
ShuffleExternalSorter initializes the internal registries and counters.
|
Note
|
ShuffleExternalSorter is created when UnsafeShuffleWriter is open (which is when UnsafeShuffleWriter is created).
|
Freeing Execution Memory by Spilling To Disk — spill Method
|
1 2 3 4 5 6 |
long spill(long size, MemoryConsumer trigger) throws IOException |
|
Note
|
spill is part of MemoryConsumer contract to sort and spill the current records due to memory pressure.
|
spill frees execution memory, updates TaskMetrics, and in the end returns the spill size.
|
Note
|
spill returns 0 when ShuffleExternalSorter has no ShuffleInMemorySorter or the ShuffleInMemorySorter manages no records.
|
You should see the following INFO message in the logs:
|
1 2 3 4 5 |
INFO Thread [id] spilling sort data of [memoryUsage] to disk ([size] times so far) |
spill writes sorted file (with isLastFile disabled).
spill frees memory and records the spill size.
spill resets the internal ShuffleInMemorySorter (that in turn frees up the underlying in-memory pointer array).
spill returns the spill size.
spark技术分享