ShuffleExternalSorter — Cache-Efficient Sorter
ShuffleExternalSorter is a specialized cache-efficient sorter that sorts arrays of compressed record pointers and partition ids. By using only 8 bytes of space per record in the sorting array, ShuffleExternalSorter can fit more of the array into cache.
ShuffleExternalSorter is a MemoryConsumer.
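To make the "8 bytes per record" idea concrete, here is a minimal sketch of how a partition id and a record address could be packed into a single long. The 24/13/27 bit split is an assumption modelled on Spark's PackedRecordPointer; the class and method names (PackedPointerSketch, pack, and the accessors) are hypothetical and are not Spark's API.

```java
import java.util.Arrays;

public class PackedPointerSketch {
    // Assumed layout: [24-bit partition id][13-bit page number][27-bit offset in page]
    static final int PARTITION_BITS = 24;
    static final int PAGE_BITS = 13;
    static final int OFFSET_BITS = 27;

    static final long PARTITION_MASK = (1L << PARTITION_BITS) - 1;
    static final long PAGE_MASK = (1L << PAGE_BITS) - 1;
    static final long OFFSET_MASK = (1L << OFFSET_BITS) - 1;

    /** Packs the three fields into one 8-byte value, rejecting out-of-range inputs. */
    static long pack(int partitionId, int pageNumber, long offsetInPage) {
        if (partitionId < 0 || partitionId > PARTITION_MASK) {
            throw new IllegalArgumentException("partition id out of range: " + partitionId);
        }
        if (pageNumber < 0 || pageNumber > PAGE_MASK) {
            throw new IllegalArgumentException("page number out of range: " + pageNumber);
        }
        if (offsetInPage < 0 || offsetInPage > OFFSET_MASK) {
            throw new IllegalArgumentException("offset out of range: " + offsetInPage);
        }
        return ((long) partitionId << (PAGE_BITS + OFFSET_BITS))
            | ((long) pageNumber << OFFSET_BITS)
            | offsetInPage;
    }

    static int partitionId(long packed) {
        return (int) ((packed >>> (PAGE_BITS + OFFSET_BITS)) & PARTITION_MASK);
    }

    static int pageNumber(long packed) {
        return (int) ((packed >>> OFFSET_BITS) & PAGE_MASK);
    }

    static long offsetInPage(long packed) {
        return packed & OFFSET_MASK;
    }

    public static void main(String[] args) {
        long[] pointers = {
            pack(7, 1, 64), pack(2, 0, 128), pack(7, 0, 0), pack(0, 3, 16)
        };
        // The partition id sits in the high bits, so sorting the raw longs groups
        // records by partition. A plain signed sort is only safe here because these
        // partition ids are small and leave the top bit clear.
        Arrays.sort(pointers);
        for (long p : pointers) {
            System.out.println("partition=" + partitionId(p)
                + " page=" + pageNumber(p) + " offset=" + offsetInPage(p));
        }
    }
}
```

Because the sort key (the partition id) travels inside the pointer itself, the sorter never has to dereference a record while sorting, which is what keeps the sorting array small and cache-friendly.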
Tip: Enable INFO logging for the ShuffleExternalSorter logger to see what happens inside. Refer to Logging.
getMemoryUsage Method
Caution: FIXME
closeAndGetSpills Method
Caution: FIXME
insertRecord Method
Caution: FIXME
freeMemory Method
Caution: FIXME
getPeakMemoryUsedBytes Method
Caution: FIXME
writeSortedFile Method
Caution: FIXME
cleanupResources Method
Caution: FIXME
Creating ShuffleExternalSorter Instance
ShuffleExternalSorter takes the following when created:
- memoryManager (TaskMemoryManager)
- blockManager (BlockManager)
- taskContext (TaskContext)
- initialSize
- numPartitions
- writeMetrics (ShuffleWriteMetrics)
ShuffleExternalSorter initializes itself as a MemoryConsumer, with pageSize set to the minimum of PackedRecordPointer.MAXIMUM_PAGE_SIZE_BYTES and the memory manager's pageSizeBytes, and with the Tungsten memory mode.
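The page-size cap can be illustrated with a short sketch. The value used for MAXIMUM_PAGE_SIZE_BYTES (1 << 27, i.e. 128 MB) is an assumption based on a 27-bit in-page offset, and PageSizeSketch and effectivePageSize are hypothetical names used only for illustration.

```java
public class PageSizeSketch {
    // Assumed value: the largest page addressable with a 27-bit in-page offset.
    static final long MAXIMUM_PAGE_SIZE_BYTES = 1L << 27;

    /** Returns the page size the sorter would request, capped at the maximum. */
    static int effectivePageSize(long pageSizeBytes) {
        return (int) Math.min(MAXIMUM_PAGE_SIZE_BYTES, pageSizeBytes);
    }

    public static void main(String[] args) {
        long sixtyFourMb = 64L * 1024 * 1024;
        long oneGb = 1024L * 1024 * 1024;
        System.out.println(effectivePageSize(sixtyFourMb)); // below the cap, returned unchanged
        System.out.println(effectivePageSize(oneGb));       // above the cap, clamped to the maximum
    }
}
```

The cap exists because an offset inside a page must fit in the bits reserved for it in the packed pointer; a larger page could hold records the sorter cannot address.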
ShuffleExternalSorter uses the Spark properties spark.shuffle.file.buffer (for fileBufferSizeBytes) and spark.shuffle.spill.numElementsForceSpillThreshold (for numElementsForSpillThreshold).
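As a rough illustration of how the two properties might translate into the sorter's fields, the sketch below reads them from a plain Map that stands in for SparkConf. The defaults shown (32k for the file buffer, Integer.MAX_VALUE for the force-spill threshold) and the rule that a bare number means KiB are assumptions, not values confirmed by this page; the helper names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class SorterConfigSketch {
    static final String FILE_BUFFER_KEY = "spark.shuffle.file.buffer";
    static final String FORCE_SPILL_KEY = "spark.shuffle.spill.numElementsForceSpillThreshold";

    /** Parses sizes such as "32k" or "1m" into bytes; a bare number is treated as KiB (assumption). */
    static long parseSizeAsBytes(String raw) {
        String v = raw.trim().toLowerCase();
        long multiplier = 1024L;
        if (v.endsWith("k")) {
            v = v.substring(0, v.length() - 1);
        } else if (v.endsWith("m")) {
            multiplier = 1024L * 1024L;
            v = v.substring(0, v.length() - 1);
        }
        return Long.parseLong(v) * multiplier;
    }

    /** Size of the buffer used when writing spill files (assumed default: 32k). */
    static int fileBufferSizeBytes(Map<String, String> conf) {
        return (int) parseSizeAsBytes(conf.getOrDefault(FILE_BUFFER_KEY, "32k"));
    }

    /** Record count at which a spill is forced (assumed default: Integer.MAX_VALUE). */
    static int numElementsForSpillThreshold(Map<String, String> conf) {
        return Integer.parseInt(
            conf.getOrDefault(FORCE_SPILL_KEY, String.valueOf(Integer.MAX_VALUE)));
    }

    /** Hypothetical check: spill once the in-memory record count reaches the threshold. */
    static boolean shouldForceSpill(int numRecords, int threshold) {
        return numRecords >= threshold;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(FILE_BUFFER_KEY, "64k");
        conf.put(FORCE_SPILL_KEY, "1000");
        System.out.println("fileBufferSizeBytes = " + fileBufferSizeBytes(conf));
        System.out.println("force spill at 1000 records? "
            + shouldForceSpill(1000, numElementsForSpillThreshold(conf)));
    }
}
```

A low force-spill threshold is mainly useful for testing spill behaviour; with the assumed default the record count effectively never triggers a spill and memory pressure does instead.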
ShuffleExternalSorter creates a ShuffleInMemorySorter (with the spark.shuffle.sort.useRadixSort Spark property, which is enabled by default).
ShuffleExternalSorter initializes the internal registries and counters.
Note: ShuffleExternalSorter is created when UnsafeShuffleWriter is opened (which happens when UnsafeShuffleWriter is created).
Freeing Execution Memory by Spilling To Disk: spill Method
    long spill(long size, MemoryConsumer trigger) throws IOException
Note: spill is part of the MemoryConsumer contract to sort and spill the current records due to memory pressure.
spill frees execution memory, updates TaskMetrics, and in the end returns the spill size.
Note: spill returns 0 when ShuffleExternalSorter has no ShuffleInMemorySorter or the ShuffleInMemorySorter manages no records.
You should see the following INFO message in the logs:
    INFO Thread [id] spilling sort data of [memoryUsage] to disk ([size] times so far)
spill writes a sorted file (with isLastFile disabled).
spill frees memory and records the spill size.
spill resets the internal ShuffleInMemorySorter (which in turn frees up the underlying in-memory pointer array).
spill returns the spill size.
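The sequence of steps in spill can be sketched as a small self-contained simulation. Everything here is a stand-in: InMemorySorter is a hypothetical replacement for ShuffleInMemorySorter, the fixed BYTES_PER_RECORD is a toy accounting rule, and writeSortedFile only counts records instead of writing to disk. It follows the order described above and is not Spark's implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SpillFlowSketch {
    /** Hypothetical stand-in for ShuffleInMemorySorter: holds 8-byte record pointers. */
    static final class InMemorySorter {
        private final List<Long> pointers = new ArrayList<>();
        void insert(long pointer) { pointers.add(pointer); }
        int numRecords() { return pointers.size(); }
        List<Long> sorted() {
            List<Long> copy = new ArrayList<>(pointers);
            Collections.sort(copy);
            return copy;
        }
        void reset() { pointers.clear(); }
    }

    static final long BYTES_PER_RECORD = 64; // toy accounting, not a Spark value

    private final InMemorySorter inMemSorter = new InMemorySorter();
    private final List<Integer> spills = new ArrayList<>(); // record count per spill file
    private long memoryUsed = 0;
    private long memoryBytesSpilled = 0; // stands in for the TaskMetrics update

    void insertRecord(long pointer) {
        inMemSorter.insert(pointer);
        memoryUsed += BYTES_PER_RECORD;
    }

    /** Mirrors the described order: guard, log, write, free, reset, return. */
    long spill() {
        if (inMemSorter.numRecords() == 0) {
            return 0L; // nothing to spill
        }
        System.out.printf("Thread %s spilling sort data of %d bytes to disk (%d times so far)%n",
            Thread.currentThread().getName(), memoryUsed, spills.size());
        writeSortedFile(false);
        long spillSize = freeMemory();
        inMemSorter.reset();
        memoryBytesSpilled += spillSize;
        return spillSize;
    }

    /** Toy version: "writes" by recording how many sorted records the file would hold. */
    private void writeSortedFile(boolean isLastFile) {
        spills.add(inMemSorter.sorted().size());
    }

    /** Releases all tracked memory and reports how much was freed. */
    private long freeMemory() {
        long freed = memoryUsed;
        memoryUsed = 0;
        return freed;
    }

    int spillCount() { return spills.size(); }
    long memoryBytesSpilled() { return memoryBytesSpilled; }

    public static void main(String[] args) {
        SpillFlowSketch sorter = new SpillFlowSketch();
        sorter.insertRecord(3L);
        sorter.insertRecord(1L);
        sorter.insertRecord(2L);
        System.out.println("first spill freed " + sorter.spill() + " bytes");
        System.out.println("second spill freed " + sorter.spill() + " bytes");
    }
}
```

Resetting the in-memory sorter only after the file has been written and the memory freed means a second call with no new records hits the guard and returns 0 without producing another spill file.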