TransportConf — Transport Configuration
TransportConf
is a class for the transport-related network configuration for modules, e.g. ExternalShuffleService or YarnShuffleService.
It exposes methods to access settings for a single module as spark.module.prefix or general network-related settings.
Creating TransportConf from SparkConf — fromSparkConf
Method
1 2 3 4 5 |
fromSparkConf(_conf: SparkConf, module: String, numUsableCores: Int = 0): TransportConf |
Note
|
fromSparkConf belongs to SparkTransportConf object.
|
fromSparkConf
creates a TransportConf for module
from the given SparkConf.
Internally, fromSparkConf
calculates the default number of threads for both the Netty client and server thread pools.
fromSparkConf
uses spark.[module].io.serverThreads
and spark.[module].io.clientThreads
if specified for the number of threads to use. If not defined, fromSparkConf
sets them to the default number of threads calculated earlier.
Calculating Default Number of Threads (8 Maximum) — defaultNumThreads
Internal Method
1 2 3 4 5 |
defaultNumThreads(numUsableCores: Int): Int |
Note
|
defaultNumThreads belongs to SparkTransportConf object.
|
defaultNumThreads
calculates the default number of threads for both the Netty client and server thread pools that is 8 maximum or numUsableCores
is smaller. If numUsableCores
is not specified, defaultNumThreads
uses the number of processors available to the Java virtual machine.
Note
|
8 is the maximum number of threads for Netty and is not configurable. |
Note
|
defaultNumThreads uses Java’s Runtime for the number of processors in JVM.
|
spark.module.prefix Settings
The settings can be in the form of spark.[module].[prefix] with the following prefixes:
-
io.mode
(default:NIO
) — the IO mode:nio
orepoll
. -
io.preferDirectBufs
(default:true
) — a flag to control whether Spark prefers allocating off-heap byte buffers within Netty (true
) or not (false
). -
io.connectionTimeout
(default: spark.network.timeout or120s
) — the connection timeout in milliseconds. -
io.backLog
(default:-1
for no backlog) — the requested maximum length of the queue of incoming connections. -
io.numConnectionsPerPeer
(default:1
) — the number of concurrent connections between two nodes for fetching data. -
io.serverThreads
(default:0
i.e. 2x#cores) — the number of threads used in the server thread pool. -
io.clientThreads
(default:0
i.e. 2x#cores) — the number of threads used in the client thread pool. -
io.receiveBuffer
(default:-1
) — the receive buffer size (SO_RCVBUF). -
io.sendBuffer
(default:-1
) — the send buffer size (SO_SNDBUF). -
sasl.timeout
(default:30s
) — the timeout (in milliseconds) for a single round trip of SASL token exchange. -
io.maxRetries
(default:3
) — the maximum number of times Spark will try IO exceptions (such as connection timeouts) per request. If set to0
, Spark will not do any retries. -
io.retryWait
(default:5s
) — the time (in milliseconds) that Spark will wait in order to perform a retry after anIOException
. Only relevant ifio.maxRetries
> 0. -
io.lazyFD
(default:true
) — controls whether to initializeFileDescriptor
lazily (true
) or not (false
). Iftrue
, file descriptors are created only when data is going to be transferred. This can reduce the number of open files.
General Network-Related Settings
spark.storage.memoryMapThreshold
spark.storage.memoryMapThreshold
(default: 2m
) is the minimum size of a block that we should start using memory map rather than reading in through normal IO operations.
This prevents Spark from memory mapping very small blocks. In general, memory mapping has high overhead for blocks close to or below the page size of the OS.