Settings
The following settings (aka system properties) are specific to Spark on YARN.
Spark Property | Default Value | Description |
---|---|---|
|
Port that |
|
|
In milliseconds unless the unit is specified. |
|
10% of spark.executor.memory but not less than |
(in MiBs) is an optional setting for the executor memory overhead (in addition to spark.executor.memory) when requesting YARN resource containers from a YARN cluster. |
spark.yarn.credentials.renewalTime
spark.yarn.credentials.renewalTime
(default: Long.MaxValue
ms) is an internal setting for the time of the next credentials renewal.
spark.yarn.credentials.updateTime
spark.yarn.credentials.updateTime
(default: Long.MaxValue
ms) is an internal setting for the time of the next credentials update.
spark.yarn.scheduler.initial-allocation.interval
spark.yarn.scheduler.initial-allocation.interval
(default: 200ms
) controls the initial allocation interval.
It is used when ApplicationMaster
is instantiated.
spark.yarn.scheduler.heartbeat.interval-ms
spark.yarn.scheduler.heartbeat.interval-ms
(default: 3s
) is the heartbeat interval to YARN ResourceManager.
It is used when ApplicationMaster
is instantiated.
spark.yarn.max.executor.failures
spark.yarn.max.executor.failures
is an optional setting that sets the maximum number of executor failures before…TK
It is used when ApplicationMaster
is instantiated.
Caution
|
FIXME |
spark.yarn.maxAppAttempts
spark.yarn.maxAppAttempts
is the maximum number of attempts to register ApplicationMaster before deploying a Spark application to YARN is deemed failed.
It is used when YarnRMClient
computes getMaxRegAttempts
.
spark.yarn.user.classpath.first
Caution
|
FIXME |
spark.yarn.archive
spark.yarn.archive
is the location of the archive containing jars files with Spark classes. It cannot be a local:
URI.
It is used to populate CLASSPATH for ApplicationMaster
and executors.
spark.yarn.queue
spark.yarn.queue
(default: default
) is the name of the YARN resource queue that Client
uses to submit a Spark application to.
You can specify the value using spark-submit’s --queue
command-line argument.
The value is used to set YARN’s ApplicationSubmissionContext.setQueue.
spark.yarn.jars
spark.yarn.jars
is the location of the Spark jars.
1 2 3 4 5 |
--conf spark.yarn.jar=hdfs://master:8020/spark/spark-assembly-2.0.0-hadoop2.7.2.jar |
It is used to populate the CLASSPATH for ApplicationMaster
and ExecutorRunnables
(when spark.yarn.archive is not defined).
Note
|
spark.yarn.jar setting is deprecated as of Spark 2.0.
|
spark.yarn.report.interval
spark.yarn.report.interval
(default: 1s
) is the interval (in milliseconds) between reports of the current application status.
It is used in Client.monitorApplication.
spark.yarn.dist.jars
spark.yarn.dist.jars
(default: empty) is a collection of additional jars to distribute.
It is used when Client distributes additional resources as specified using --jars
command-line option for spark-submit.
spark.yarn.dist.files
spark.yarn.dist.files
(default: empty) is a collection of additional files to distribute.
It is used when Client distributes additional resources as specified using --files
command-line option for spark-submit.
spark.yarn.dist.archives
spark.yarn.dist.archives
(default: empty) is a collection of additional archives to distribute.
It is used when Client distributes additional resources as specified using --archives
command-line option for spark-submit.
spark.yarn.principal
spark.yarn.principal
— See the corresponding –principal command-line option for spark-submit.
spark.yarn.keytab
spark.yarn.keytab
— See the corresponding –keytab command-line option for spark-submit.
spark.yarn.submit.file.replication
spark.yarn.submit.file.replication
is the replication factor (number) for files uploaded by Spark to HDFS.
spark.yarn.config.gatewayPath
spark.yarn.config.gatewayPath
(default: null
) is the root of configuration paths that is present on gateway nodes, and will be replaced with the corresponding path in cluster machines.
It is used when Client
resolves a path to be YARN NodeManager-aware.
spark.yarn.config.replacementPath
spark.yarn.config.replacementPath
(default: null
) is the path to use as a replacement for spark.yarn.config.gatewayPath when launching processes in the YARN cluster.
It is used when Client
resolves a path to be YARN NodeManager-aware.
spark.yarn.historyServer.address
spark.yarn.historyServer.address
is the optional address of the History Server.
spark.yarn.access.namenodes
spark.yarn.access.namenodes
(default: empty) is a list of extra NameNode URLs for which to request delegation tokens. The NameNode that hosts fs.defaultFS does not need to be listed here.
spark.yarn.executor.nodeLabelExpression
spark.yarn.executor.nodeLabelExpression
is a node label expression for executors.
spark.yarn.containerLauncherMaxThreads
spark.yarn.containerLauncherMaxThreads
(default: 25
)…FIXME
spark.yarn.executor.failuresValidityInterval
spark.yarn.executor.failuresValidityInterval
(default: -1L
) is an interval (in milliseconds) after which Executor failures will be considered independent and not accumulate towards the attempt count.
spark.yarn.submit.waitAppCompletion
spark.yarn.submit.waitAppCompletion
(default: true
) is a flag to control whether to wait for the application to finish before exiting the launcher process in cluster mode.
spark.yarn.am.cores
spark.yarn.am.cores
(default: 1
) sets the number of CPU cores for ApplicationMaster’s JVM.
spark.yarn.am.memory
spark.yarn.am.memory
(default: 512m
) sets the memory size of ApplicationMaster’s JVM (in MiBs)
spark.yarn.stagingDir
spark.yarn.stagingDir
is a staging directory used while submitting applications.
spark.yarn.preserve.staging.files
spark.yarn.preserve.staging.files
(default: false
) controls whether to preserve temporary files in a staging directory (as pointed by spark.yarn.stagingDir).
spark.yarn.launchContainers
spark.yarn.launchContainers
(default: true
) is a flag used for testing only so YarnAllocator
does not run launch ExecutorRunnables
on allocated YARN containers.