Driver
A Spark driver (aka an application's driver process) is a JVM process that hosts the SparkContext for a Spark application. It is the master node in a Spark application.
It is the cockpit of job and task execution (using DAGScheduler and TaskScheduler). It also hosts the web UI for the environment.
It splits a Spark application into tasks and schedules them to run on executors.
A driver is where the task scheduler lives and spawns tasks across workers.
A driver coordinates workers and the overall execution of tasks.
Note: Spark shell is a Spark application and the driver. It creates a SparkContext that is available as sc.
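You can see this from spark-shell directly; a minimal sketch (the actual values depend on your environment, and uiWebUrl assumes Spark 2.x):

```scala
scala> sc.appName    // the Spark application's name, e.g. "Spark shell"
scala> sc.master     // the master URL the driver connects to
scala> sc.uiWebUrl   // the driver-hosted web UI, if enabled
```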
A driver requires additional services (besides the common ones like ShuffleManager, MemoryManager, BlockTransferService, BroadcastManager, CacheManager); see the sketch after this list:

- Listener Bus
- MapOutputTrackerMaster with the name MapOutputTracker
- BlockManagerMaster with the name BlockManagerMaster
- MetricsSystem with the name driver
- OutputCommitCoordinator with the endpoint's name OutputCommitCoordinator
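These services live in the driver's SparkEnv. A minimal sketch of peeking at a few of them from spark-shell (SparkEnv.get is a developer API and an internal detail, so treat this as exploration, not a stable interface):

```scala
import org.apache.spark.SparkEnv

val env = SparkEnv.get        // the driver's SparkEnv when evaluated on the driver
env.mapOutputTracker          // MapOutputTrackerMaster on the driver
env.metricsSystem             // the MetricsSystem named "driver"
env.outputCommitCoordinator   // the driver-side OutputCommitCoordinator
```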
Caution: FIXME Diagram of RpcEnv for a driver (and later executors). Perhaps it should be in the notes about RpcEnv?
High-level control flow of work:

- Your Spark application runs as long as the Spark driver. Once the driver terminates, so does your Spark application.
- Creates SparkContext and RDDs, and executes transformations and actions (see the sketch below)
- Launches tasks
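A minimal sketch of a self-contained driver program that follows this flow (the object and application names are illustrative, not from the source):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// A hypothetical driver program: creating SparkContext makes this JVM the driver
object MyApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MyApp")
    val sc = new SparkContext(conf)

    val rdd = sc.parallelize(1 to 100)   // creates an RDD
    val evens = rdd.filter(_ % 2 == 0)   // transformation (lazy)
    println(evens.count())               // action: the driver launches tasks on executors

    sc.stop()  // the application ends when the driver terminates
  }
}
```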
Driver’s Memory
It can be set first using spark-submit's --driver-memory command-line option or the spark.driver.memory Spark property, and falls back to SPARK_DRIVER_MEMORY if not set earlier.
Note: It is printed out to the standard error output in spark-submit's verbose mode.
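For example, each of the following sets the driver's memory to 2g; the application jar and main class are hypothetical, and the options are listed highest-precedence first:

```
# 1. spark-submit's command-line option
./bin/spark-submit --driver-memory 2g --class com.example.MyApp my-app.jar

# 2. The spark.driver.memory property (e.g. via --conf or conf/spark-defaults.conf)
./bin/spark-submit --conf spark.driver.memory=2g --class com.example.MyApp my-app.jar

# 3. Fallback: the SPARK_DRIVER_MEMORY environment variable
SPARK_DRIVER_MEMORY=2g ./bin/spark-submit --class com.example.MyApp my-app.jar
```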
Driver’s Cores
It can be set first using spark-submit's --driver-cores command-line option for cluster deploy mode.
Note: In client deploy mode the driver's memory corresponds to the memory of the JVM process the Spark application runs on.

Note: It is printed out to the standard error output in spark-submit's verbose mode.
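For example, in cluster deploy mode on YARN (the application jar and main class are hypothetical):

```
./bin/spark-submit --master yarn --deploy-mode cluster \
  --driver-cores 2 --class com.example.MyApp my-app.jar
```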
Settings
Spark Property | Default Value | Description
---|---|---
spark.driver.blockManager.port | the value of spark.blockManager.port | Port to use for the BlockManager on the driver.
spark.driver.host | the local hostname | The address of the node the driver runs on. Set when SparkContext is created.
spark.driver.port | 0 | The port the driver listens to. It is first set to 0, then set to the port of the driver's RpcEnv (in SparkEnv.create) or when a client-mode ApplicationMaster connects to the driver.
spark.driver.memory | 1g | The driver's memory size (in MiBs). Refer to Driver's Memory.
spark.driver.cores | 1 | The number of CPU cores assigned to the driver in cluster deploy mode. NOTE: When Client is created (for Spark on YARN in cluster mode only), it sets the number of cores for the ApplicationMaster using spark.driver.cores. Refer to Driver's Cores.
spark.driver.extraJavaOptions | (empty) | Additional JVM options for the driver.
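A minimal sketch of inspecting a few of these on a running application from spark-shell (the values shown depend on your environment):

```scala
// sc is the SparkContext created by spark-shell (i.e. the driver)
sc.getConf.get("spark.driver.host")          // the address of the driver node
sc.getConf.get("spark.driver.port")          // the port the driver's RpcEnv listens on
sc.getConf.get("spark.driver.memory", "1g")  // falls back to the default when unset
```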
spark.driver.extraClassPath
The spark.driver.extraClassPath system property sets the additional classpath entries (e.g. jars and directories) that should be added to the driver's classpath in cluster deploy mode.
Note: For client deploy mode, set spark.driver.extraClassPath using a properties file or on the command line. Do not use SparkConf since it is too late by then: the driver's JVM has already started.
spark.driver.extraClassPath uses an OS-specific path separator.
Note: Use spark-submit's --driver-class-path command-line option to override spark.driver.extraClassPath from a Spark properties file.
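For example, on a Unix-like OS (path separator :); the jar, directory, application jar, and main class are hypothetical:

```
# cluster deploy mode: extend the driver's classpath via the property
./bin/spark-submit --master yarn --deploy-mode cluster \
  --conf spark.driver.extraClassPath=/extra/libs/a.jar:/extra/conf \
  --class com.example.MyApp my-app.jar

# --driver-class-path overrides spark.driver.extraClassPath from a properties file
./bin/spark-submit --driver-class-path /extra/libs/a.jar:/extra/conf \
  --class com.example.MyApp my-app.jar
```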