Management Scripts for Standalone Workers
The sbin/start-slave.sh script starts a Spark worker (aka slave) on the machine the script is executed on. The script launches SPARK_WORKER_INSTANCES workers (default: 1).
```
./sbin/start-slave.sh [masterURL]
```
The mandatory masterURL parameter is of the form spark://hostname:port, e.g. spark://localhost:7077. It is also possible to specify a comma-separated list of master URLs of the form spark://hostname1:port1,hostname2:port2,… with each element being hostname:port.
Internally, the script starts the sparkWorker RPC environment.
The order of importance of Spark configuration settings is as follows (from least to most important):

| Name | Default | Description |
|---|---|---|
| SPARK_WORKER_INSTANCES | 1 | The number of worker instances to run on a node |
| SPARK_WORKER_PORT | (random) | The base port number to listen on for the first worker. If set, subsequent workers will increment this number. If unset, Spark will pick a random port. |
| SPARK_WORKER_WEBUI_PORT | 8081 | The base port for the web UI of the first worker. Subsequent workers will increment this number. If the port is used, the successive ports are tried until a free one is found. |
| SPARK_WORKER_CORES | the number of processors available to the JVM | The number of cores to use by a single executor |
| SPARK_WORKER_MEMORY | | The amount of memory to use, e.g. 1000m, 2g |
| SPARK_WORKER_DIR | SPARK_HOME/work | The working directory of worker processes, i.e. the web UI, and the executors and drivers of Spark applications; it includes both logs and scratch space |
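The port-increment behavior described above for multiple worker instances can be sketched in shell (a minimal illustration, assuming base ports 7078 and 8081; the real logic lives in sbin/start-slave.sh):

```shell
#!/bin/sh
# Sketch: how per-instance ports could be derived when SPARK_WORKER_INSTANCES > 1.
# The base port values below are illustrative.
SPARK_WORKER_INSTANCES=3
SPARK_WORKER_PORT=7078
SPARK_WORKER_WEBUI_PORT=8081

i=0
while [ "$i" -lt "$SPARK_WORKER_INSTANCES" ]; do
  # Each subsequent worker increments the base port by one.
  port=$(( SPARK_WORKER_PORT + i ))
  webui=$(( SPARK_WORKER_WEBUI_PORT + i ))
  echo "worker $(( i + 1 )): --port $port --webui-port $webui"
  i=$(( i + 1 ))
done
```

The third worker ends up on port 7080 with its web UI on 8083, matching the "base port plus increment" rule in the table.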
The script uses the following helper scripts:
- sbin/spark-config.sh
- bin/load-spark-env.sh
Command-line Options
You can use the following command-line options:
- --host or -h sets the hostname to be available under.
- --port or -p – command-line version of the SPARK_WORKER_PORT environment variable.
- --cores or -c (default: the number of processors available to the JVM) – command-line version of the SPARK_WORKER_CORES environment variable.
- --memory or -m – command-line version of the SPARK_WORKER_MEMORY environment variable.
- --work-dir or -d – command-line version of the SPARK_WORKER_DIR environment variable.
- --webui-port – command-line version of the SPARK_WORKER_WEBUI_PORT environment variable.
- --properties-file (default: conf/spark-defaults.conf) – the path to a custom Spark properties file. Refer to spark-defaults.conf.
- --help
Spark properties
After loading the default SparkConf, if --properties-file or SPARK_WORKER_OPTS define spark.worker.ui.port, the value of the property is used as the port of the worker’s web UI.
```
SPARK_WORKER_OPTS=-Dspark.worker.ui.port=21212 ./sbin/start-slave.sh spark://localhost:7077
```
or
```
$ cat worker.properties
spark.worker.ui.port=33333

$ ./sbin/start-slave.sh spark://localhost:7077 --properties-file worker.properties
```
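The property lookup itself can be sketched in shell (illustrative only — the actual parsing happens inside Spark on the JVM side; the /tmp/worker.properties path is made up):

```shell
#!/bin/sh
# Write a sample properties file (the path is illustrative).
cat > /tmp/worker.properties <<'EOF'
# Spark properties, one key=value per line
spark.worker.ui.port=33333
EOF

# Extract the web UI port: print only lines starting with the key,
# with the key and '=' stripped off.
ui_port=$(sed -n 's/^spark\.worker\.ui\.port=//p' /tmp/worker.properties)
echo "web UI port: $ui_port"
```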
sbin/spark-daemon.sh
Ultimately, the script calls sbin/spark-daemon.sh start to kick off org.apache.spark.deploy.worker.Worker with --webui-port, --port and the master URL.
Internals of org.apache.spark.deploy.worker.Worker
Upon starting, a Spark worker creates the default SparkConf.
It parses command-line arguments for the worker using WorkerArguments class.
- SPARK_LOCAL_HOSTNAME – custom host name
- SPARK_LOCAL_IP – custom IP to use (when SPARK_LOCAL_HOSTNAME is not set or the hostname resolves to an incorrect IP)
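The fallback between these variables can be sketched as a shell parameter-expansion chain (a simplification under the assumption that SPARK_LOCAL_HOSTNAME wins when set; the actual resolution happens inside Spark on the JVM side, and the host name below is illustrative):

```shell
#!/bin/sh
# Prefer SPARK_LOCAL_HOSTNAME, then SPARK_LOCAL_IP, then the machine's
# own hostname (the value below is illustrative).
SPARK_LOCAL_HOSTNAME=worker-1.example.com
host=${SPARK_LOCAL_HOSTNAME:-${SPARK_LOCAL_IP:-$(hostname)}}
echo "advertised host: $host"
```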
It starts the sparkWorker RPC environment and waits until the RpcEnv terminates.
RPC environment
The org.apache.spark.deploy.worker.Worker class starts its own sparkWorker RPC environment with Worker endpoint.
sbin/start-slaves.sh script starts slave instances
The ./sbin/start-slaves.sh script starts slave instances on each machine specified in the conf/slaves file.
It has support for starting Tachyon using the --with-tachyon command-line option. It assumes the tachyon/bin/tachyon command is available in Spark's home directory.
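The per-host fan-out over conf/slaves can be sketched as follows (simplified: it echoes the remote command instead of running it over ssh; the file path, host names, and master URL are illustrative):

```shell
#!/bin/sh
# A sample slaves file: one worker host per line (the path is illustrative).
cat > /tmp/slaves <<'EOF'
# worker hosts
host1
host2
EOF

# Iterate over the hosts, skipping blank lines and comments,
# and print the command that would be run on each host.
while read -r host; do
  case "$host" in
    ''|\#*) continue ;;
  esac
  echo "ssh $host ./sbin/start-slave.sh spark://master:7077"
done < /tmp/slaves
```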
The script uses the following helper scripts:
- sbin/spark-config.sh
- bin/load-spark-env.sh
- conf/spark-env.sh
The script uses the following environment variables (and sets them when unavailable):
- SPARK_PREFIX
- SPARK_HOME
- SPARK_CONF_DIR
- SPARK_MASTER_PORT
- SPARK_MASTER_IP
The following command will launch 3 worker instances on each node. Each worker instance will use two cores.
```
SPARK_WORKER_INSTANCES=3 SPARK_WORKER_CORES=2 ./sbin/start-slaves.sh
```
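With those settings, the resulting cluster capacity is simple arithmetic (the 4-node cluster size below is an illustrative assumption, not from the source):

```shell
#!/bin/sh
# 3 worker instances per node, 2 cores per worker, on a hypothetical 4-node cluster.
SPARK_WORKER_INSTANCES=3
SPARK_WORKER_CORES=2
NODES=4

workers=$(( SPARK_WORKER_INSTANCES * NODES ))
total_cores=$(( workers * SPARK_WORKER_CORES ))
echo "$workers workers, $total_cores cores in total"
```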