Example 2-workers-on-1-node Standalone Cluster (one executor per worker)
The following steps are a recipe for a Spark Standalone cluster with 2 workers on a single machine.
The aim is to have a complete Spark-clustered environment on your laptop.
Tip
Consult the following documents:

Important
You can use the Spark Standalone cluster in the following ways:
For our learning purposes, …
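For example, once the master and workers are up (as set up in the steps below), you could point spark-shell or spark-submit at the cluster. The master URL spark://japila.local:7077 matches the one used throughout this recipe; the class name and jar path are placeholders only.

   # Connect an interactive Spark shell to the standalone cluster
   ./bin/spark-shell --master spark://japila.local:7077

   # Or submit a packaged application (class and jar are placeholders)
   ./bin/spark-submit \
     --master spark://japila.local:7077 \
     --class com.example.MyApp \
     path/to/my-app.jar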
1. Start a standalone master server.

   ./sbin/start-master.sh

   Notes:

   - Use SPARK_CONF_DIR for the configuration directory (defaults to $SPARK_HOME/conf).
   - Use spark.deploy.retainedApplications to limit how many completed applications the master keeps in the web UI (default: 200).
   - Use spark.deploy.retainedDrivers to limit how many completed drivers the master keeps in the web UI (default: 200).
   - Use spark.deploy.recoveryMode to enable recovery of the master state, e.g. with ZooKeeper or the file system (default: NONE).
   - Use spark.deploy.defaultCores to cap the number of cores given to applications that do not set spark.cores.max themselves (default: Int.MaxValue).
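One way to set the configuration directory and the spark.deploy.* properties above is sketched below; SPARK_MASTER_OPTS is the standalone-mode hook for master-only properties, and the path and values here are placeholders and examples only.

   # In your shell, before starting the master: use a custom configuration
   # directory instead of $SPARK_HOME/conf (placeholder path).
   export SPARK_CONF_DIR=/path/to/custom/conf

   # In conf/spark-env.sh (or $SPARK_CONF_DIR/spark-env.sh): master-only
   # properties are passed as -D options to the master JVM (example values).
   SPARK_MASTER_OPTS="-Dspark.deploy.retainedApplications=100 -Dspark.deploy.defaultCores=4"

   ./sbin/start-master.sh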
2. Open the master's web UI at http://localhost:8080 to see the current setup: no workers and no applications.

   Figure 1. Master's web UI with no workers and applications
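If you prefer the command line over the browser, the master web UI also exposes the same information as JSON, at the /json endpoint in the Spark versions I have used (treat the exact endpoint as an assumption):

   # Query the master's status as JSON: workers, cores, memory, applications
   curl -s http://localhost:8080/json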
3. Start the first worker.

   ./sbin/start-slave.sh spark://japila.local:7077

   Note: The command above in turn executes org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://japila.local:7077.
4. Check out the master's web UI at http://localhost:8080 to see the current setup: one worker.

   Figure 2. Master's web UI with one worker ALIVE

   Note the number of CPUs and the memory: 8 and 15 GB, respectively (one gigabyte is left for the OS; oh, how generous, my dear Spark!).
5. Let's stop the worker to start over with a custom configuration. Use ./sbin/stop-slave.sh to stop the worker.

   ./sbin/stop-slave.sh
6. Check out the master's web UI at http://localhost:8080 to see the current setup: one worker in DEAD state.

   Figure 3. Master's web UI with one worker DEAD
7. Start a worker using --cores 2 and --memory 4g for two CPU cores and 4 GB of RAM.

   ./sbin/start-slave.sh spark://japila.local:7077 --cores 2 --memory 4g

   Note: The command translates to org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://japila.local:7077 --cores 2 --memory 4g.
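The same limits can presumably also be expressed with the worker environment variables that the later steps put in conf/spark-env.sh; the one-liner below is an assumed equivalent of the flags above.

   # Assumed equivalent of --cores 2 --memory 4g via environment variables
   SPARK_WORKER_CORES=2 SPARK_WORKER_MEMORY=4g \
     ./sbin/start-slave.sh spark://japila.local:7077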
8. Check out the master's web UI at http://localhost:8080 to see the current setup: one worker ALIVE and another DEAD.

   Figure 4. Master's web UI with one worker ALIVE and one DEAD
9. Configure the cluster using conf/spark-env.sh.

   There's the conf/spark-env.sh.template template to start from.

   We're going to use the following conf/spark-env.sh:

   SPARK_WORKER_CORES=2        (1)
   SPARK_WORKER_INSTANCES=2    (2)
   SPARK_WORKER_MEMORY=2g

   (1) the number of cores per worker
   (2) the number of workers per node (a machine)
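Read as a commented sketch (the annotations, including the per-worker reading of SPARK_WORKER_MEMORY, are mine):

   # conf/spark-env.sh -- the same settings with comments
   SPARK_WORKER_CORES=2       # cores offered by each worker
   SPARK_WORKER_INSTANCES=2   # number of workers started on this node
   SPARK_WORKER_MEMORY=2g     # memory offered by each worker

With these values the node contributes 2 x 2 = 4 cores and 2 x 2 GB = 4 GB of memory in total.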
10. Start the workers.

    ./sbin/start-slave.sh spark://japila.local:7077

    As the command progresses, it prints out "starting org.apache.spark.deploy.worker.Worker, logging to ..." for each worker. You defined two workers in conf/spark-env.sh using SPARK_WORKER_INSTANCES, so you should see two such lines.

    $ ./sbin/start-slave.sh spark://japila.local:7077
    starting org.apache.spark.deploy.worker.Worker, logging to ../logs/spark-jacek-org.apache.spark.deploy.worker.Worker-1-japila.local.out
    starting org.apache.spark.deploy.worker.Worker, logging to ../logs/spark-jacek-org.apache.spark.deploy.worker.Worker-2-japila.local.out
11. Check out the master's web UI at http://localhost:8080 to see the current setup: at least two workers should be ALIVE.

    Figure 5. Master's web UI with two workers ALIVE

    Note: Use jps on the master to see the JVM instances, given they all run on the same machine (e.g. localhost).

    $ jps
    6580 Worker
    4872 Master
    6874 Jps
    6539 Worker
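You could also confirm from the logs that both workers registered with the master. The log files follow the names printed by start-slave.sh above (relative to $SPARK_HOME), and the exact wording of the log line may differ between Spark versions.

    # Look for the registration message in both worker logs
    grep -i "registered with master" logs/spark-*-org.apache.spark.deploy.worker.Worker-*.out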
12. Stop all the instances: the master and the workers.

    ./sbin/stop-all.sh
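If you only want to stop part of the cluster, there are also more fine-grained scripts in sbin (check what your Spark version actually ships):

    # Stop the worker instances on this machine only
    ./sbin/stop-slave.sh

    # Stop workers on all machines listed in conf/slaves
    ./sbin/stop-slaves.sh

    # Stop the master only
    ./sbin/stop-master.sh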