Spark Standalone – Using ZooKeeper for High-Availability of Master
Tip
|
Read Recovery Mode to know the theory. |
You’re going to start two standalone Masters.
You’ll need 4 terminals (adjust addresses as needed):
Start ZooKeeper.
Create a configuration file ha.conf
with the content as follows:
1 2 3 4 5 6 7 |
spark.deploy.recoveryMode=ZOOKEEPER spark.deploy.zookeeper.url=<zookeeper_host>:2181 spark.deploy.zookeeper.dir=/spark |
Start the first standalone Master.
1 2 3 4 5 |
./sbin/start-master.sh -h localhost -p 7077 --webui-port 8080 --properties-file ha.conf |
Start the second standalone Master.
Note
|
It is not possible to start another instance of standalone Master on the same machine using ./sbin/start-master.sh . The reason is that the script assumes one instance per machine only. We’re going to change the script to make it possible.
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
$ cp ./sbin/start-master{,-2}.sh $ grep "CLASS 1" ./sbin/start-master-2.sh "${SPARK_HOME}/sbin"/spark-daemon.sh start $CLASS 1 \ $ sed -i -e 's/CLASS 1/CLASS 2/' sbin/start-master-2.sh $ grep "CLASS 1" ./sbin/start-master-2.sh $ grep "CLASS 2" ./sbin/start-master-2.sh "${SPARK_HOME}/sbin"/spark-daemon.sh start $CLASS 2 \ $ ./sbin/start-master-2.sh -h localhost -p 17077 --webui-port 18080 --properties-file ha.conf |
You can check how many instances you’re currently running using jps
command as follows:
1 2 3 4 5 6 7 8 9 |
$ jps -lm 5024 sun.tools.jps.Jps -lm 4994 org.apache.spark.deploy.master.Master --ip japila.local --port 7077 --webui-port 8080 -h localhost -p 17077 --webui-port 18080 --properties-file ha.conf 4808 org.apache.spark.deploy.master.Master --ip japila.local --port 7077 --webui-port 8080 -h localhost -p 7077 --webui-port 8080 --properties-file ha.conf 4778 org.apache.zookeeper.server.quorum.QuorumPeerMain config/zookeeper.properties |
Start a standalone Worker.
1 2 3 4 5 |
./sbin/start-slave.sh spark://localhost:7077,localhost:17077 |
Start Spark shell.
1 2 3 4 5 |
./bin/spark-shell --master spark://localhost:7077,localhost:17077 |
Wait till the Spark shell connects to an active standalone Master.
Find out which standalone Master is active (there can only be one). Kill it. Observe how the other standalone Master takes over and lets the Spark shell register with itself. Check out the master’s UI.
Optionally, kill the worker, make sure it goes away instantly in the active master’s logs.