
Example 2-workers-on-1-node Standalone Cluster (one executor per worker)

The following steps are a recipe for a Spark Standalone cluster with 2 workers on a single machine.

The aim is to have a complete Spark-clustered environment on your laptop.

You can use the Spark Standalone cluster in the following ways:

  • Use spark-shell with --master MASTER_URL

  • Use SparkConf.setMaster(MASTER_URL) in your Spark application

For our learning purposes, MASTER_URL is spark://localhost:7077.
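
As a quick check that the MASTER_URL works, you can point a Spark shell at the standalone master once it is up. A minimal sketch, assuming a standard Spark distribution and the master from this walkthrough:

    # connect an interactive Spark shell to the standalone master
    $ ./bin/spark-shell --master spark://localhost:7077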

  1. Start a standalone master server (a command sketch follows the notes below).

    Notes:

    • Read Operating Spark Standalone master

    • Use SPARK_CONF_DIR for the configuration directory (defaults to $SPARK_HOME/conf).

    • Use spark.deploy.retainedApplications (default: 200)

    • Use spark.deploy.retainedDrivers (default: 200)

    • Use spark.deploy.recoveryMode (default: NONE)

    • Use spark.deploy.defaultCores (default: Int.MaxValue)

  2. Open master’s web UI at http://localhost:8080 to see the current setup – no workers or applications.

    Figure 1. Master’s web UI with no workers and applications
  3. Start the first worker.

    Note
    The start command (sketched below) in turn executes org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://japila.local:7077
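
    A sketch of that start command, assuming the master from step 1 (substitute your own host name for localhost; in newer Spark releases the script is called start-worker.sh):

        # register a worker with the standalone master
        $ ./sbin/start-slave.sh spark://localhost:7077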
  4. Check out master’s web UI at http://localhost:8080 to see the current setup – one worker.

    Figure 2. Master’s web UI with one worker ALIVE

    Note the number of CPUs and memory, 8 and 15 GB respectively (one gigabyte left for the OS — oh, how generous, my dear Spark!).

  5. Let’s stop the worker so we can start over with a custom configuration. Use ./sbin/stop-slave.sh to stop the worker.
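
    A sketch, for completeness (in newer Spark releases the script is called stop-worker.sh):

        # stop the locally running worker
        $ ./sbin/stop-slave.sh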

  6. Check out master’s web UI at http://localhost:8080 to see the current setup – one worker in DEAD state.

    Figure 3. Master’s web UI with one worker DEAD
  7. Start a worker using --cores 2 and --memory 4g for two CPU cores and 4 GB of RAM.

    Note
    The command (sketched below) translates to org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://japila.local:7077 --cores 2 --memory 4g
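
    A sketch of that command, assuming the same master URL as before:

        # start a worker limited to 2 cores and 4 GB of RAM
        $ ./sbin/start-slave.sh spark://localhost:7077 --cores 2 --memory 4g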
  8. Check out master’s web UI at http://localhost:8080 to see the current setup – one worker ALIVE and another DEAD.

    Figure 4. Master’s web UI with one worker ALIVE and one DEAD
  9. Configure the cluster using conf/spark-env.sh.

    There is a conf/spark-env.sh.template to start from.

    We’re going to use a conf/spark-env.sh that sets (see the sketch after this list):

    1. the number of cores per worker (SPARK_WORKER_CORES)

    2. the number of workers per node, i.e. a machine (SPARK_WORKER_INSTANCES)
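
    A minimal conf/spark-env.sh sketch for this walkthrough (the exact values are an assumption that matches the two-workers-with-two-cores setup described here):

        # conf/spark-env.sh
        SPARK_WORKER_CORES=2       # cores per worker
        SPARK_WORKER_INSTANCES=2   # workers per node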

  10. Start the workers.

    As the command (sketched below) progresses, it prints out a starting org.apache.spark.deploy.worker.Worker, logging to … line for each worker. You defined two workers in conf/spark-env.sh using SPARK_WORKER_INSTANCES, so you should see two such lines.
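
    A sketch of the start command, assuming start-slave.sh picks up SPARK_WORKER_INSTANCES from conf/spark-env.sh and launches one worker per instance (web UI ports 8081, 8082, and so on):

        # start SPARK_WORKER_INSTANCES workers against the standalone master
        $ ./sbin/start-slave.sh spark://localhost:7077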

  11. Check out master’s web UI at http://localhost:8080 to see the current setup – at least two workers should be ALIVE.

    Figure 5. Master’s web UI with two workers ALIVE
    Note

    Use jps on the master to see the worker instances, given they all run on the same machine (e.g. localhost).

  12. Stop all instances – the master and the workers.
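
    A sketch of the shutdown, assuming everything was started with the sbin scripts above:

        # stop the master and all workers started via the sbin scripts
        $ ./sbin/stop-all.sh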
