YarnClusterSchedulerBackend – SchedulerBackend for YARN in Cluster Deploy Mode
YarnClusterSchedulerBackend is a custom YarnSchedulerBackend for Spark on YARN in cluster deploy mode.
It is a scheduler backend that supports multiple application attempts and provides the URLs of the driver’s logs, which the web UI displays as links for the driver in the Executors tab.
It uses spark.yarn.app.attemptId under the covers (that the YARN resource manager sets?).
Note: YarnClusterSchedulerBackend is a private[spark] Scala class. You can find the sources in org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend.
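Tip: Enable the DEBUG logging level for the org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend logger to see what happens inside. Refer to Logging.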
Creating YarnClusterSchedulerBackend
Creating a YarnClusterSchedulerBackend object requires a TaskSchedulerImpl and a SparkContext.
Starting YarnClusterSchedulerBackend (start method)
YarnClusterSchedulerBackend comes with a custom start method.
Note: start is part of the SchedulerBackend Contract.
Internally, it first queries ApplicationMaster for attemptId and records the application and attempt ids.
It then calls the parent’s start and sets the parent’s totalExpectedExecutors to the initial number of executors.
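The order of the steps in start can be illustrated with a minimal, self-contained sketch. It does not use Spark or YARN classes: AttemptId, ParentBackendSketch, ClusterBackendSketch, recordIds, and the constructor parameters are all hypothetical stand-ins chosen for illustration, and only the sequence of steps follows the description above.

```scala
// Hypothetical stand-in for YARN's application attempt id (not the real class).
final case class AttemptId(applicationId: String, attempt: Int)

// Hypothetical stand-in for the parent backend (plays the role of YarnSchedulerBackend).
class ParentBackendSketch {
  var appId: Option[String] = None
  var attemptId: Option[AttemptId] = None
  var totalExpectedExecutors: Int = 0
  var started: Boolean = false

  // Records the application and attempt ids (illustrative name).
  def recordIds(app: String, attempt: Option[AttemptId]): Unit = {
    appId = Some(app)
    attemptId = attempt
  }

  def start(): Unit = {
    started = true
  }
}

// Hypothetical stand-in for the cluster-mode backend.
// currentAttempt models "ask the ApplicationMaster for the attempt id";
// initialExecutors models "the initial number of executors".
class ClusterBackendSketch(
    currentAttempt: () => AttemptId,
    initialExecutors: Int) extends ParentBackendSketch {

  override def start(): Unit = {
    // 1. Query the attempt id and record the application and attempt ids.
    val attempt = currentAttempt()
    recordIds(attempt.applicationId, Some(attempt))
    // 2. Delegate to the parent's start.
    super.start()
    // 3. Set the parent's totalExpectedExecutors.
    totalExpectedExecutors = initialExecutors
  }
}
```

In this sketch the ids are recorded before the parent's start runs, so anything the parent does while starting can already see them.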
Calculating Driver Log URLs (getDriverLogUrls method)
getDriverLogUrls in YarnClusterSchedulerBackend calculates the URLs for the driver’s logs: standard output (stdout) and standard error (stderr).
Note: getDriverLogUrls is part of the SchedulerBackend Contract.
Internally, it retrieves the container id and computes the base URL from environment variables.
You should see the following DEBUG message in the logs:
DEBUG Base URL for logs: [baseUrl]
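How a base URL could be derived from environment variables and a container id can be sketched as a pure function. This is an illustration under assumptions, not the class's actual code: DriverLogUrlsSketch and its parameters are hypothetical names, the environment is passed in as a Map instead of being read from the process, NM_HOST and NM_HTTP_PORT are the YARN NodeManager variables assumed to hold the host and HTTP port, and the /node/containerlogs/[containerId]/[user] path layout is an assumption about the NodeManager web UI.

```scala
// Hypothetical helper; not part of Spark.
object DriverLogUrlsSketch {

  // Builds the base URL, or None when the expected variables are missing.
  def baseUrl(
      env: Map[String, String],
      containerId: String,
      user: String,
      httpsOnly: Boolean): Option[String] =
    for {
      host <- env.get("NM_HOST")      // assumed NodeManager host variable
      port <- env.get("NM_HTTP_PORT") // assumed NodeManager HTTP port variable
    } yield {
      val scheme = if (httpsOnly) "https://" else "http://"
      // Path layout is an assumption made for illustration.
      s"$scheme$host:$port/node/containerlogs/$containerId/$user"
    }

  // Derives one URL per log stream from the base URL.
  def driverLogUrls(
      env: Map[String, String],
      containerId: String,
      user: String,
      httpsOnly: Boolean): Option[Map[String, String]] =
    baseUrl(env, containerId, user, httpsOnly).map { base =>
      Map(
        "stdout" -> s"$base/stdout",
        "stderr" -> s"$base/stderr")
    }
}
```

Returning an Option keeps the missing-variable case explicit: when either variable is absent, the sketch yields None and no links are produced.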