

YarnClusterSchedulerBackend – SchedulerBackend for YARN in Cluster Deploy Mode

YarnClusterSchedulerBackend is a custom YarnSchedulerBackend for Spark on YARN in cluster deploy mode.

It is a scheduler backend that supports multiple application attempts and provides URLs for the driver's logs, which are displayed as links for the driver in the Executors tab of the web UI.

It uses spark.yarn.app.attemptId under the covers (the property carries the attempt id that the YARN ResourceManager assigns to the application attempt and is set on the driver side before the SparkContext is created).
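For illustration, in older Spark on YARN sources the attempt id is exposed by simply reading that property off the SparkConf; the following is a minimal sketch and the exact shape varies across Spark versions:

```scala
// Sketch based on older Spark on YARN sources; newer versions track the attempt id
// through YarnSchedulerBackend rather than this system property.
override def applicationAttemptId(): Option[String] =
  sc.getConf.getOption("spark.yarn.app.attemptId")
```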

Note
YarnClusterSchedulerBackend is a private[spark] Scala class. You can find the sources in org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend.
Tip

Enable DEBUG logging level for org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend logger to see what happens inside.

Add the following line to conf/log4j.properties:
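```
log4j.logger.org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend=DEBUG
```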

Refer to Logging.

Creating YarnClusterSchedulerBackend

Creating a YarnClusterSchedulerBackend instance requires a TaskSchedulerImpl and a SparkContext.
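For reference, the class declaration looks roughly as follows in the sources (it extends YarnSchedulerBackend, which implements the behaviour common to the YARN scheduler backends):

```scala
// Constructor sketch: the TaskSchedulerImpl and SparkContext are passed straight
// through to the parent YarnSchedulerBackend.
private[spark] class YarnClusterSchedulerBackend(
    scheduler: TaskSchedulerImpl,
    sc: SparkContext)
  extends YarnSchedulerBackend(scheduler, sc)
```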

Starting YarnClusterSchedulerBackend (start method)

YarnClusterSchedulerBackend comes with a custom start method.

Note
start is part of the SchedulerBackend Contract.

Internally, start calls the parent's start and then sets the parent's totalExpectedExecutors to the initial number of executors.
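A minimal sketch of start, modelled on the Spark sources (the helper used to compute the initial executor count differs between Spark versions):

```scala
// Sketch: delegate to YarnSchedulerBackend.start, then record how many executors are
// expected initially (derived from spark.executor.instances or the dynamic-allocation settings).
override def start() {
  super.start()
  totalExpectedExecutors = YarnSparkHadoopUtil.getInitialTargetExecutorNumber(sc.conf)
}
```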

Calculating Driver Log URLs (getDriverLogUrls method)

getDriverLogUrls in YarnClusterSchedulerBackend calculates the URLs for the driver’s logs – standard output (stdout) and standard error (stderr).

Note
getDriverLogUrls is part of the SchedulerBackend Contract.

Internally, it retrieves the id of the driver's container and computes the base URL of the container's log page from environment variables (the NodeManager's host and HTTP port).
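The following illustrative snippet shows how the base URL and the two log links are put together; all concrete values are hypothetical stand-ins for what the real code reads from YARN environment variables and the Hadoop configuration:

```scala
// Illustration only: hypothetical values standing in for the YARN-provided ones.
val httpScheme  = "http://"                                 // "https://" when the YARN HTTP policy is HTTPS_ONLY
val httpAddress = "node1.example.com:8042"                  // NodeManager host and HTTP port (NM_HOST:NM_HTTP_PORT)
val containerId = "container_1500000000000_0001_01_000001"  // the driver's container id
val user        = "spark"                                   // the current user name

// Base URL of the NodeManager's container-logs page for the driver container.
val baseUrl = s"$httpScheme$httpAddress/node/containerlogs/$containerId/$user"

// The driver log URLs exposed to the web UI point at stdout and stderr under that base URL.
val driverLogs = Map(
  "stdout" -> s"$baseUrl/stdout?start=-4096",
  "stderr" -> s"$baseUrl/stderr?start=-4096")
```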

You should see the following DEBUG message in the logs:
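```
DEBUG Base URL for logs: [baseUrl]
```

(where [baseUrl] is the computed base URL shown in the sketch above; the exact prefix depends on your log4j pattern)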
