
EventLoggingListener — Spark Listener for Persisting Events

EventLoggingListener is a SparkListener that persists JSON-encoded events to a file.

When event logging is enabled, EventLoggingListener writes events to a log file under the spark.eventLog.dir directory. All Spark events are logged, except SparkListenerBlockUpdated and SparkListenerExecutorMetricsUpdate.

Tip
Use Spark History Server to view the event logs in a browser.

Events can optionally be compressed.

In-flight log files have the .inprogress extension.
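
A minimal sketch of turning event logging on, assuming a local master and an existing /tmp/spark-events directory (both are illustrative choices, not requirements of the listener itself):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object EventLogDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("event-log-demo")
      .setMaster("local[*]")                          // illustrative master
      .set("spark.eventLog.enabled", "true")
      .set("spark.eventLog.dir", "/tmp/spark-events") // must exist before start-up

    val sc = new SparkContext(conf)  // EventLoggingListener starts here; the .inprogress file appears
    sc.parallelize(1 to 10).count()  // generates job/stage/task events that get logged
    sc.stop()                        // the .inprogress extension is dropped on stop
  }
}
```

Once the application stops, the finished log in the same directory can be browsed with the Spark History Server.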

EventLoggingListener is a private[spark] class in org.apache.spark.scheduler package.

Tip

Enable INFO logging level for org.apache.spark.scheduler.EventLoggingListener logger to see what happens inside EventLoggingListener.

Add the following line to conf/log4j.properties:
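
```
log4j.logger.org.apache.spark.scheduler.EventLoggingListener=INFO
```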

Refer to Logging.

Creating EventLoggingListener Instance

EventLoggingListener requires an application id (appId), the application’s optional attempt id (appAttemptId), logBaseDir, a SparkConf (as sparkConf) and Hadoop’s Configuration (as hadoopConf).

Note
When initialized without a Hadoop Configuration, it calls SparkHadoopUtil.get.newConfiguration(sparkConf).

Starting EventLoggingListener — start method

start checks whether logBaseDir is really a directory, and if it is not, it throws an IllegalArgumentException with the following message:

The log file’s working name is created based on appId, the optional appAttemptId, and the compression codec (if used), e.g. local-1461696754069. It also uses the .inprogress extension.
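
As an illustration of the naming scheme just described, a hypothetical helper (not Spark’s code; the real logic lives in the EventLoggingListener companion object) could look like this:

```scala
object WorkingNameSketch {
  // Hypothetical helper mirroring the described scheme: appId, optional attempt id,
  // optional codec extension, and the .inprogress suffix.
  def buildWorkingName(appId: String, appAttemptId: Option[String], codec: Option[String]): String = {
    val base = appId + appAttemptId.map("_" + _).getOrElse("")
    base + codec.map("." + _).getOrElse("") + ".inprogress"
  }

  def main(args: Array[String]): Unit = {
    println(buildWorkingName("local-1461696754069", None, Some("lz4")))
    // prints: local-1461696754069.lz4.inprogress
  }
}
```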

If overwrite is enabled, you should see the following WARN message:

EventLoggingListener then attempts to delete the working .inprogress log file. If it cannot be deleted, the following WARN message is printed out to the logs:

The buffered output stream is created, and metadata, i.e. Spark’s version and the SparkListenerLogStart class name, is written out as the first line.

At this point, EventLoggingListener is ready for event logging and you should see the following INFO message in the logs:

Note
start is executed while SparkContext is created.

Logging Event as JSON — logEvent method

logEvent writes event out to the log file as a single JSON-encoded line.

Caution
FIXME
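
Spark’s own serialization of events is internal, but the “one JSON-encoded event per line” idea behind logEvent can be sketched with json4s (a library Spark itself ships with); the event payload and file name below are made up for illustration:

```scala
import java.io.PrintWriter

import org.json4s.JsonDSL._
import org.json4s.jackson.JsonMethods._

object LogEventSketch {
  def main(args: Array[String]): Unit = {
    val writer = new PrintWriter("events.log.inprogress")  // hypothetical working file
    // A made-up event payload; Spark builds the real JSON from SparkListenerEvents internally.
    val json = ("Event" -> "SparkListenerApplicationStart") ~ ("App Name" -> "demo")
    writer.println(compact(render(json)))                  // one event per line
    writer.flush()
    writer.close()
  }
}
```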

Stopping EventLoggingListener — stop method

stop closes the PrintWriter for the log file and renames the file to drop the .inprogress extension.

If the target log file (the one without the .inprogress extension) already exists and spark.eventLog.overwrite is enabled, it is overwritten. You should see the following WARN message in the logs:

If the target log file exists and overwrite is disabled, a java.io.IOException is thrown with the following message:

Note
stop is executed while SparkContext is stopped.
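
A minimal sketch of the rename-or-fail behavior described above, using Hadoop’s FileSystem API; the paths, the overwrite flag, and the exception message are illustrative assumptions rather than Spark’s exact code:

```scala
import java.io.IOException

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object StopSketch {
  def main(args: Array[String]): Unit = {
    val fs         = FileSystem.get(new Configuration())
    val inProgress = new Path("/tmp/spark-events/local-1461696754069.inprogress") // hypothetical
    val target     = new Path("/tmp/spark-events/local-1461696754069")            // hypothetical
    val overwrite  = false  // stands in for spark.eventLog.overwrite

    if (fs.exists(target)) {
      if (overwrite) fs.delete(target, true)
      else throw new IOException(s"Target log file already exists: $target")      // illustrative message
    }
    fs.rename(inProgress, target)
  }
}
```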

Compressing Logged Events

If event compression is enabled, events are compressed using CompressionCodec.

Tip
Refer to CompressionCodec to learn about the available compression codecs.
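
For example, compression can be switched on in spark-shell as below; note that in Spark 2.x the codec used for event logs is the one set through spark.io.compression.codec (lz4 by default), which is an assumption worth verifying for your version:

```scala
import org.apache.spark.SparkConf

// spark-shell style snippet; wrap it in an application object when compiling
val conf = new SparkConf()
  .set("spark.eventLog.enabled", "true")
  .set("spark.eventLog.compress", "true")
  .set("spark.io.compression.codec", "lz4")  // assumed 2.x behavior: codec shared with other compression
```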

Settings

Table 1. Spark Properties
| Spark Property | Default Value | Description |
| --- | --- | --- |
| spark.eventLog.enabled | false | Enables (true) or disables (false) persisting Spark events. |
| spark.eventLog.dir | /tmp/spark-events | Directory where events are logged, e.g. hdfs://namenode:8021/directory. The directory must exist before Spark starts up. |
| spark.eventLog.buffer.kb | 100 | Size of the buffer to use when writing to output streams. |
| spark.eventLog.overwrite | false | Enables (true) or disables (false) deleting (or at least overwriting) an existing .inprogress log file. |
| spark.eventLog.compress | false | Enables (true) or disables (false) event compression. |
| spark.eventLog.testing | false | Internal flag for testing purposes that enables adding JSON events to the internal loggedEvents array. |
