OffsetSeqLog — HDFSMetadataLog with OffsetSeq Metadata

OffsetSeqLog is an HDFSMetadataLog with OffsetSeq as the metadata type.

Note
HDFSMetadataLog is a MetadataLog that uses Hadoop HDFS for reliable storage.

OffsetSeqLog is created exclusively as the write-ahead log of offsets in StreamExecution.

OffsetSeqLog uses OffsetSeq for metadata. An OffsetSeq holds an ordered collection of zero or more offsets and optional metadata (as OffsetSeqMetadata, which keeps track of the event-time watermark, both as set up by a Spark developer and as found in the records).

serialize Method

Note
serialize is part of the HDFSMetadataLog Contract to write metadata in serialized format.

serialize first writes out the version prefixed with v on a single line (e.g. v1), followed by the optional metadata in JSON format.

Note
The version in Spark 2.2 is 1 with the charset being UTF-8.

serialize then writes out the offsets in JSON format, one per line.

Note
A streaming source with no offset to write in offsetSeq is marked as - (a dash) in the log.
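The layout that serialize produces can be illustrated with a minimal, Spark-free sketch. It is not the Spark implementation: SimpleOffsetSeq, OffsetLogFormatSketch, Version and VoidOffset are hypothetical names introduced here, and offsets and metadata are assumed to be JSON strings already. It follows only the steps described above: a version line, a metadata line, then one line per source, with a dash for a source that has no offset.

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical stand-in for OffsetSeq: every offset (and the metadata)
// is assumed to be pre-rendered JSON; None means "no offset for this source".
final case class SimpleOffsetSeq(
    offsets: Seq[Option[String]],
    metadata: Option[String] = None)

object OffsetLogFormatSketch {
  val Version = 1        // assumption: the version the text gives for Spark 2.2
  val VoidOffset = "-"   // the dash that marks a source without an offset

  // Renders the log entry as text, mirroring the described line layout.
  def serialize(seq: SimpleOffsetSeq): String = {
    val out = new ByteArrayOutputStream()
    // Line 1: version prefixed with "v"
    out.write(s"v$Version".getBytes(UTF_8))
    // Line 2: optional metadata as JSON (an empty line when absent)
    out.write('\n')
    out.write(seq.metadata.getOrElse("").getBytes(UTF_8))
    // Remaining lines: one offset per line, a dash for a missing offset
    seq.offsets.foreach { offset =>
      out.write('\n')
      out.write(offset.getOrElse(VoidOffset).getBytes(UTF_8))
    }
    new String(out.toByteArray, UTF_8)
  }
}
```

Under these assumptions, serializing a batch with some metadata, one source that has an offset and one source that has none gives four lines: v1, the metadata JSON, the offset JSON, and a dash. The JSON payloads used for such a call are made up for illustration and do not reproduce what Spark actually writes.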

deserialize Method

Caution
FIXME

Creating OffsetSeqLog Instance

OffsetSeqLog takes the following when created:

  • SparkSession

  • Path of the metadata log directory
