HDFSMetadataLog — MetadataLog with Hadoop HDFS for Reliable Storage

HDFSMetadataLog is a MetadataLog that uses Hadoop HDFS for reliable storage.

Note
HDFSMetadataLog uses the path that is specified when it is created. The path is created automatically unless it already exists.

HDFSMetadataLog is created when:

HDFSMetadataLog is further customized to…​FIXME

Table 1. HDFSMetadataLog’s Available Implementations
HDFSMetadataLog Description

BatchCommitLog

CompactibleFileStreamLog

OffsetSeqLog

Table 2. HDFSMetadataLog’s Internal Registries and Counters
Name | Description
fileManager | FileManager that…FIXME
batchFilesFilter | Filter of batch files
metadataPath | The path to the metadata directory

Writing Metadata in Serialized Format — serialize Method

Caution
FIXME

deserialize Method

Caution
FIXME

createFileManager Internal Method

Caution
FIXME

Note
createFileManager is used exclusively when HDFSMetadataLog is created (and the internal FileManager is created alongside).

Retrieving Metadata By Batch Id — get Method

Note
get is part of the MetadataLog Contract to…​FIXME.

get…​FIXME

add Method

Caution
FIXME

Retrieving Latest Committed Batch Id with Metadata If Available — getLatest Method

Note
getLatest is part of the MetadataLog Contract to retrieve the most recently committed batch id and the corresponding metadata, if available in the metadata storage.

getLatest requests the internal FileManager for the files in the metadata directory that match the batch file filter.

getLatest takes the batch ids that the batch files correspond to and sorts them in descending order.

getLatest returns the first batch id in that order (i.e. the highest one) whose metadata can be found in the metadata storage, together with that metadata.

Note
A batch id may be present in the metadata storage and yet its metadata may not be available for retrieval.
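
The lookup described above can be sketched as a small Python model. This is not Spark's code. The function names (is_batch_file, get, get_latest), the JSON files in a local directory, and the rule that a batch file is one whose name is a plain integer are all assumptions that stand in for the internal FileManager, the batch file filter and the deserialize method. Treating an unreadable file as "not available" is also an assumption of the sketch.

```python
import json
import os
from typing import Any, Optional, Tuple


def is_batch_file(name: str) -> bool:
    """Assumed batch file filter: the file name is an ASCII integer batch id."""
    return name.isascii() and name.isdigit()


def get(metadata_path: str, batch_id: int) -> Optional[Any]:
    """Return the metadata stored for batch_id, or None if it cannot be read."""
    try:
        with open(os.path.join(metadata_path, str(batch_id)), encoding="utf-8") as f:
            return json.load(f)
    except (OSError, ValueError):
        # Assumption of this sketch: a missing or corrupt file counts as
        # "present in storage but not available for retrieval".
        return None


def get_latest(metadata_path: str) -> Optional[Tuple[int, Any]]:
    """Return (batch id, metadata) for the highest batch id that is readable."""
    # 1. List the files in the metadata directory that pass the filter
    #    and map them to batch ids.
    # 2. Sort the batch ids in descending order.
    batch_ids = sorted(
        (int(name) for name in os.listdir(metadata_path) if is_batch_file(name)),
        reverse=True,
    )
    # 3. Return the first batch id whose metadata can actually be retrieved.
    for batch_id in batch_ids:
        metadata = get(metadata_path, batch_id)
        if metadata is not None:
            return batch_id, metadata
    return None
```

With files named 0, 1 and a corrupt 2 in the directory, get_latest skips batch 2 and returns batch 1 with its metadata. An empty directory gives None.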

Creating HDFSMetadataLog Instance

HDFSMetadataLog takes the following when created:

  • SparkSession

  • Path of the metadata log directory

HDFSMetadataLog initializes the internal registries and counters.

HDFSMetadataLog creates the path unless it already exists.
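
The "create the path unless it already exists" step can be illustrated with a minimal Python sketch. The class name SimpleMetadataLog is hypothetical, a local directory stands in for HDFS, and the SparkSession argument is left out because the model does not use Spark. Creating missing parent directories is an assumption of the sketch as well.

```python
from pathlib import Path


class SimpleMetadataLog:
    """Hypothetical stand-in for a metadata log backed by a local directory."""

    def __init__(self, path: str) -> None:
        # Internal registry: the path to the metadata directory.
        self.metadata_path = Path(path)
        # Create the directory (and any missing parents) unless it already
        # exists; an existing directory and its contents are left untouched.
        self.metadata_path.mkdir(parents=True, exist_ok=True)
```

Constructing the log twice with the same path is therefore safe: the second call finds the directory in place and leaves any batch files in it alone.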
