HDFSMetadataLog — MetadataLog with Hadoop HDFS for Reliable Storage
HDFSMetadataLog is a MetadataLog that uses Hadoop HDFS for reliable storage.
Note: HDFSMetadataLog uses the path specified when it is created, and creates that path automatically unless it already exists.
HDFSMetadataLog is created when:

- KafkaSource is first requested for initial partition offsets (from the metadata storage)
- RateStreamSource is created (and looks up startTimeMs in the metadata storage)
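The RateStreamSource case follows a "read it back, or write it once" pattern against the metadata log. The following is a minimal sketch of that pattern, not Spark's code. It assumes a hypothetical in-memory `InMemoryMetadataLog` that stands in for HDFSMetadataLog and exposes only `add` and `get`. It also assumes that the start time is stored under batch id 0. The real class persists each entry to HDFS instead.

```scala
import scala.collection.mutable

// Hypothetical stand-in for HDFSMetadataLog: keeps metadata in memory
// instead of persisting it to HDFS.
final class InMemoryMetadataLog[T] {
  private val entries = mutable.Map.empty[Long, T]

  // Returns false (and keeps the existing entry) if the batch id is already taken.
  def add(batchId: Long, metadata: T): Boolean =
    if (entries.contains(batchId)) false
    else { entries(batchId) = metadata; true }

  // None when nothing was stored for the batch id.
  def get(batchId: Long): Option[T] = entries.get(batchId)
}

object StartTimeLookup {
  // Assumption: the start time is recorded under batch id 0.
  // The first call stores `now`; later calls (e.g. after a restart against
  // persistent storage) read the stored value back, so it stays stable.
  def startTimeMs(log: InMemoryMetadataLog[Long], now: => Long): Long =
    log.get(0L).getOrElse {
      val t = now
      log.add(0L, t)
      t
    }
}
```

Because the value is written only when `get` finds nothing, every later lookup returns the first recorded start time rather than a fresh clock reading.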
HDFSMetadataLog is further customized to…FIXME
HDFSMetadataLog's internal properties:

| Name | Description |
|---|---|
| | Filter of batch files |
| | The path to metadata directory |
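The batch files filter can be pictured under one assumption that the text above does not state: each batch file is named after its batch id (e.g. `0`, `1`, `42`). Under that assumption, the filter keeps only files whose name parses as a batch id, and the id can be read directly from the name. The following is a minimal sketch over plain file names. `BatchFiles`, `isBatchFile`, and `batchIdOf` are hypothetical names.

```scala
import scala.util.Try

object BatchFiles {
  // Hypothetical naming scheme: a batch file name is the decimal batch id, nothing else.
  def batchIdOf(fileName: String): Option[Long] =
    Try(fileName.toLong).toOption.filter(_ >= 0)

  // Filter of batch files: names such as temporary or checksum files are rejected.
  def isBatchFile(fileName: String): Boolean = batchIdOf(fileName).isDefined
}
```

For example, `List("3", "x", "10").filter(BatchFiles.isBatchFile)` keeps only `"3"` and `"10"`.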
createFileManager Internal Method

```scala
createFileManager(): FileManager
```
Caution: FIXME
Note: createFileManager is used exclusively when HDFSMetadataLog is created (and the internal FileManager is created alongside).
Retrieving Metadata By Batch Id: get Method

```scala
get(batchId: Long): Option[T]
```
Note: get is part of the MetadataLog Contract to…FIXME.
get …FIXME
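The description of get is still a FIXME, so the following is only a hedged illustration of the `get(batchId: Long): Option[T]` signature, not of HDFSMetadataLog's internals. It assumes one file per batch id, named after the id, in the metadata directory. It stores metadata as UTF-8 text on the local disk. `LocalTextMetadataLog` is a hypothetical class, and a missing file maps to `None`.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical local-disk sketch: one UTF-8 text file per batch id,
// named after the batch id, inside the metadata directory.
final class LocalTextMetadataLog(metadataPath: Path) {
  private def batchFile(batchId: Long): Path = metadataPath.resolve(batchId.toString)

  // Writes the metadata unless a file for the batch id already exists.
  def add(batchId: Long, metadata: String): Boolean = {
    val file = batchFile(batchId)
    if (Files.exists(file)) false
    else {
      Files.write(file, metadata.getBytes(StandardCharsets.UTF_8))
      true
    }
  }

  // Some(metadata) when the batch file exists, None otherwise.
  def get(batchId: Long): Option[String] = {
    val file = batchFile(batchId)
    if (Files.exists(file))
      Some(new String(Files.readAllBytes(file), StandardCharsets.UTF_8))
    else None
  }
}
```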
Retrieving Latest Committed Batch Id with Metadata If Available: getLatest Method

```scala
getLatest(): Option[(Long, T)]
```
Note: getLatest is part of the MetadataLog Contract to retrieve the most recently committed batch id and its metadata, if available in the metadata storage.
getLatest requests the internal FileManager for the files in the metadata directory that match the batch files filter.

getLatest takes the batch ids that the batch files correspond to and sorts them in reverse (descending) order.

getLatest returns the first batch id, together with its metadata, whose metadata can be found in the metadata storage.
Note: A batch id may be present in the metadata storage while its metadata is not available for retrieval. This is why getLatest checks the batch ids one by one instead of taking only the highest one.
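The three getLatest steps (filter the batch files, sort their batch ids in reverse order, return the first id whose metadata can be read) can be sketched as below. The sketch assumes batch files are named by their batch id. It replaces the FileManager listing with a hypothetical `listFileNames` function and the metadata lookup with a hypothetical `read` function. Ids for which `read` returns `None` are skipped, matching the note that a batch id may be present but not retrievable.

```scala
import scala.util.Try

object LatestBatch {
  // Sketch of getLatest: the newest readable batch id together with its metadata.
  def getLatest[T](
      listFileNames: () => Seq[String], // hypothetical stand-in for the FileManager listing
      read: Long => Option[T]           // hypothetical stand-in for get(batchId)
  ): Option[(Long, T)] = {
    // Keep only batch files: names that parse as non-negative batch ids.
    val batchIds = listFileNames().flatMap(n => Try(n.toLong).toOption).filter(_ >= 0)
    // Highest batch id first.
    val newestFirst = batchIds.sorted(Ordering[Long].reverse)
    // The iterator is lazy, so `read` is called only until the first hit.
    newestFirst.iterator
      .map(id => read(id).map(meta => (id, meta)))
      .collectFirst { case Some(entry) => entry }
  }
}
```

For example, if the files are `1`, `2`, `3`, and `3.tmp`, but only batches 1 and 2 have readable metadata, the result is batch 2 with its metadata.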
Creating HDFSMetadataLog Instance
HDFSMetadataLog takes the following when created:
HDFSMetadataLog initializes the internal registries and counters.

HDFSMetadataLog creates the path unless it already exists.
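The "create the path unless it already exists" step can be illustrated on the local disk with `java.nio.file`. The local disk is an assumed stand-in for HDFS, and `MetadataDir` is a hypothetical class. The sketch does not show how HDFSMetadataLog itself creates the directory.

```scala
import java.nio.file.{Files, Path}

// Hypothetical local-disk analogue of the constructor step
// "create the path unless it exists already".
final class MetadataDir(val metadataPath: Path) {
  if (!Files.exists(metadataPath)) {
    // createDirectories also creates any missing parent directories.
    Files.createDirectories(metadataPath)
  }
}
```

Constructing `MetadataDir` twice for the same path is harmless, because the second construction finds the directory and does nothing.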