HDFSMetadataLog — MetadataLog with Hadoop HDFS for Reliable Storage
HDFSMetadataLog is a MetadataLog that uses Hadoop HDFS for reliable storage.
Note: HDFSMetadataLog uses a path (specified when created) that is created automatically unless it already exists.
HDFSMetadataLog is created when:
- KafkaSource is first requested for initial partition offsets (from the metadata storage)
- RateStreamSource is created (and looks up startTimeMs in the metadata storage)
HDFSMetadataLog is further customized to…FIXME
| HDFSMetadataLog | Description |
|---|---|

| Name | Description |
|---|---|
| | Filter of batch files |
| | The path to metadata directory |

createFileManager Internal Method
createFileManager(): FileManager
Caution: FIXME
Note: createFileManager is used exclusively when HDFSMetadataLog is created (and the internal FileManager is created alongside).
Retrieving Metadata By Batch Id — get Method
get(batchId: Long): Option[T]
Note: get is part of the MetadataLog Contract to…FIXME.
get…FIXME
Retrieving Latest Committed Batch Id with Metadata If Available — getLatest Method
getLatest(): Option[(Long, T)]
Note: getLatest is part of the MetadataLog Contract to retrieve the most recently committed batch id and the corresponding metadata, if available in the metadata storage.
getLatest requests the internal FileManager for the files in the metadata directory that match the batch file filter.
getLatest takes the batch ids (that the batch files correspond to) and sorts them in reverse order.
getLatest returns the first batch id, together with its metadata, for which the metadata could be found in the metadata storage.
Note: It is possible that a batch id is present in the metadata storage, but its metadata is not available for retrieval.
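
The getLatest steps described above can be sketched on a local filesystem. This is a minimal illustration, not Spark's implementation: `LocalMetadataLog`, `parseBatchId` and `add` are hypothetical names, the "one file per batch, named after the batch id" layout and the "name parses as a Long" filter are assumptions, and plain `java.io`/`java.nio` calls stand in for the internal FileManager.

```scala
import java.io.File
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import scala.util.Try

// Hypothetical local-filesystem stand-in; NOT Spark's HDFSMetadataLog.
// Assumption: one file per batch, named after the batch id,
// holding the metadata as UTF-8 text.
final class LocalMetadataLog(metadataDir: File) {

  // Assumed batch file filter: a file counts as a batch file
  // if its name parses as a Long.
  private def parseBatchId(file: File): Option[Long] =
    Try(file.getName.toLong).toOption

  // Helper for the sketch only: stores the metadata of a batch.
  def add(batchId: Long, metadata: String): Unit = {
    val target = new File(metadataDir, batchId.toString).toPath
    Files.write(target, metadata.getBytes(StandardCharsets.UTF_8))
    ()
  }

  // Returns None when the batch file is missing or cannot be read.
  def get(batchId: Long): Option[String] = {
    val source = new File(metadataDir, batchId.toString).toPath
    Try(new String(Files.readAllBytes(source), StandardCharsets.UTF_8)).toOption
  }

  def getLatest(): Option[(Long, String)] = {
    // 1. List the files in the metadata directory that pass the filter.
    val files = Option(metadataDir.listFiles()).getOrElse(Array.empty[File])
    val batchIds = files.toSeq.flatMap(f => parseBatchId(f))
    // 2. Sort the batch ids in reverse (newest first), numerically.
    val newestFirst = batchIds.sorted(Ordering[Long].reverse)
    // 3. Lazily probe each id and stop at the first one whose
    //    metadata can actually be retrieved.
    newestFirst.iterator
      .map(id => get(id).map(metadata => (id, metadata)))
      .collectFirst { case Some(found) => found }
  }
}
```

In this sketch, a directory entry whose name is a valid batch id but whose content cannot be read (for example, a directory named `11`) is listed in step 1 yet skipped in step 3, which mirrors the note about a batch id that is in the metadata storage but not available for retrieval.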
Creating HDFSMetadataLog Instance
HDFSMetadataLog takes the following when created:
HDFSMetadataLog initializes the internal registries and counters.
HDFSMetadataLog creates the path unless it already exists.
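
The "create the path unless it already exists" step can be illustrated with a small sketch. It assumes a local filesystem and uses `java.nio.file.Files` instead of the Hadoop-backed FileManager; `MetadataDirSketch` and `ensureExists` are hypothetical names introduced for illustration only.

```scala
import java.nio.file.{Files, Path}

// Hypothetical helper; a local-filesystem stand-in for the
// directory creation that happens when the log is created.
object MetadataDirSketch {

  // Creates the metadata directory (including missing parents)
  // only when it does not exist yet; an existing directory and
  // its files are left untouched.
  def ensureExists(path: Path): Path = {
    if (!Files.exists(path)) {
      Files.createDirectories(path)
    }
    path
  }
}
```

Calling `ensureExists` repeatedly on the same path is safe, so a log can be re-created over an existing metadata directory without losing earlier batch files.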