HadoopFileLinesReader
HadoopFileLinesReader is a Scala Iterator of Apache Hadoop’s org.apache.hadoop.io.Text.
HadoopFileLinesReader is created to access datasets in the following data sources:
- SimpleTextSource
- LibSVMFileFormat
- TextInputCSVDataSource
- TextInputJsonDataSource
HadoopFileLinesReader uses the internal iterator that handles accessing files using Hadoop’s FileSystem API.
iterator Internal Property
iterator: RecordReaderIterator[Text]
When created, HadoopFileLinesReader creates an internal iterator that uses Hadoop’s org.apache.hadoop.mapreduce.lib.input.FileSplit, created with Hadoop’s org.apache.hadoop.fs.Path for the given file.
iterator creates Hadoop’s TaskAttemptID, TaskAttemptContextImpl and LineRecordReader.
iterator initializes LineRecordReader and passes it on to RecordReaderIterator.
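The steps above can be sketched directly against the Hadoop API. This is a minimal illustration, not Spark's actual code: it assumes the Hadoop client libraries are on the classpath, reads a local file, and drains the LineRecordReader in a loop instead of wrapping it in RecordReaderIterator. The names LineReaderSketch and readLines are hypothetical.

```scala
import java.nio.file.Files

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapreduce.{JobID, TaskAttemptID, TaskID, TaskType}
import org.apache.hadoop.mapreduce.lib.input.{FileSplit, LineRecordReader}
import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl

// Hypothetical helper (not part of Spark) that mirrors the steps described above.
object LineReaderSketch {
  def readLines(localFile: java.nio.file.Path): List[String] = {
    val conf = new Configuration()

    // A split that covers the whole file; no preferred hosts are given.
    val split = new FileSplit(
      new Path(localFile.toUri), 0L, Files.size(localFile), Array.empty[String])

    // LineRecordReader needs a task attempt context to be initialized.
    val attemptId = new TaskAttemptID(new TaskID(new JobID(), TaskType.MAP, 0), 0)
    val context = new TaskAttemptContextImpl(conf, attemptId)

    val reader = new LineRecordReader()
    reader.initialize(split, context)
    try {
      val lines = List.newBuilder[String]
      while (reader.nextKeyValue()) {
        // The reader reuses its Text instance, so copy the value out.
        lines += reader.getCurrentValue.toString
      }
      lines.result()
    } finally {
      reader.close()
    }
  }
}
```

Copying each value with toString matters because Hadoop record readers typically hand back the same mutable Text object on every call.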
Note
iterator backs the Iterator-specific methods hasNext and next, as well as close.
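The delegation described in the note can be illustrated with a small, dependency-free sketch. All names here (SimpleRecordReader, InMemoryReader, ReaderIterator, LinesReader) are hypothetical stand-ins chosen for illustration; they only approximate how a record reader might be adapted to a Scala Iterator and are not Spark's or Hadoop's classes.

```scala
import java.io.Closeable

object DelegationSketch {
  // Hypothetical stand-in for a Hadoop-style record reader.
  trait SimpleRecordReader[T] extends Closeable {
    def nextKeyValue(): Boolean
    def getCurrentValue: T
  }

  // In-memory reader so the sketch runs without Hadoop.
  final class InMemoryReader(lines: Seq[String]) extends SimpleRecordReader[String] {
    private val underlying = lines.iterator
    private var current: String = ""
    def nextKeyValue(): Boolean =
      if (underlying.hasNext) { current = underlying.next(); true } else false
    def getCurrentValue: String = current
    def close(): Unit = ()
  }

  // Adapts a record reader to Iterator, similar in spirit to RecordReaderIterator.
  final class ReaderIterator[T](reader: SimpleRecordReader[T])
      extends Iterator[T] with Closeable {
    private var havePair = false
    private var finished = false
    private var closed = false

    override def hasNext: Boolean = {
      if (!finished && !havePair) {
        finished = !reader.nextKeyValue()
        if (finished) close() // release resources once the input is exhausted
        havePair = !finished
      }
      !finished
    }

    override def next(): T = {
      if (!hasNext) throw new NoSuchElementException("End of stream")
      havePair = false
      reader.getCurrentValue
    }

    override def close(): Unit = if (!closed) { closed = true; reader.close() }
  }

  // The outer reader only forwards to its internal iterator.
  final class LinesReader(lines: Seq[String]) extends Iterator[String] with Closeable {
    private val iterator = new ReaderIterator(new InMemoryReader(lines))
    override def hasNext: Boolean = iterator.hasNext
    override def next(): String = iterator.next()
    override def close(): Unit = iterator.close()
  }
}
```

Keeping the look-ahead state (havePair, finished) inside the adapter lets the outer class stay a thin forwarder, which is the shape the note describes.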