TextFileFormat
TextFileFormat is a TextBasedFileFormat for the text data format.
```scala
spark.read.format("text").load("text-datasets")

// or the same as above using a shortcut
spark.read.text("text-datasets")
```
TextFileFormat
uses text options while loading a dataset.
| Option | Default Value | Description |
|---|---|---|
| compression | (empty) | Compression codec that can be either one of the known aliases or a fully-qualified class name. |
| wholetext | false | Enables loading a file as a single row (i.e. not splitting by "\n") |
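A minimal sketch of both options in action. The local session, temp directory, and file contents are illustrative only; `wholetext` is set while reading, and `compression` is honoured on the write side.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Illustrative local session (not part of the original text)
val spark = SparkSession.builder().master("local[1]").appName("text-options").getOrCreate()

// Write a two-line sample file into a temp directory
val dir = Files.createTempDirectory("text-demo")
Files.write(dir.resolve("a.txt"), "line1\nline2\n".getBytes("UTF-8"))

// Default behaviour: one row per line
val perLine = spark.read.text(dir.toString).count()

// wholetext = true: one row per file
val perFile = spark.read.option("wholetext", true).text(dir.toString).count()

// compression applies when saving, e.g. via the "gzip" alias
spark.read.text(dir.toString).write.option("compression", "gzip").text(dir.resolve("out").toString)

spark.stop()
```

With the sample file above, `perLine` is 2 while `perFile` is 1, since `wholetext` loads each file as a single row.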
prepareWrite Method

```scala
prepareWrite(
  sparkSession: SparkSession,
  job: Job,
  options: Map[String, String],
  dataSchema: StructType): OutputWriterFactory
```
Note: prepareWrite is part of the FileFormat contract and is used when FileFormatWriter is requested to write the result of a structured query.
prepareWrite
…FIXME
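A sketch of how prepareWrite gets exercised end to end: saving a single string-column Dataset in the text format makes FileFormatWriter request TextFileFormat for an OutputWriterFactory. The session setup and output path are illustrative, not part of the original text.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Illustrative local session (assumption, not from the original text)
val spark = SparkSession.builder().master("local[1]").appName("text-write").getOrCreate()
import spark.implicits._

// The text data source requires exactly one string column
val out = Files.createTempDirectory("text-write-demo").resolve("out").toString
Seq("a", "b", "c").toDS.write.text(out)

// Reading the output back: one row per line written
val back = spark.read.text(out).count()

spark.stop()
```

The write above goes through FileFormatWriter, which is where prepareWrite is invoked to build the OutputWriterFactory for the text format.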
Building Partitioned Data Reader — buildReader Method

```scala
buildReader(
  sparkSession: SparkSession,
  dataSchema: StructType,
  partitionSchema: StructType,
  requiredSchema: StructType,
  filters: Seq[Filter],
  options: Map[String, String],
  hadoopConf: Configuration): (PartitionedFile) => Iterator[InternalRow]
```
Note: buildReader is part of the FileFormat contract to…FIXME
buildReader
…FIXME
readToUnsafeMem Internal Method

```scala
readToUnsafeMem(
  conf: Broadcast[SerializableConfiguration],
  requiredSchema: StructType,
  wholeTextMode: Boolean): (PartitionedFile) => Iterator[UnsafeRow]
```
readToUnsafeMem
…FIXME
Note: readToUnsafeMem is used exclusively when TextFileFormat is requested to buildReader.