TextFileFormat
TextFileFormat is a TextBasedFileFormat for the text data source format.
```scala
spark.read.format("text").load("text-datasets")

// or the same as above using a shortcut
spark.read.text("text-datasets")
```
TextFileFormat uses text options while loading a dataset.
| Option | Default Value | Description |
|---|---|---|
| compression | (undefined) | Compression codec that can be either one of the known aliases or a fully-qualified class name. |
| wholetext | false | Enables loading a file as a single row (i.e. not splitting by "\n") |
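As a sketch of the options above (assuming a local SparkSession and a throwaway temporary directory; the paths and names are illustrative), wholetext switches the reader from one row per line to one row per file:

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("TextOptionsDemo")
  .getOrCreate()
import spark.implicits._

// Write a single two-line part file to a temporary directory
val dir = Files.createTempDirectory("text-demo").toString
Seq("line1", "line2").toDS.coalesce(1).write.mode("overwrite").text(dir)

// Default behaviour: one row per "\n"-separated line
val perLine = spark.read.text(dir)

// wholetext: the whole file is loaded as a single row
val whole = spark.read.option("wholetext", true).text(dir)
```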
prepareWrite Method
```scala
prepareWrite(
  sparkSession: SparkSession,
  job: Job,
  options: Map[String, String],
  dataSchema: StructType): OutputWriterFactory
```
Note: prepareWrite is part of the FileFormat Contract and is used when FileFormatWriter is requested to write the result of a structured query.
prepareWrite…FIXME
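A sketch of the write path that exercises prepareWrite (assuming a local session; the gzip alias and temporary output directory are illustrative): writing a text dataset makes FileFormatWriter request an OutputWriterFactory from TextFileFormat, and the compression option chosen here shows up in the part-file extension.

```scala
import java.io.File
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("PrepareWriteDemo")
  .getOrCreate()
import spark.implicits._

val out = Files.createTempDirectory("text-out").toString

// The write below goes through FileFormatWriter, which requests
// TextFileFormat to prepareWrite an OutputWriterFactory
Seq("a", "b").toDS.coalesce(1)
  .write.mode("overwrite").option("compression", "gzip").text(out)

// gzip-compressed part files carry a .txt.gz extension
val gzipParts = new File(out).listFiles.filter(_.getName.endsWith(".txt.gz"))
```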
Building Partitioned Data Reader — buildReader Method
```scala
buildReader(
  sparkSession: SparkSession,
  dataSchema: StructType,
  partitionSchema: StructType,
  requiredSchema: StructType,
  filters: Seq[Filter],
  options: Map[String, String],
  hadoopConf: Configuration): (PartitionedFile) => Iterator[InternalRow]
```
Note: buildReader is part of the FileFormat Contract to…FIXME
buildReader…FIXME
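From the user's perspective, buildReader is what runs under any plain text read: each PartitionedFile is turned into an Iterator[InternalRow] over a single value column of StringType. A minimal sketch (local session and temporary dataset assumed):

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("BuildReaderDemo")
  .getOrCreate()
import spark.implicits._

val dir = Files.createTempDirectory("text-read").toString
Seq("hello", "world").toDS.coalesce(1).write.mode("overwrite").text(dir)

// Reading triggers buildReader behind the scenes; the result is a
// DataFrame with a single value column of StringType
val df = spark.read.text(dir)
val values = df.as[String].collect().sorted
```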
readToUnsafeMem Internal Method
```scala
readToUnsafeMem(
  conf: Broadcast[SerializableConfiguration],
  requiredSchema: StructType,
  wholeTextMode: Boolean): (PartitionedFile) => Iterator[UnsafeRow]
```
readToUnsafeMem…FIXME
Note: readToUnsafeMem is used exclusively when TextFileFormat is requested to buildReader.