CSVFileFormat

CSVFileFormat is a TextBasedFileFormat for the csv format, i.e. it registers itself to handle files in CSV format and converts them to Spark SQL rows.
```scala
spark.read.format("csv").load("csv-datasets")

// or the same as above using a shortcut
spark.read.csv("csv-datasets")
```
CSVFileFormat uses CSV options (that in turn are used to configure the underlying CSV parser from the uniVocity-parsers project).
Option | Default Value | Description
---|---|---
`charset` | `UTF-8` | Alias of `encoding`
`charToEscapeQuoteEscaping` | | One character to…FIXME
`codec` | | Compression codec that can be either one of the known aliases or a fully-qualified class name. Alias of `compression`
`compression` | | Compression codec that can be either one of the known aliases or a fully-qualified class name. Alias of `codec`
`dateFormat` | `yyyy-MM-dd` | Uses `en_US` locale
`delimiter` | `,` (comma) | Alias of `sep`
`encoding` | `UTF-8` | Alias of `charset`
`mode` | `PERMISSIVE` | Possible values: `DROPMALFORMED`, `PERMISSIVE`, `FAILFAST`
`nullValue` | (empty string) |
`sep` | `,` (comma) | Alias of `delimiter`
`timestampFormat` | `yyyy-MM-dd'T'HH:mm:ss.SSSXXX` | Uses `timeZone` and `en_US` locale
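These options are set per read using `option` (or `options`). A minimal sketch follows; the `csv-datasets` path reuses the earlier example, while the option values themselves are made up for illustration:

```scala
// Illustrative option values only
val cities = spark.read
  .option("header", "true")      // use the first line for column names
  .option("delimiter", ";")      // equivalent to .option("sep", ";")
  .option("inferSchema", "true") // infer column types (an extra pass over the data)
  .option("mode", "PERMISSIVE")  // the default mode: keep malformed records
  .csv("csv-datasets")
```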
Preparing Write Job — prepareWrite Method
```scala
prepareWrite(
  sparkSession: SparkSession,
  job: Job,
  options: Map[String, String],
  dataSchema: StructType): OutputWriterFactory
```
Note: prepareWrite is part of the FileFormat Contract to prepare a write job.
prepareWrite…FIXME
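For context, prepareWrite gets involved whenever a structured query is saved in csv format. A minimal sketch (the `csv-output` path is made up for illustration):

```scala
// Saving a Dataset in csv format creates a write job for CSVFileFormat
spark.range(5)
  .write
  .format("csv")
  .option("header", "true")
  .save("csv-output")
```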
Building Partitioned Data Reader — buildReader Method
```scala
buildReader(
  sparkSession: SparkSession,
  dataSchema: StructType,
  partitionSchema: StructType,
  requiredSchema: StructType,
  filters: Seq[Filter],
  options: Map[String, String],
  hadoopConf: Configuration): (PartitionedFile) => Iterator[InternalRow]
```
Note: buildReader is part of the FileFormat Contract to build a PartitionedFile reader.
buildReader…FIXME
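For context, buildReader is behind every csv scan. A minimal sketch with an explicit schema (the schema and the `csv-datasets` path are made up for illustration); selecting a single column narrows `requiredSchema` down to just that column:

```scala
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

val schema = StructType(Seq(
  StructField("id", LongType),
  StructField("name", StringType)))

// Scanning the files uses the per-partition reader function
// that buildReader returns
val people = spark.read
  .schema(schema)
  .option("header", "true")
  .csv("csv-datasets")
people.select("name").show()
```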