JsonFileFormat — Built-In Support for Files in JSON Format
JsonFileFormat is a TextBasedFileFormat for the json format, i.e. it registers itself to handle files in JSON format and converts them to Spark SQL rows.
```scala
spark.read.format("json").load("json-datasets")

// or the same as above using a shortcut
spark.read.json("json-datasets")
```
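JsonFileFormat is registered under the short name json, which is what format("json") refers to. Spark finds such registrations through java.util.ServiceLoader over DataSourceRegister implementations and compares the short names case-insensitively. A minimal sketch of that lookup, with a hard-coded list standing in for service loading; FormatRegisterSketch, JsonFormatSketch, CsvFormatSketch and FormatLookupSketch are hypothetical names, not Spark classes.

```scala
// Hypothetical stand-in for Spark's DataSourceRegister trait.
trait FormatRegisterSketch {
  def shortName: String
}

// Hypothetical formats that register short names.
final class JsonFormatSketch extends FormatRegisterSketch { val shortName = "json" }
final class CsvFormatSketch extends FormatRegisterSketch { val shortName = "csv" }

object FormatLookupSketch {
  // Spark discovers implementations with java.util.ServiceLoader;
  // a fixed list stands in for that here.
  private val registered: Seq[FormatRegisterSketch] =
    Seq(new JsonFormatSketch, new CsvFormatSketch)

  // Short names are compared case-insensitively,
  // so "JSON" and "json" resolve to the same format.
  def lookup(name: String): Option[FormatRegisterSketch] =
    registered.find(_.shortName.equalsIgnoreCase(name))
}
```

With this shape, FormatLookupSketch.lookup("JSON") finds the same format as lookup("json"), while an unregistered name yields None.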
JsonFileFormat comes with options to further customize JSON parsing.
Note: JsonFileFormat uses Jackson 2.6.7 as the JSON parser library, and some options map directly to Jackson’s internal options (as JsonParser.Feature).
| Option | Default Value | Description |
|---|---|---|
| compression | | Compression codec that can be either one of the known aliases or a fully-qualified class name. |
| dateFormat | | Date format |
| … | | Controls whether…FIXME |
| mode | PERMISSIVE | Case-insensitive name of the parse mode |
| timestampFormat | | Timestamp format |
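A minimal sketch of how two of the options above could be resolved. mode is matched case-insensitively against PERMISSIVE, DROPMALFORMED and FAILFAST, falling back to PERMISSIVE for anything else. compression maps a known alias to a Hadoop codec class name, and any other value is taken as a fully-qualified class name. The alias table mirrors the short names Spark accepts for text-based formats. JsonOptionSketch and its members are hypothetical. Unlike Spark, which treats option keys case-insensitively, the sketch matches keys exactly.

```scala
import java.util.Locale

object JsonOptionSketch {
  // Hypothetical parse modes named after Spark's three mode names.
  sealed trait ParseModeSketch
  case object Permissive extends ParseModeSketch
  case object DropMalformed extends ParseModeSketch
  case object FailFast extends ParseModeSketch

  // Case-insensitive mode name; missing or unknown values fall back to PERMISSIVE.
  def parseMode(options: Map[String, String]): ParseModeSketch =
    options.get("mode").map(_.toUpperCase(Locale.ROOT)) match {
      case Some("DROPMALFORMED") => DropMalformed
      case Some("FAILFAST")      => FailFast
      case _                     => Permissive
    }

  // Known aliases; None means "write uncompressed".
  private val codecAliases: Map[String, Option[String]] = Map(
    "none"         -> None,
    "uncompressed" -> None,
    "bzip2"        -> Some("org.apache.hadoop.io.compress.BZip2Codec"),
    "deflate"      -> Some("org.apache.hadoop.io.compress.DeflateCodec"),
    "gzip"         -> Some("org.apache.hadoop.io.compress.GzipCodec"),
    "lz4"          -> Some("org.apache.hadoop.io.compress.Lz4Codec"),
    "snappy"       -> Some("org.apache.hadoop.io.compress.SnappyCodec"))

  // Alias (case-insensitive) -> codec class name; any other value is
  // assumed to already be a fully-qualified class name.
  def codecClassName(options: Map[String, String]): Option[String] =
    options.get("compression") match {
      case None       => None
      case Some(name) => codecAliases.getOrElse(name.toLowerCase(Locale.ROOT), Some(name))
    }
}
```

The sketch does not try to load the named class; an unknown class name would only fail later, when something actually instantiates the codec.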
isSplitable Method
```scala
isSplitable(
  sparkSession: SparkSession,
  options: Map[String, String],
  path: Path): Boolean
```
Note: isSplitable is part of the FileFormat Contract.
isSplitable…FIXME
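A minimal sketch of the decision isSplitable is assumed to make, modeled on how text-based formats usually behave. Input compressed with a non-splittable codec such as gzip has to be read by a single task. As a further assumption, multiLine JSON, where one record may span many lines, is read whole. The codec is guessed from the file extension, in the way Hadoop's CompressionCodecFactory does; IsSplitableSketch and its extension table are hypothetical.

```scala
object IsSplitableSketch {
  // Hypothetical extension -> "codec supports splitting?" table.
  // Only bzip2 is a splittable codec among these.
  private val splittableByExtension: Map[String, Boolean] = Map(
    ".bz2"     -> true,
    ".gz"      -> false,
    ".deflate" -> false,
    ".snappy"  -> false,
    ".lz4"     -> false)

  def isSplitable(options: Map[String, String], path: String): Boolean = {
    // Assumption: multiLine input cannot be cut at arbitrary byte offsets.
    val multiLine = options.get("multiLine").exists(_.equalsIgnoreCase("true"))
    // No known codec extension means plain text, which can be split.
    val codecAllowsSplit = splittableByExtension.collectFirst {
      case (ext, splittable) if path.endsWith(ext) => splittable
    }.getOrElse(true)
    !multiLine && codecAllowsSplit
  }
}
```

For example, under these assumptions part-0.json and part-0.json.bz2 are splittable, while part-0.json.gz, or any file read with multiLine set to true, is not.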
inferSchema Method
```scala
inferSchema(
  sparkSession: SparkSession,
  options: Map[String, String],
  files: Seq[FileStatus]): Option[StructType]
```
Note: inferSchema is part of the FileFormat Contract.
inferSchema…FIXME
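A toy sketch of schema inference over already-parsed records. It assumes a much smaller type lattice than Spark's. Each value gets a type, and the types of the same field are widened across records: long and double become double, and any other mismatch becomes string. Fields seen only as null end up as strings, and fields are sorted by name. The Option result mirrors inferSchema's Option[StructType]: None when there is nothing to infer from. InferSchemaSketch and its types are hypothetical.

```scala
object InferSchemaSketch {
  // Toy type lattice; a simplification of Spark's DataType hierarchy.
  sealed trait SimpleType
  case object NullT extends SimpleType
  case object BooleanT extends SimpleType
  case object LongT extends SimpleType
  case object DoubleT extends SimpleType
  case object StringT extends SimpleType

  // Type of a single, already parsed JSON value.
  def typeOf(value: Any): SimpleType = value match {
    case null             => NullT
    case _: Boolean       => BooleanT
    case _: Int | _: Long => LongT
    case _: Double        => DoubleT
    case _                => StringT
  }

  // Widen two types to one that can hold both; incompatible pairs become StringT.
  def merge(a: SimpleType, b: SimpleType): SimpleType = (a, b) match {
    case (x, y) if x == y                    => x
    case (NullT, t)                          => t
    case (t, NullT)                          => t
    case (LongT, DoubleT) | (DoubleT, LongT) => DoubleT
    case _                                   => StringT
  }

  def inferSchema(records: Seq[Map[String, Any]]): Option[Seq[(String, SimpleType)]] =
    if (records.isEmpty) None
    else {
      // Fold every field of every record into one field -> type map.
      val merged = records.foldLeft(Map.empty[String, SimpleType]) { (acc, rec) =>
        rec.foldLeft(acc) { case (m, (field, v)) =>
          m.updated(field, merge(m.getOrElse(field, NullT), typeOf(v)))
        }
      }
      // Null-only fields become strings; fields are sorted by name.
      Some(merged.toSeq.map { case (f, t) => f -> (if (t == NullT) StringT else t) }.sortBy(_._1))
    }
}
```

For two records {"id": 1, "score": 2, "note": null} and {"id": 2, "score": 3.5, "tag": "x"}, the sketch yields id as long, note as string, score as double and tag as string.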
Building Partitioned Data Reader — buildReader Method
```scala
buildReader(
  sparkSession: SparkSession,
  dataSchema: StructType,
  partitionSchema: StructType,
  requiredSchema: StructType,
  filters: Seq[Filter],
  options: Map[String, String],
  hadoopConf: Configuration): (PartitionedFile) => Iterator[InternalRow]
```
Note: buildReader is part of the FileFormat Contract to build a PartitionedFile reader.
buildReader…FIXME
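buildReader returns a function rather than rows. The setup runs once, and the returned PartitionedFile => Iterator[InternalRow] closure is then called per file (in Spark, on executors). A minimal sketch of only that factory shape: an in-memory map of already-parsed records stands in for storage and JSON parsing, and the required columns are projected per record. PartitionedFileSketch, BuildReaderSketch and the Row alias are hypothetical.

```scala
// Hypothetical stand-in for Spark's PartitionedFile.
final case class PartitionedFileSketch(path: String)

object BuildReaderSketch {
  // Hypothetical row type: values in the order of the required columns.
  type Row = Seq[Any]

  def buildReader(
      storage: Map[String, Seq[Map[String, Any]]],
      requiredColumns: Seq[String]): PartitionedFileSketch => Iterator[Row] = {
    // Setup done once, before any file is read.
    val columns = requiredColumns.toVector

    // The per-file reader; missing fields become null.
    val reader: PartitionedFileSketch => Iterator[Row] = file =>
      storage.getOrElse(file.path, Seq.empty).iterator.map { rec =>
        columns.map(c => rec.getOrElse(c, null))
      }
    reader
  }
}
```

Because the reader is a plain function value, the caller decides when and where to invoke it, which is what lets one setup serve many files.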
Preparing Write Job — prepareWrite Method
```scala
prepareWrite(
  sparkSession: SparkSession,
  job: Job,
  options: Map[String, String],
  dataSchema: StructType): OutputWriterFactory
```
Note: prepareWrite is part of the FileFormat Contract to prepare a write job.
prepareWrite…FIXME
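prepareWrite runs once per write job and returns an OutputWriterFactory that each task uses to create its writers. A minimal sketch of that shape, assuming the factory reports a .json file extension followed by the compression codec's extension (e.g. .json.gz for gzip). WriterFactorySketch, WriterSketch and the codec-extension table are hypothetical, and no actual compression or JSON encoding happens.

```scala
import java.util.Locale
import scala.collection.mutable.ArrayBuffer

object PrepareWriteSketch {
  // Hypothetical stand-in for an OutputWriter: collects JSON lines in memory.
  final class WriterSketch(val path: String) {
    val lines: ArrayBuffer[String] = ArrayBuffer.empty[String]
    def write(jsonLine: String): Unit = lines += jsonLine
  }

  // Hypothetical stand-in for OutputWriterFactory.
  trait WriterFactorySketch {
    def getFileExtension: String
    def newInstance(pathWithoutExtension: String): WriterSketch
  }

  // Default file extensions of the Hadoop codecs behind the aliases.
  private val codecExtensions: Map[String, String] = Map(
    "bzip2" -> ".bz2", "deflate" -> ".deflate", "gzip" -> ".gz",
    "lz4" -> ".lz4", "snappy" -> ".snappy")

  def prepareWrite(options: Map[String, String]): WriterFactorySketch = {
    // Decided once per job, then shared by every writer the factory creates.
    val ext = ".json" + options.get("compression")
      .flatMap(c => codecExtensions.get(c.toLowerCase(Locale.ROOT)))
      .getOrElse("")
    new WriterFactorySketch {
      def getFileExtension: String = ext
      def newInstance(pathWithoutExtension: String): WriterSketch =
        new WriterSketch(pathWithoutExtension + ext)
    }
  }
}
```

With compression set to gzip, a writer created for out/part-00000 would target out/part-00000.json.gz; without the option, out/part-00000.json.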