AvroFileFormat — FileFormat For Avro-Encoded Files
AvroFileFormat is a FileFormat for Apache Avro, i.e. a data source format that can read and write Avro-encoded data in files.
AvroFileFormat is a DataSourceRegister and registers itself as the avro data source.
// ./bin/spark-shell --packages org.apache.spark:spark-avro_2.12:2.4.0

// Writing data to Avro file(s)
spark
  .range(1)
  .write
  .format("avro") // <-- Triggers AvroFileFormat
  .save("data.avro")

// Reading Avro data from file(s)
val q = spark
  .read
  .format("avro") // <-- Triggers AvroFileFormat
  .load("data.avro")

scala> q.show
+---+
| id|
+---+
|  0|
+---+
AvroFileFormat is splitable, i.e. FIXME
Building Partitioned Data Reader — buildReader Method
buildReader(
  spark: SparkSession,
  dataSchema: StructType,
  partitionSchema: StructType,
  requiredSchema: StructType,
  filters: Seq[Filter],
  options: Map[String, String],
  hadoopConf: Configuration): (PartitionedFile) => Iterator[InternalRow]
Note: buildReader is part of the FileFormat Contract to build a PartitionedFile reader.
buildReader …FIXME
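To relate the buildReader parameters to a query, the following is a minimal spark-shell sketch, not a description of what AvroFileFormat does internally. It assumes the usual reading of the FileFormat contract's parameter names: partitionSchema covers the columns encoded in directory names, and requiredSchema covers the data columns the query actually needs. The path data-partitioned.avro and the part column are made up for illustration.

```scala
// Run in spark-shell started with:
//   ./bin/spark-shell --packages org.apache.spark:spark-avro_2.12:2.4.0
import org.apache.spark.sql.functions.col

// Write a small dataset partitioned by a derived column.
// "part" and the output path are illustrative names only.
spark
  .range(4)
  .withColumn("part", col("id") % 2)
  .write
  .format("avro")
  .partitionBy("part")
  .mode("overwrite")
  .save("data-partitioned.avro")

// Read it back, asking for one data column and one partition value.
// Under the assumption above, "id" would end up in requiredSchema
// and "part" in partitionSchema.
val pruned = spark
  .read
  .format("avro")
  .load("data-partitioned.avro")
  .where(col("part") === 0)
  .select("id")

// Print the physical plan to inspect what the file scan was asked to read
pruned.explain()
```

The physical plan printed by explain() is the place to check which partition filters and which read schema Spark hands over to the file scan.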
Inferring Schema — inferSchema Method
inferSchema(
  spark: SparkSession,
  options: Map[String, String],
  files: Seq[FileStatus]): Option[StructType]
Note: inferSchema is part of the FileFormat Contract to infer (return) the schema of the given files.
inferSchema …FIXME
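From the user's side, schema inference is what happens when an Avro dataset is loaded without an explicit schema. The spark-shell sketch below contrasts that with a load that supplies the schema up front through DataFrameReader.schema. It is an illustration under assumptions: the path data-infer.avro is hypothetical, and nothing here shows how inferSchema is implemented.

```scala
// Run in spark-shell started with:
//   ./bin/spark-shell --packages org.apache.spark:spark-avro_2.12:2.4.0
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Write a tiny dataset so the example does not depend on existing files.
// The path is an illustrative name only.
spark
  .range(1)
  .write
  .format("avro")
  .mode("overwrite")
  .save("data-infer.avro")

// No schema given: Spark has to work it out from the Avro file(s)
val inferred = spark
  .read
  .format("avro")
  .load("data-infer.avro")
inferred.printSchema()

// Schema given by the user: there is nothing left to infer
val userSchema = StructType(Seq(StructField("id", LongType)))
val explicit = spark
  .read
  .format("avro")
  .schema(userSchema)
  .load("data-infer.avro")
explicit.printSchema()
```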
Preparing Write Job — prepareWrite Method
prepareWrite(
  spark: SparkSession,
  job: Job,
  options: Map[String, String],
  dataSchema: StructType): OutputWriterFactory
Note: prepareWrite is part of the FileFormat Contract to prepare a write job.
prepareWrite …FIXME
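The options parameter of prepareWrite is a Map[String, String], which suggests it carries the options a user sets on the writer. The spark-shell sketch below passes two write options documented for the Avro data source in Spark 2.4, compression and recordName. How prepareWrite consumes them is not covered here, and the record name IdRecord and the output path are made-up values.

```scala
// Run in spark-shell started with:
//   ./bin/spark-shell --packages org.apache.spark:spark-avro_2.12:2.4.0

// Write with explicit Avro write options.
// "IdRecord" and the output path are illustrative names only.
spark
  .range(3)
  .write
  .format("avro")
  .option("compression", "deflate")  // codec for the Avro output files
  .option("recordName", "IdRecord")  // top-level record name in the Avro schema
  .mode("overwrite")
  .save("data-compressed.avro")

// Read the data back to confirm the write round-trips
val back = spark
  .read
  .format("avro")
  .load("data-compressed.avro")
back.show()
```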