HiveFileFormat — FileFormat For Writing Hive Tables
HiveFileFormat is a FileFormat for writing Hive tables.

HiveFileFormat is a DataSourceRegister and registers itself as the hive data source.
Note: The hive data source can only be used with tables; you cannot read or write files of the hive data source directly. Use DataFrameReader.table to load data from a Hive table and DataFrameWriter.saveAsTable to write data to one.
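For example, with a SparkSession that has Hive support enabled, reading and writing goes through table names rather than file paths. The application name, database and table names below are placeholders for this sketch:

```scala
import org.apache.spark.sql.SparkSession

// Placeholder application and table names for this sketch.
val spark = SparkSession.builder()
  .appName("hive-data-source-demo")
  .enableHiveSupport()
  .getOrCreate()

// Load data from a Hive table by name (not by file path).
val people = spark.read.table("demo_db.people")

// Write data to a Hive table by name.
people.write.mode("overwrite").saveAsTable("demo_db.people_copy")
```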
HiveFileFormat is created exclusively when SaveAsHiveFile is requested to saveAsHiveFile (i.e. when the InsertIntoHiveDirCommand and InsertIntoHiveTable logical commands are executed).
HiveFileFormat takes a FileSinkDesc when created.
HiveFileFormat throws an UnsupportedOperationException when requested to inferSchema:

```text
inferSchema is not supported for hive data source.
```
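As a rough sketch, the behavior amounts to an implementation that throws unconditionally. The snippet below follows the FileFormat signature but is a simplification, not the exact Spark source:

```scala
import org.apache.hadoop.fs.FileStatus
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StructType

// Simplified sketch: throws for every input instead of inferring a schema.
def inferSchema(
    sparkSession: SparkSession,
    options: Map[String, String],
    files: Seq[FileStatus]): Option[StructType] =
  throw new UnsupportedOperationException(
    "inferSchema is not supported for hive data source.")
```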
Preparing Write Job — prepareWrite Method
```scala
prepareWrite(
  sparkSession: SparkSession,
  job: Job,
  options: Map[String, String],
  dataSchema: StructType): OutputWriterFactory
```
Note: prepareWrite is part of the FileFormat Contract to prepare a write job.
prepareWrite sets the mapred.output.format.class property of the Hadoop configuration to the getOutputFileFormatClassName of the Hive TableDesc of the FileSinkDesc.
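In effect, this boils down to a plain Hadoop configuration update along these lines (a simplified sketch, not the exact Spark source):

```scala
import org.apache.hadoop.hive.ql.plan.TableDesc
import org.apache.hadoop.mapreduce.Job

// Copy the Hive output file format class name from the TableDesc
// into the Hadoop job configuration under mapred.output.format.class.
def setOutputFormatClass(job: Job, tableDesc: TableDesc): Unit =
  job.getConfiguration.set(
    "mapred.output.format.class",
    tableDesc.getOutputFileFormatClassName)
```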
prepareWrite requests the HiveTableUtil helper object to configureJobPropertiesForStorageHandler.
prepareWrite requests the Hive Utilities helper object to copyTableJobPropertiesToConf.
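Both calls make the Hive table's job properties visible to the write job. Their net effect is roughly that table properties from the TableDesc end up in the Hadoop configuration, as in this simplified sketch (the real work is delegated to the Hive helper classes named above):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hive.ql.plan.TableDesc
import scala.collection.JavaConverters._

// Rough approximation of copying table job properties into the configuration.
def copyTableProperties(tableDesc: TableDesc, conf: Configuration): Unit = {
  val props = tableDesc.getProperties
  props.stringPropertyNames.asScala.foreach { name =>
    conf.set(name, props.getProperty(name))
  }
}
```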
In the end, prepareWrite creates a new OutputWriterFactory that creates a new HiveOutputWriter whenever a new OutputWriter instance is requested.
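A minimal sketch of that factory pattern is shown below. The types are simplified stand-ins, not Spark's internal classes, and the real HiveOutputWriter constructor takes more arguments (including the FileSinkDesc and the job configuration):

```scala
import org.apache.hadoop.mapreduce.TaskAttemptContext
import org.apache.spark.sql.types.StructType

// Stand-ins for Spark's OutputWriter and OutputWriterFactory and for HiveOutputWriter.
trait OutputWriterSketch { def close(): Unit }

class HiveOutputWriterSketch(path: String, dataSchema: StructType) extends OutputWriterSketch {
  override def close(): Unit = ()
}

abstract class OutputWriterFactorySketch extends Serializable {
  def newInstance(path: String, dataSchema: StructType, context: TaskAttemptContext): OutputWriterSketch
}

// prepareWrite returns an anonymous factory; every call to newInstance
// (one per output file in a write task) creates a new HiveOutputWriter.
val writerFactory: OutputWriterFactorySketch = new OutputWriterFactorySketch {
  override def newInstance(
      path: String,
      dataSchema: StructType,
      context: TaskAttemptContext): OutputWriterSketch =
    new HiveOutputWriterSketch(path, dataSchema)
}
```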