CSVFileFormat

CSVFileFormat is a TextBasedFileFormat for the csv format (i.e. it registers itself to handle files in CSV format and converts them to Spark SQL rows).

CSVFileFormat uses CSV options, which in turn configure the underlying CSV parser from the uniVocity-parsers project.
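As a rough illustration of how these options reach the csv format from user code, the sketch below passes a few of the options from Table 1 to a PySpark reader. The helper name read_csv, the constant CSV_READ_OPTIONS, and the chosen option values are hypothetical; the `spark` argument is assumed to be an existing SparkSession.

```python
from typing import Any, Mapping

# Option names come from Table 1; the values are illustrative only.
CSV_READ_OPTIONS: Mapping[str, str] = {
    "header": "true",          # treat the first line as column names
    "sep": ";",                # field separator (alias: delimiter)
    "mode": "DROPMALFORMED",   # one of the three parse modes
    "nullValue": "NA",         # string to read back as null
}


def read_csv(spark: Any, path: str,
             options: Mapping[str, str] = CSV_READ_OPTIONS) -> Any:
    """Load `path` with the csv data source of an existing SparkSession.

    `spark` is assumed to be a pyspark.sql.SparkSession; it is typed as Any
    so that this sketch does not need pyspark at import time.
    """
    # format("csv") selects the data source by its short name;
    # options(**...) forwards every key/value pair as a CSV option.
    return spark.read.format("csv").options(**options).load(path)
```

With a live session this would be called as read_csv(spark, "people.csv"), where the file name is a placeholder.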

Table 1. CSVFileFormat’s Options

| Option | Default Value | Description |
|---|---|---|
| charset | UTF-8 | Alias of encoding |
| charToEscapeQuoteEscaping | \\ | One character to…​FIXME |
| codec | | Compression codec that can be either one of the known aliases or a fully-qualified class name. Alias of compression |
| columnNameOfCorruptRecord | | |
| comment | \u0000 | |
| compression | | Compression codec that can be either one of the known aliases or a fully-qualified class name. Alias of codec |
| dateFormat | yyyy-MM-dd | Uses en_US locale |
| delimiter | , (comma) | Alias of sep |
| encoding | UTF-8 | Alias of charset |
| escape | \\ | |
| escapeQuotes | true | |
| header | | |
| ignoreLeadingWhiteSpace | false (for reading), true (for writing) | |
| ignoreTrailingWhiteSpace | false (for reading), true (for writing) | |
| inferSchema | | |
| maxCharsPerColumn | -1 | |
| maxColumns | 20480 | |
| mode | PERMISSIVE | Possible values: DROPMALFORMED, PERMISSIVE (default), FAILFAST |
| multiLine | false | |
| nanValue | NaN | |
| negativeInf | -Inf | |
| nullValue | (empty string) | |
| positiveInf | Inf | |
| sep | , (comma) | Alias of delimiter |
| timestampFormat | yyyy-MM-dd'T'HH:mm:ss.SSSXXX | Uses timeZone and en_US locale |
| timeZone | spark.sql.session.timeZone | |
| quote | \" | |
| quoteAll | false | |
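The alias pairs in the table (sep/delimiter, encoding/charset, compression/codec) can be modelled with a small lookup. This is a minimal sketch in plain Python, not Spark's code. It assumes that the first name of each pair wins when both are given and that option keys are matched case-insensitively. The names ALIAS_PAIRS and resolve_aliases are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (preferred name, alias, default value) for the alias pairs in Table 1.
# Assumption of this sketch: the preferred name wins if both are set.
ALIAS_PAIRS: List[Tuple[str, str, Optional[str]]] = [
    ("sep", "delimiter", ","),
    ("encoding", "charset", "UTF-8"),
    ("compression", "codec", None),  # the table lists no default
]


def resolve_aliases(options: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return the effective value of each aliased option."""
    # Lower-case the keys so that "Delimiter" and "delimiter" are the same
    # option (case-insensitive matching is an assumption of this sketch).
    lowered = {key.lower(): value for key, value in options.items()}
    resolved: Dict[str, Optional[str]] = {}
    for preferred, alias, default in ALIAS_PAIRS:
        fallback = lowered.get(alias.lower(), default)
        resolved[preferred] = lowered.get(preferred.lower(), fallback)
    return resolved


print(resolve_aliases({"Delimiter": ";", "charset": "ISO-8859-1"}))
# prints {'sep': ';', 'encoding': 'ISO-8859-1', 'compression': None}
```

If the precedence matters for a job, setting only one name of each pair avoids relying on that assumption.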

Preparing Write Job — prepareWrite Method

Note: prepareWrite is part of the FileFormat Contract to prepare a write job.

prepareWrite…​FIXME
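The internals of prepareWrite are not described here, so the sketch below only shows the user-facing PySpark write call that is assumed to lead to a CSV write job. The helper name write_csv, the constant CSV_WRITE_OPTIONS, and the option values are hypothetical; `df` is assumed to be an existing DataFrame.

```python
from typing import Any, Mapping

# Option names come from Table 1; the values are illustrative only.
CSV_WRITE_OPTIONS: Mapping[str, str] = {
    "header": "true",       # write column names as the first line
    "sep": "\t",            # tab-separated output
    "quoteAll": "true",     # quote every value, not only those that need it
    "compression": "gzip",  # a known codec alias (alias option: codec)
}


def write_csv(df: Any, path: str,
              options: Mapping[str, str] = CSV_WRITE_OPTIONS) -> None:
    """Save `df` as CSV files under `path`, replacing existing output.

    `df` is assumed to be a pyspark.sql.DataFrame; it is typed as Any so
    that this sketch does not need pyspark at import time.
    """
    # mode("overwrite") is the writer's save mode, unrelated to the CSV
    # parse mode option (PERMISSIVE, DROPMALFORMED, FAILFAST).
    df.write.format("csv").options(**options).mode("overwrite").save(path)
```

Note that, per Table 1, ignoreLeadingWhiteSpace and ignoreTrailingWhiteSpace default to true when writing, so values are trimmed unless those options are set to false.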

Building Partitioned Data Reader — buildReader Method

Note: buildReader is part of the FileFormat Contract to build a PartitionedFile reader.

buildReader…​FIXME
