

JsonFileFormat — Built-In Support for Files in JSON Format

JsonFileFormat is a TextBasedFileFormat for the json format, i.e. it registers itself to handle files in JSON format and converts them to Spark SQL rows.

JsonFileFormat comes with options to further customize JSON parsing.
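
As a rough illustration of how such options are usually supplied, the sketch below passes a few of them through PySpark's DataFrameReader. The helper name read_json_with_options, the option values, and the path argument are made up for this example, and spark is assumed to be an already created SparkSession.

```python
# Hypothetical option values chosen for illustration; the keys are option
# names listed in Table 1.
JSON_OPTIONS = {
    "allowComments": "true",
    "allowUnquotedFieldNames": "true",
    "mode": "DROPMALFORMED",
    "dateFormat": "yyyy-MM-dd",
}


def read_json_with_options(spark, path, options=None):
    """Load JSON files from `path` using the given reader options.

    `spark` is assumed to be an existing pyspark.sql.SparkSession.
    This helper is illustrative only and is not part of Spark.
    """
    # "json" is the short name the format is registered under.
    reader = spark.read.format("json")
    # DataFrameReader.options takes keyword arguments, one per option.
    reader = reader.options(**(options or JSON_OPTIONS))
    return reader.load(path)
```

Calling read_json_with_options(spark, "/some/dir") would return a DataFrame; passing a different dictionary as options overrides the illustrative defaults.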

Note
JsonFileFormat uses Jackson 2.6.7 as the JSON parser library, and some options map directly to Jackson’s parser features (JsonParser.Feature).
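
To make the Jackson-mapped options in Table 1 more concrete, the sketch below pairs each one with a sample input that breaks a rule of strict JSON. It uses Python's standard json module purely as a stand-in strict parser; it does not call Jackson or Spark, and the sample strings are invented for this illustration.

```python
import json

# One sample per Jackson-mapped option. Each sample contains the kind of
# input the corresponding option is meant to tolerate.
SAMPLES = {
    "allowBackslashEscapingAnyCharacter": '{"a": "x\\qy"}',  # \q is not a JSON escape
    "allowComments": '{"a": 1 /* note */}',
    "allowNonNumericNumbers": '{"a": NaN}',
    "allowNumericLeadingZeros": '{"a": 01}',
    "allowSingleQuotes": "{'a': 1}",
    "allowUnquotedControlChars": '{"a": "x\ty"}',  # raw TAB inside a string
    "allowUnquotedFieldNames": '{a: 1}',
}


def rejected_by_strict_parser(text):
    """Return True if Python's json module (a stand-in, not Jackson) rejects text."""
    try:
        json.loads(text)
        return False
    except json.JSONDecodeError:
        return True


for option, sample in SAMPLES.items():
    print(f"{option:36} {sample!r:24} rejected={rejected_by_strict_parser(sample)}")
```

With Python's default settings every sample is rejected except the NaN one, since Python's parser accepts NaN out of the box. That happens to line up with allowNonNumericNumbers defaulting to true in the table, but it is a property of the stand-in parser and not evidence about Jackson.
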
Table 1. JsonFileFormat’s Options (option, default value, description)

allowBackslashEscapingAnyCharacter
  Default value: false
  Note: Internally, allowBackslashEscapingAnyCharacter becomes JsonParser.Feature.ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER.

allowComments
  Default value: false
  Note: Internally, allowComments becomes JsonParser.Feature.ALLOW_COMMENTS.

allowNonNumericNumbers
  Default value: true
  Note: Internally, allowNonNumericNumbers becomes JsonParser.Feature.ALLOW_NON_NUMERIC_NUMBERS.

allowNumericLeadingZeros
  Default value: false
  Note: Internally, allowNumericLeadingZeros becomes JsonParser.Feature.ALLOW_NUMERIC_LEADING_ZEROS.

allowSingleQuotes
  Default value: true
  Note: Internally, allowSingleQuotes becomes JsonParser.Feature.ALLOW_SINGLE_QUOTES.

allowUnquotedControlChars
  Default value: false
  Note: Internally, allowUnquotedControlChars becomes JsonParser.Feature.ALLOW_UNQUOTED_CONTROL_CHARS.

allowUnquotedFieldNames
  Default value: false
  Note: Internally, allowUnquotedFieldNames becomes JsonParser.Feature.ALLOW_UNQUOTED_FIELD_NAMES.

columnNameOfCorruptRecord

compression
  Description: Compression codec that can be either one of the known aliases or a fully-qualified class name.

dateFormat
  Default value: yyyy-MM-dd
  Description: Date format
  Note: Internally, dateFormat is converted to Apache Commons Lang’s FastDateFormat.

multiLine
  Default value: false
  Description: Controls whether…​FIXME

mode
  Default value: PERMISSIVE
  Description: Case-insensitive name of the parse mode, one of:
    • PERMISSIVE
    • DROPMALFORMED
    • FAILFAST

prefersDecimal
  Default value: false

primitivesAsString
  Default value: false

samplingRatio
  Default value: 1.0

timestampFormat
  Default value: yyyy-MM-dd'T'HH:mm:ss.SSSXXX
  Description: Timestamp format
  Note: Internally, timestampFormat is converted to Apache Commons Lang’s FastDateFormat.

timeZone
  Description: Java’s TimeZone
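
The three parse modes listed for the mode option differ in how a malformed record is treated. The toy model below illustrates the commonly documented behaviour in plain Python: PERMISSIVE keeps the bad record in a corrupt-record column, DROPMALFORMED skips it, and FAILFAST raises. It is not Spark's implementation; the function name parse_lines, the corrupt_column default, and the sample input are assumptions made for this sketch.

```python
import json


def parse_lines(lines, mode="PERMISSIVE", corrupt_column="_corrupt_record"):
    """Toy model of the three parse modes; not Spark's implementation."""
    mode = mode.upper()  # mode names are case insensitive
    if mode not in ("PERMISSIVE", "DROPMALFORMED", "FAILFAST"):
        # A choice made for this sketch only; how Spark treats unknown
        # names is not covered here.
        raise ValueError(f"unknown mode: {mode}")
    rows = []
    for line in lines:
        try:
            record = json.loads(line)
            if not isinstance(record, dict):
                raise ValueError("expected a JSON object")
            rows.append(record)
        except ValueError as err:  # json.JSONDecodeError subclasses ValueError
            if mode == "FAILFAST":
                raise ValueError(f"malformed record: {line!r}") from err
            if mode == "PERMISSIVE":
                # Keep the raw text so the bad record can be inspected later.
                rows.append({corrupt_column: line})
            # DROPMALFORMED: silently skip the record.
    return rows


lines = ['{"id": 1}', '{"id": 2', '{"id": 3}']  # the second line is truncated
print(parse_lines(lines, "permissive"))
print(parse_lines(lines, "dropMalformed"))
```

In this model the permissive call returns three rows, with the truncated line preserved under the corrupt-record column, while the dropMalformed call returns only the two well-formed rows.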

isSplitable Method

Note
isSplitable is part of the FileFormat Contract.

isSplitable…​FIXME

inferSchema Method

Note
inferSchema is part of the FileFormat Contract.

inferSchema…​FIXME

Building Partitioned Data Reader — buildReader Method

Note
buildReader is part of the FileFormat Contract to build a PartitionedFile reader.

buildReader…​FIXME

Preparing Write Job — prepareWrite Method

Note
prepareWrite is part of the FileFormat Contract to prepare a write job.

prepareWrite…​FIXME
