FileStreamSink — Streaming Sink for Parquet Format

FileStreamSink is the streaming sink that writes out the results of a streaming query to parquet files.

FileStreamSink is created exclusively when DataSource is requested to create a streaming sink.

FileStreamSink supports Append output mode only.
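For example, starting a streaming query with a file-based format such as parquet makes DataSource create a FileStreamSink behind the scenes. A minimal sketch (the rate source and the paths are illustrative only):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .appName("FileStreamSinkDemo")
  .getOrCreate()

// Any streaming source works here; rate is used only for illustration.
val rates = spark.readStream
  .format("rate")
  .load()

// A file-based format (e.g. parquet) makes DataSource create a FileStreamSink.
val query = rates.writeStream
  .format("parquet")
  .option("path", "/tmp/file-stream-sink-demo")                          // hypothetical output path
  .option("checkpointLocation", "/tmp/file-stream-sink-demo-checkpoint") // hypothetical checkpoint
  .outputMode("append")  // the only output mode FileStreamSink supports
  .start()
```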

FileStreamSink uses the spark.sql.streaming.fileSink.log.deletion configuration property (as the isDeletingExpiredLog flag).
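
A minimal sketch of setting the property, assuming an active SparkSession named spark (the default value shown is an assumption):

```scala
// Controls whether expired entries in the file sink metadata log are deleted.
// Assumed to default to true; set it before starting the streaming query.
spark.conf.set("spark.sql.streaming.fileSink.log.deletion", "true")
```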

The textual representation of FileStreamSink is FileSink[path].

Table 1. FileStreamSink’s Internal Properties (e.g. Registries, Counters and Flags)

  • basePath: FIXME. Used when…FIXME

  • logPath: FIXME. Used when…FIXME

  • fileLog: FIXME. Used when…FIXME

  • hadoopConf: FIXME. Used when…FIXME

addBatch Method

Note: addBatch is part of the Sink Contract to “add” a batch of data to the sink.

addBatch…​FIXME
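
For reference, the contract that addBatch comes from can be summarized as the following simplified sketch of org.apache.spark.sql.execution.streaming.Sink:

```scala
import org.apache.spark.sql.DataFrame

// Simplified rendering of the Sink contract that FileStreamSink implements.
// addBatch receives the id of a micro-batch and its data, and the sink is
// responsible for writing that data out (exact internals elided above).
trait Sink {
  def addBatch(batchId: Long, data: DataFrame): Unit
}
```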

Creating FileStreamSink Instance

FileStreamSink takes the following when created:

  • SparkSession

  • Path with the metadata directory

  • FileFormat

  • Names of the partition columns

  • Configuration options

FileStreamSink initializes the internal registries and counters.
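
Put together, the constructor can be pictured as the following simplified sketch; the parameter names are illustrative and mirror the bullet list above, not necessarily the exact identifiers in the source.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.datasources.FileFormat

// Illustrative shape of the constructor arguments of FileStreamSink.
class FileStreamSinkSketch(
    sparkSession: SparkSession,          // SparkSession
    path: String,                        // path with the metadata directory
    fileFormat: FileFormat,              // FileFormat to write files with
    partitionColumnNames: Seq[String],   // names of the partition columns
    options: Map[String, String])        // configuration options
```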
