Streaming Sink — Adding Batches of Data to Storage

Sink is the contract for streaming writes, i.e. adding batches to an output every trigger.

Note
Sink is part of the so-called Structured Streaming V1 API that is currently being rewritten as StreamWriteSupport in V2.

Sink is a single-method interface, addBatch being its sole method.

addBatch is used to “add” a batch of data to the sink, for a given batchId.

addBatch is used when StreamExecution runs a batch.
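
For reference, this is the shape of the Sink contract in Spark 2.x (org.apache.spark.sql.execution.streaming.Sink), slightly simplified:

```scala
package org.apache.spark.sql.execution.streaming

import org.apache.spark.sql.DataFrame

trait Sink {
  // Adds the rows of the given batch to the sink. Implementations should
  // be idempotent: after a failure, StreamExecution may re-deliver the
  // same batchId, and the data must only be added once.
  def addBatch(batchId: Long, data: DataFrame): Unit
}
```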

Table 1. Sinks

| Format / Operator | Sink |
|-------------------|------|
| console | ConsoleSink |
| Any FileFormat (csv, hive, json, libsvm, orc, parquet, text) | FileStreamSink |
| foreach operator | ForeachSink |
| kafka | KafkaSink |
| memory | MemorySink |
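
The mapping is driven by the format name given to DataStreamWriter.format. A minimal sketch (using the built-in rate source to generate test rows):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("sink-demo").getOrCreate()

// The "rate" source generates rows continuously, which is handy for testing.
val input = spark.readStream.format("rate").load()

// format("console") resolves to ConsoleSink; on every trigger,
// StreamExecution hands the new rows to ConsoleSink.addBatch.
val query = input.writeStream
  .format("console")
  .outputMode("append")
  .start()

query.awaitTermination()
```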

Tip
You can create your own streaming sink format by implementing StreamSinkProvider.

When creating a custom Sink, it is recommended to accept the options (a Map[String, String]) that the DataStreamWriter was configured with. You can then use those options to fine-tune the write path, as the sketch below shows.
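
A minimal sketch of such a custom sink follows; the class names and the numRows option are made up for illustration, while the createSink signature is that of StreamSinkProvider in Spark 2.x:

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.execution.streaming.Sink
import org.apache.spark.sql.sources.StreamSinkProvider
import org.apache.spark.sql.streaming.OutputMode

// Hypothetical provider; register it by passing its fully-qualified
// class name to DataStreamWriter.format(...).
class DemoSinkProvider extends StreamSinkProvider {
  override def createSink(
      sqlContext: SQLContext,
      parameters: Map[String, String],
      partitionColumns: Seq[String],
      outputMode: OutputMode): Sink =
    // Pass the DataStreamWriter options through to the sink.
    new DemoSink(parameters)
}

class DemoSink(options: Map[String, String]) extends Sink {
  // "numRows" is a made-up option used here to fine-tune the write path.
  private val numRows = options.getOrElse("numRows", "20").toInt

  override def addBatch(batchId: Long, data: DataFrame): Unit = {
    // Collect eagerly: the incoming DataFrame is tied to the incremental
    // plan of the current batch, so consume it within this call.
    data.collect().take(numRows).foreach(row => println(s"[$batchId] $row"))
  }
}
```

It would then be configured like any built-in format, e.g. writeStream.format(classOf[DemoSinkProvider].getName).option("numRows", "5").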
