FileStreamSink — Streaming Sink for Parquet Format
FileStreamSink is the streaming sink that writes out the results of a streaming query to parquet files.
```scala
import scala.concurrent.duration._
import org.apache.spark.sql.streaming.{OutputMode, Trigger}

val out = in.
  writeStream.
  format("parquet").
  option("path", "parquet-output-dir").
  option("checkpointLocation", "checkpoint-dir").
  trigger(Trigger.ProcessingTime(10.seconds)).
  outputMode(OutputMode.Append).
  start
```
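The query above writes parquet files to parquet-output-dir in 10-second micro-batches, tracking its progress in checkpoint-dir; stop it with out.stop when done.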
FileStreamSink is created exclusively when DataSource is requested to create a streaming sink.
FileStreamSink supports the Append output mode only.
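The restriction surfaces at query start: requesting any other output mode fails analysis. A minimal sketch (assuming in is a streaming Dataset with an aggregation; the exact exception message may vary across Spark versions):

```scala
// Requesting Complete output mode on a file sink fails at analysis time.
val failing = in.
  writeStream.
  format("parquet").
  option("path", "parquet-output-dir").
  option("checkpointLocation", "checkpoint-dir").
  outputMode(OutputMode.Complete). // unsupported by FileStreamSink
  start
// org.apache.spark.sql.AnalysisException:
//   Data source parquet does not support Complete output mode;
```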
FileStreamSink uses the spark.sql.streaming.fileSink.log.deletion configuration property (as the internal isDeletingExpiredLog flag) to decide whether expired metadata log files are deleted.
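A sketch of disabling the deletion (the property defaults to true):

```scala
// Keep expired entries in the file sink metadata log (default: true).
spark.conf.set("spark.sql.streaming.fileSink.log.deletion", false)
```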
The textual representation of FileStreamSink is FileSink[path].
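That representation shows up in a query's progress reports, among other places. A quick check (assuming the out query from above; lastProgress is null until the first batch completes):

```scala
// SinkProgress.description carries the sink's textual representation.
println(out.lastProgress.sink.description)
// FileSink[parquet-output-dir]
```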
Name | Description |
---|---|
basePath | Used when…FIXME |
logPath | Used when…FIXME |
fileLog | Used when…FIXME |
hadoopConf | Used when…FIXME |
addBatch Method
```scala
addBatch(batchId: Long, data: DataFrame): Unit
```
Note: addBatch is part of the Sink Contract to “add” a batch of data to the sink.
addBatch…FIXME
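The details are left as FIXME above, but the general idea of a file sink's addBatch is to be idempotent across restarts: batches already recorded in the sink's metadata log are skipped, everything else is written out and then committed to the log. A simplified, hypothetical sketch (fileLog and writeFiles are assumed helpers, not the actual Spark API):

```scala
// Hypothetical sketch of an idempotent addBatch for a file sink.
def addBatch(batchId: Long, data: DataFrame): Unit = {
  val latestCommitted = fileLog.getLatest().map(_._1).getOrElse(-1L)
  if (batchId <= latestCommitted) {
    // Re-running an already committed batch would duplicate output files.
    println(s"Skipping already committed batch $batchId")
  } else {
    // Write the batch's files, then record them in the metadata log so the
    // batch is not repeated after a restart.
    val writtenFiles = writeFiles(batchId, data)
    fileLog.add(batchId, writtenFiles)
  }
}
```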
Creating FileStreamSink Instance
FileStreamSink takes the following when created (a sketch of the constructor follows the list):

- SparkSession
- Path with the metadata directory
- FileFormat
- Names of the partition columns
- Options (Map[String, String])
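Put together, a sketch of the constructor signature (based on the Spark 2.x sources; it may differ in other versions):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.execution.datasources.FileFormat
import org.apache.spark.sql.execution.streaming.Sink

// Signature sketch; the real class also implements addBatch (see above).
class FileStreamSink(
    sparkSession: SparkSession,
    path: String,
    fileFormat: FileFormat,
    partitionColumnNames: Seq[String],
    options: Map[String, String])
  extends Sink {

  override def addBatch(batchId: Long, data: DataFrame): Unit = () // stub
}
```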
FileStreamSink initializes the internal registries and counters.
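Among those internals is the metadata log kept under the _spark_metadata directory inside the output path. One practical consequence is that the output directory can be read back as a regular batch dataset, with Spark listing only the files of committed batches:

```scala
// Reading the sink's output back; Spark uses parquet-output-dir/_spark_metadata
// to list only the files of committed batches.
val result = spark.read.parquet("parquet-output-dir")
result.show(truncate = false)
```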