FileStreamSink — Streaming Sink for Parquet Format
FileStreamSink is the streaming sink that writes out the results of a streaming query to parquet files.
```scala
import scala.concurrent.duration._
import org.apache.spark.sql.streaming.{OutputMode, Trigger}

val out = in.
  writeStream.
  format("parquet").
  option("path", "parquet-output-dir").
  option("checkpointLocation", "checkpoint-dir").
  trigger(Trigger.ProcessingTime(10.seconds)).
  outputMode(OutputMode.Append).
  start
```
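FileStreamSink records the files it has committed in the `_spark_metadata` directory under the output path, so a batch query that reads the directory back only sees fully committed files. A minimal sketch, using the output directory from the example above:

```scala
// Batch read of the sink's output directory; Spark consults the
// _spark_metadata log there and ignores partially written files.
val result = spark.read.parquet("parquet-output-dir")
result.printSchema()
```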
FileStreamSink is created exclusively when DataSource is requested to create a streaming sink.
FileStreamSink supports Append output mode only.
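Starting a query with any other output mode fails with an `AnalysisException`. A sketch of what that looks like, reusing `in` from the example above (the exact message may differ across Spark versions):

```scala
// Fails at query start since the file sink only supports Append, e.g.
// org.apache.spark.sql.AnalysisException:
//   Data source parquet does not support Complete output mode;
in.writeStream.
  format("parquet").
  option("path", "parquet-output-dir").
  option("checkpointLocation", "checkpoint-dir").
  outputMode(OutputMode.Complete).
  start
```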
FileStreamSink uses the spark.sql.streaming.fileSink.log.deletion configuration property (as isDeletingExpiredLog) to control whether obsolete metadata log files are deleted after compaction.
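For reference, a sketch of tuning the related file sink log properties (these names exist in Spark's SQLConf; the defaults shown are as of Spark 2.x):

```scala
// Whether obsolete (compacted-away) sink log files get deleted (default: true)
spark.conf.set("spark.sql.streaming.fileSink.log.deletion", true)
// How many sink log files to collapse into a compact file (default: 10)
spark.conf.set("spark.sql.streaming.fileSink.log.compactInterval", 10)
// How long to keep obsolete log files around before deleting them (default: 10m)
spark.conf.set("spark.sql.streaming.fileSink.log.cleanupDelay", "10m")
```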
The textual representation of FileStreamSink is FileSink[path]
FileStreamSink uses the following internal registries and counters:

| Name | Description |
|---|---|
| FIXME | Used when…FIXME |
| FIXME | Used when…FIXME |
| FIXME | Used when…FIXME |
| FIXME | Used when…FIXME |
addBatch Method
```scala
addBatch(batchId: Long, data: DataFrame): Unit
```
> **Note:** `addBatch` is part of the Sink contract to "add" a batch of data to the sink.
addBatch…FIXME
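A minimal sketch of the idempotence check `addBatch` performs, with a hypothetical `latestCommitted` value standing in for the sink's metadata log lookup (in Spark, a batch whose id is not greater than the latest committed batch is skipped, which is what makes re-executed micro-batches safe):

```scala
// A simplified sketch, not the actual Spark source: addBatch skips a batch
// that is already recorded in the sink's metadata log and writes it otherwise.
def addBatchSketch(batchId: Long, latestCommitted: Option[Long]): Unit = {
  if (latestCommitted.exists(batchId <= _)) {
    println(s"Skipping already committed batch $batchId")
  } else {
    // FileStreamSink would write the files here (via FileFormatWriter)
    // and then atomically add them to the metadata log.
    println(s"Writing and committing batch $batchId")
  }
}
```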
Creating FileStreamSink Instance
FileStreamSink takes the following when created:
- Path of the output directory (under which the `_spark_metadata` metadata directory is maintained)
FileStreamSink initializes the internal registries and counters.
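For reference, a sketch of the constructor signature as it appears in Spark 2.x sources (parameter names and order may vary across versions):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.datasources.FileFormat

// Sketch of the Spark 2.x constructor; the real class extends Sink.
class FileStreamSink(
    sparkSession: SparkSession,
    path: String,
    fileFormat: FileFormat,
    partitionColumnNames: Seq[String],
    options: Map[String, String])
```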