# ConsoleSink for Showing DataFrames to Console

`ConsoleSink` is a streaming sink that shows the DataFrame (for a batch) to the console.

`ConsoleSink` is registered as `console` format (by `ConsoleSinkProvider`).
Name | Default Value | Description |
---|---|---|
`numRows` | `20` | Number of rows to display |
`truncate` | `true` | Truncate the data to display to 20 characters |
```scala
scala> spark.version
res0: String = 2.3.0-SNAPSHOT

import org.apache.spark.sql.streaming.{OutputMode, Trigger}
import scala.concurrent.duration._
val query = spark.
  readStream.
  format("rate").
  load.
  writeStream.
  format("console"). // <-- use ConsoleSink
  option("truncate", false).
  option("numRows", 10).
  trigger(Trigger.ProcessingTime(10.seconds)).
  queryName("rate-console").
  start

-------------------------------------------
Batch: 0
-------------------------------------------
+---------+-----+
|timestamp|value|
+---------+-----+
+---------+-----+
```
## Adding Batch (by Showing DataFrame to Console) — addBatch Method
```scala
addBatch(batchId: Long, data: DataFrame): Unit
```
> **Note**: `addBatch` is a part of the Sink Contract to "add" a batch of data to the sink.
Internally, `addBatch` records the input `batchId` in the `lastBatchId` internal property.

`addBatch` then collects the input `data` DataFrame and creates a brand new DataFrame that it shows (per the `numRowsToShow` and `isTruncated` properties).
```
-------------------------------------------
Batch: [batchId]
-------------------------------------------
+---------+-----+
|timestamp|value|
+---------+-----+
+---------+-----+
```
> **Note**: You may see `Rerun batch:` instead if the input `batchId` is below `lastBatchId` (likely due to a batch failure).
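The behaviour above can be sketched in plain Scala. This is a simplified model, not the actual Spark implementation: a `Seq[String]` of pre-rendered rows stands in for the collected DataFrame, and the `ConsoleSinkSketch` name and its `addBatch` return type are assumptions made for illustration. It mirrors the `lastBatchId` bookkeeping, the `numRowsToShow`/`isTruncated` display properties, and the `Rerun batch:` header.

```scala
// Minimal sketch (not Spark's actual code) of ConsoleSink's addBatch logic.
// A Seq[String] of pre-rendered rows stands in for the collected DataFrame.
class ConsoleSinkSketch(numRowsToShow: Int, isTruncated: Boolean) {
  private var lastBatchId: Long = -1L

  def addBatch(batchId: Long, data: Seq[String]): String = {
    // A batch id at or below the last one seen means the batch is being
    // re-run (likely after a failure), hence the "Rerun batch:" header.
    val header =
      if (batchId <= lastBatchId) s"Rerun batch: $batchId"
      else s"Batch: $batchId"
    lastBatchId = math.max(lastBatchId, batchId)

    // "Collect" the rows: show at most numRowsToShow of them and, when
    // isTruncated is set, cut cells longer than 20 characters.
    val rows = data.take(numRowsToShow).map { cell =>
      if (isTruncated && cell.length > 20) cell.take(17) + "..." else cell
    }
    (Seq("-" * 43, header, "-" * 43) ++ rows).mkString("\n")
  }
}
```

Keeping `lastBatchId` as mutable state is what lets the sink distinguish a fresh batch from a re-delivered one across calls to `addBatch`.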