OutputMode-spark技术分享

OutputMode

Output mode (OutputMode) describes what data is written to a streaming sink when there is new data available in streaming data sources (in a trigger / streaming batch).

The output mode of a streaming query is specified using DataStreamWriter.outputMode method.



val inputStream = spark.
  readStream.
  format("rate").
  load
import org.apache.spark.sql.streaming.{OutputMode, Trigger}
import scala.concurrent.duration._
val consoleOutput = inputStream.
  writeStream.
  format("console").
  option("truncate", false).
  trigger(Trigger.ProcessingTime(10.seconds)).
  queryName("rate-console").
  option("checkpointLocation", "checkpoint").
  outputMode(OutputMode.Update).  // <-- update output mode
  start

val inputStream = spark.

readStream.

format("rate").

load

import org.apache.spark.sql.streaming.{OutputMode, Trigger}

import scala.concurrent.duration._

val consoleOutput = inputStream.

writeStream.

format("console").

option("truncate", false).

trigger(Trigger.ProcessingTime(10.seconds)).

queryName("rate-console").

option("checkpointLocation", "checkpoint").

outputMode(OutputMode.Update). // <-- update output mode

start

OutputMode Name Behaviour

Append

append

Default output mode that writes “new” rows only.

Note	For streaming aggregations, “new” row is when the intermediate state becomes final, i.e. when new events for the grouping key can only be considered late which is when watermark moves past the event time of the key.

Note	`Append` output mode requires that a streaming query defines event time watermark (using withWatermark operator) on the event time column that is used in aggregation (directly or using window function).

Required for datasets with FileFormat format (to create FileStreamSink)

Used for flatMapGroupsWithState operator

Note	`Append` is mandatory when multiple `flatMapGroupsWithState` operators are used in a structured query.

Complete

complete

Writes all rows (every time there are updates) and therefore corresponds to a traditional batch query.

Note	Supported only for streaming queries with groupBy or groupByKey aggregations (as asserted by UnsupportedOperationChecker).

Update

update

Write the rows that were updated (every time there are updates). If the query does not contain aggregations, it is equivalent to Append mode.

Used for mapGroupsWithState and flatMapGroupsWithState operators

OutputMode

OutputMode

相关推荐

欢迎关注：spark技术分享

热门标签

近期文章

分类目录

关注公众号：spark技术分享

觉得文章有用就打赏一下文章作者

支付宝扫一扫打赏

微信扫一扫打赏

QQ咨询

回顶部