关注 spark技术分享,
撸spark源码 玩spark最佳实践

Configuration Properties

Configuration Properties

The following list are the properties that you can use to fine-tune Spark Structured Streaming applications.

You can set them in a SparkSession upon instantiation using config method.

Table 1. Structured Streaming’s Properties
Name / Default Value Description

spark.sql.streaming.aggregation.stateFormatVersion

2

(internal) State format version used by streaming aggregation operations in a streaming query.

Supported values: 1 or 2

State between versions are tend to be incompatible, so state format version shouldn’t be modified after running.

spark.sql.streaming.checkpointLocation

(empty)

Default checkpoint directory for storing checkpoint data for streaming queries

spark.sql.streaming.disabledV2MicroBatchReaders

(empty)

(internal) A comma-separated list of fully-qualified data source register class names for which MicroBatchReadSupport is disabled. Reads from these sources will fall back to the V1 Sources.

Use SQLConf.disabledV2StreamingMicroBatchReaders to get the current value.

spark.sql.streaming.flatMapGroupsWithState.stateFormatVersion

2

(internal) State format version used by flatMapGroupsWithState operation in a streaming query.

Supported values:

  • 1

  • 2

spark.sql.streaming.maxBatchesToRetainInMemory

2

(internal) The maximum number of batches which will be retained in memory to avoid loading from files.

Maximum count of versions a State Store implementation should retain in memory.

The value adjusts a trade-off between memory usage vs cache miss:

  • 2 covers both success and direct failure cases

  • 1 covers only success case

  • 0 or negative value disables cache to maximize memory size of executors

Used exclusively when HDFSBackedStateStoreProvider is requested to initialize.

spark.sql.streaming.metricsEnabled

false

Flag whether Dropwizard CodaHale metrics will be reported for active streaming queries

spark.sql.streaming.minBatchesToRetain

100

(internal) The minimum number of batches that must be retained and made recoverable.

Used…​FIXME

spark.sql.streaming.multipleWatermarkPolicy

min

Policy to calculate the global watermark value when there are multiple watermark operators in a streaming query.

Supported values:

  • min (default) – chooses the minimum watermark reported across multiple operators

  • max – chooses the maximum across multiple operators

Note
This configuration cannot be changed between query restarts from the same checkpoint location.

spark.sql.streaming.numRecentProgressUpdates

100

Number of progress updates to retain for a streaming query

spark.sql.streaming.pollingDelay

10

(internal) Time delay (in ms) before StreamExecution polls for new data when no data was available in a batch.

spark.sql.streaming.stateStore.maintenanceInterval

60s

The initial delay and how often to execute StateStore’s maintenance task.

spark.sql.streaming.stateStore.providerClass

org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider

(internal) The fully-qualified class name of the StateStoreProvider implementation that manages state data in stateful streaming queries. This class must have a zero-arg constructor.

Use SQLConf.stateStoreProviderClass to get the current value.

spark.sql.streaming.unsupportedOperationCheck

true

(internal) When enabled (true), StreamingQueryManager makes sure that the logical plan of a streaming query uses supported operations only.

赞(0) 打赏
未经允许不得转载:spark技术分享 » Configuration Properties
分享到: 更多 (0)

关注公众号:spark技术分享

联系我们联系我们

觉得文章有用就打赏一下文章作者

支付宝扫一扫打赏

微信扫一扫打赏