关注 spark技术分享,
撸spark源码 玩spark最佳实践

StateStoreCoordinator — Tracking Locations of StateStores for Streaming RDDs

StateStoreCoordinator — Tracking Locations of StateStores for StateStoreRDD

StateStoreCoordinator keeps track of StateStores loaded in Spark executors (across the nodes in a Spark cluster).

The main purpose of StateStoreCoordinator is for StateStoreRDD to get the location preferences for partitions (based on the location of the stores).

StateStoreCoordinator uses instances internal registry of StateStoreProviders by their ids and ExecutorCacheTaskLocations.

StateStoreCoordinator is a ThreadSafeRpcEndpoint RPC endpoint that manipulates instances registry through RPC messages.

Table 1. StateStoreCoordinator RPC Endpoint’s Messages and Message Handlers (in alphabetical order)
Message Message Handler

DeactivateInstances

Removes StateStoreProviderIds (from instances) with queryRunId as runId

You should see the following DEBUG message in the logs:

GetLocation

Gives the location of StateStoreProviderId (from instances) with the host and an executor id on that host.

You should see the following DEBUG message in the logs:

ReportActiveInstance

Registers StateStoreProviderId that is active on an executor (given host and port).

You should see the following DEBUG message in the logs:

StopCoordinator

Stops StateStoreCoordinator RPC Endpoint

You should see the following DEBUG message in the logs:

VerifyIfInstanceActive

Verifies if StateStoreProviderId is registered (in instances) on executorId

You should see the following DEBUG message in the logs:

Tip

Enable INFO or DEBUG logging level for org.apache.spark.sql.execution.streaming.state.StateStoreCoordinator to see what happens inside.

Add the following line to conf/log4j.properties:

Refer to Logging.

赞(0) 打赏
未经允许不得转载:spark技术分享 » StateStoreCoordinator — Tracking Locations of StateStores for Streaming RDDs
分享到: 更多 (0)

关注公众号:spark技术分享

联系我们联系我们

觉得文章有用就打赏一下文章作者

支付宝扫一扫打赏

微信扫一扫打赏