SQLExecution Helper Object

SQLExecution defines the spark.sql.execution.id Spark property, which is used to track the Spark jobs that together constitute a single structured query execution (so that they can be reported as a single execution unit).

Structured query actions are executed using the SQLExecution.withNewExecutionId method, which sets spark.sql.execution.id as a Spark Core local property and “stitches” different Spark jobs together as parts of one structured query action (that you can then see in the web UI’s SQL tab).

Tip

Use a SparkListener to listen for SparkListenerSQLExecutionStart events and learn the execution ids of the structured queries that have been executed in a Spark SQL application.

Note
Jobs without spark.sql.execution.id key are not considered to belong to SQL query executions.
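
The grouping rule above can be illustrated with a small, Spark-free sketch. The `(job_id, properties)` pairs and the `group_jobs_by_execution` helper are hypothetical stand-ins for what a listener would see at job start; this is a toy model of the idea, not Spark's own code.

```python
from collections import defaultdict

EXECUTION_ID_KEY = "spark.sql.execution.id"

def group_jobs_by_execution(jobs):
    """Group job ids by the execution id found in each job's properties.

    `jobs` is a hypothetical list of (job_id, properties) pairs. Jobs whose
    properties lack the key are returned separately as non-SQL jobs.
    """
    by_execution = defaultdict(list)
    non_sql = []
    for job_id, props in jobs:
        execution_id = props.get(EXECUTION_ID_KEY)
        if execution_id is None:
            non_sql.append(job_id)
        else:
            by_execution[execution_id].append(job_id)
    return dict(by_execution), non_sql

# Made-up sample data: two jobs of one query, one plain job, one job of another query.
jobs = [
    (0, {EXECUTION_ID_KEY: "0"}),
    (1, {EXECUTION_ID_KEY: "0"}),
    (2, {}),
    (3, {EXECUTION_ID_KEY: "1"}),
]
grouped, non_sql = group_jobs_by_execution(jobs)
```

Here jobs 0 and 1 end up under execution "0", job 3 under execution "1", and job 2 is left out of any SQL query execution.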

SQLExecution keeps track of all execution ids and their QueryExecutions in the executionIdToQueryExecution internal registry.

Tip
Use SQLExecution.getQueryExecution to find the QueryExecution for an execution id.

Executing Dataset Action (with Zero or More Spark Jobs) Under New Execution Id — withNewExecutionId Method

withNewExecutionId executes the body query action with a new execution id (given as the input executionId or auto-generated) so that all Spark jobs scheduled by the query action are marked as parts of the same Dataset action execution.

withNewExecutionId allows for collecting all the Spark jobs (even those executed on separate threads) together under a single SQL query execution for reporting purposes, e.g. to report them as a single structured query in the web UI.

Note
If there is another execution id already set, it is replaced for the course of the current action.
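
A minimal sketch of the "set, run, put back" pattern described above, written as a Spark-free toy model. The thread-local dictionary stands in for Spark Core's local properties, and `with_new_execution_id`, `get_local_property`, `set_local_property` and the counter-based id generator are hypothetical names. Restoring the previous value afterwards is an assumption drawn from the phrase "for the course of the current action".

```python
import itertools
import threading
from contextlib import contextmanager

EXECUTION_ID_KEY = "spark.sql.execution.id"

_local = threading.local()      # stands in for Spark Core's local properties
_next_id = itertools.count()    # hypothetical auto-generated execution ids

def get_local_property(key):
    return getattr(_local, "props", {}).get(key)

def set_local_property(key, value):
    props = getattr(_local, "props", None)
    if props is None:
        props = _local.props = {}
    if value is None:
        props.pop(key, None)    # None means "unset the property"
    else:
        props[key] = value

@contextmanager
def with_new_execution_id(execution_id=None):
    """Run the enclosed block with a fresh (or given) execution id."""
    old = get_local_property(EXECUTION_ID_KEY)
    new = str(next(_next_id)) if execution_id is None else str(execution_id)
    set_local_property(EXECUTION_ID_KEY, new)
    try:
        yield new
    finally:
        # Assumption: the previous value (possibly none) is put back afterwards.
        set_local_property(EXECUTION_ID_KEY, old)

with with_new_execution_id() as outer:
    seen_outer = get_local_property(EXECUTION_ID_KEY)
    with with_new_execution_id() as inner:
        seen_inner = get_local_property(EXECUTION_ID_KEY)
    restored = get_local_property(EXECUTION_ID_KEY)
after = get_local_property(EXECUTION_ID_KEY)
```

In this model the inner block sees its own id, the outer id is visible again once the inner block finishes, and no id is set after the outermost block exits.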

In addition, the QueryExecution variant posts SparkListenerSQLExecutionStart and SparkListenerSQLExecutionEnd events (to the LiveListenerBus event bus) before and after executing the body action, respectively. These events inform SQLListener when a SQL query execution starts and ends.
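
The start/end bracketing can be sketched as follows. This is a toy model only: the `events` list stands in for the listener bus (a real bus delivers events asynchronously), the tuple-shaped events and the `post` helper are hypothetical, and posting the end event even when the body fails is an assumption of this sketch.

```python
import itertools

_next_id = itertools.count()    # hypothetical execution id generator
events = []                     # stands in for the listener bus

def post(event):
    events.append(event)

def with_new_execution_id(description, body):
    """Post a start event, run the body, then post an end event."""
    execution_id = next(_next_id)
    post(("start", execution_id, description))
    try:
        return body()
    finally:
        # Assumption: the end event is posted whether or not the body succeeds.
        post(("end", execution_id))

result = with_new_execution_id("count", lambda: sum(range(10)))

try:
    with_new_execution_id("boom", lambda: 1 // 0)
except ZeroDivisionError:
    pass
```

A listener that pairs start and end events by execution id can then tell which query executions are still running.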

Note
Nested execution ids are not supported in the QueryExecution variant.
Note

withNewExecutionId is used when:

  • Dataset is requested to execute Dataset.withNewExecutionId

  • Dataset is requested to withAction

  • DataFrameWriter is requested to run a command

  • Spark Structured Streaming’s StreamExecution commits a batch to a streaming sink

  • Spark Thrift Server’s SparkSQLDriver runs a command

Finding QueryExecution for Execution ID — getQueryExecution Method

getQueryExecution gives the QueryExecution for the executionId or null if not found.
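
The lookup contract (the value if registered, otherwise null) can be modelled with a small registry. `ExecutionRegistry` and its `register` method are hypothetical names for this sketch, the lock is an assumption about thread safety, and Python's `None` plays the role of `null`.

```python
import threading

class ExecutionRegistry:
    """Toy stand-in for the executionIdToQueryExecution registry."""

    def __init__(self):
        self._lock = threading.Lock()   # assumption: lookups may come from many threads
        self._by_id = {}

    def register(self, execution_id, query_execution):
        with self._lock:
            self._by_id[execution_id] = query_execution

    def get_query_execution(self, execution_id):
        # Returns None (the analogue of null) when the id is unknown.
        with self._lock:
            return self._by_id.get(execution_id)

registry = ExecutionRegistry()
registry.register(0, "QueryExecution for SELECT 1")   # placeholder value
found = registry.get_query_execution(0)
missing = registry.get_query_execution(42)
```

Callers have to handle the missing case explicitly, since an unknown id yields no QueryExecution rather than an error.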

Executing Action (with Zero or More Spark Jobs) Tracked Under Given Execution Id — withExecutionId Method

withExecutionId executes the body action with spark.sql.execution.id set to the given executionId, so that the Spark jobs the action schedules are tracked under that execution identifier.
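
One way to picture why an explicit id is useful: work handed to another thread can be tagged with the id captured on the calling thread. The sketch below is a Spark-free toy model. `with_execution_id`, `submit_job` and `background_work` are hypothetical names, and the plain thread-local used here is not inherited by new threads, which is a property of this model rather than a claim about Spark.

```python
import threading
from contextlib import contextmanager

_local = threading.local()          # per-thread "current execution id"
submitted = []                      # (job name, execution id) pairs
_submitted_lock = threading.Lock()

def current_execution_id():
    return getattr(_local, "execution_id", None)

@contextmanager
def with_execution_id(execution_id):
    """Run the enclosed block under an already existing execution id."""
    old = current_execution_id()
    _local.execution_id = execution_id
    try:
        yield
    finally:
        _local.execution_id = old

def submit_job(name):
    # A "job" simply records the execution id visible on its thread.
    with _submitted_lock:
        submitted.append((name, current_execution_id()))

def background_work(execution_id):
    # The worker thread starts without an id, so it is passed in explicitly.
    with with_execution_id(execution_id):
        submit_job("background")

with with_execution_id("7"):
    submit_job("main")
    captured = current_execution_id()
    worker = threading.Thread(target=background_work, args=(captured,))
    worker.start()
    worker.join()
```

Both the job on the main thread and the one on the worker thread are recorded under execution id "7".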

Note

withExecutionId is used when:
