SQLExecution Helper Object
SQLExecution
defines spark.sql.execution.id Spark property that is used to track multiple Spark jobs that should all together constitute a single structured query execution (that could be easily reported as a single execution unit).
1 2 3 4 5 6 7 |
import org.apache.spark.sql.execution.SQLExecution scala> println(SQLExecution.EXECUTION_ID_KEY) spark.sql.execution.id |
Structured query actions are executed using SQLExecution.withNewExecutionId static method that sets spark.sql.execution.id as Spark Core’s local property and “stitches” different Spark jobs as parts of one structured query action (that you can then see in web UI’s SQL tab).
Tip
|
Use SparkListener to listen to SparkListenerSQLExecutionStart events and know the execution ids of structured queries that have been executed in a Spark SQL application.
|
Note
|
Jobs without spark.sql.execution.id key are not considered to belong to SQL query executions. |
SQLExecution
keeps track of all execution ids and their QueryExecutions in executionIdToQueryExecution
internal registry.
Tip
|
Use SQLExecution.getQueryExecution to find the QueryExecution for an execution id. |
Executing Dataset Action (with Zero or More Spark Jobs) Under New Execution Id — withNewExecutionId
Method
1 2 3 4 5 6 7 |
withNewExecutionId[T]( sparkSession: SparkSession, queryExecution: QueryExecution)(body: => T): T |
withNewExecutionId
executes body
query action with a new execution id (given as the input executionId
or auto-generated) so that all Spark jobs that have been scheduled by the query action could be marked as parts of the same Dataset
action execution.
withNewExecutionId
allows for collecting all the Spark jobs (even executed on separate threads) together under a single SQL query execution for reporting purposes, e.g. to reporting them as one single structured query in web UI.
Note
|
If there is another execution id already set, it is replaced for the course of the current action. |
In addition, the QueryExecution
variant posts SparkListenerSQLExecutionStart and SparkListenerSQLExecutionEnd events (to LiveListenerBus event bus) before and after executing the body
action, respectively. It is used to inform SQLListener
when a SQL query execution starts and ends.
Note
|
Nested execution ids are not supported in the QueryExecution variant.
|
Note
|
|
Finding QueryExecution for Execution ID — getQueryExecution
Method
1 2 3 4 5 |
getQueryExecution(executionId: Long): QueryExecution |
getQueryExecution
gives the QueryExecution for the executionId
or null
if not found.
Executing Action (with Zero or More Spark Jobs) Tracked Under Given Execution Id — withExecutionId
Method
1 2 3 4 5 6 7 |
withExecutionId[T]( sc: SparkContext, executionId: String)(body: => T): T |
withExecutionId
executes the body
action as part of executing multiple Spark jobs under executionId
execution identifier.
1 2 3 4 5 6 7 |
def body = println("Hello World") scala> SQLExecution.withExecutionId(sc = spark.sparkContext, executionId = "Custom Name")(body) Hello World |
Note
|
|