Debugging Query Execution
The debug package object contains tools for debugging query execution, i.e. a full analysis of structured queries (as Datasets).
| Method | Description |
|---|---|
| debug | Debugging a structured query |
| debugCodegen | Displays the Java source code generated for a structured query in whole-stage code generation (i.e. the output of each WholeStageCodegen subtree in a query plan) |
The debug package object is part of the org.apache.spark.sql.execution.debug package, which you have to import before you can use the debug and debugCodegen methods.
```scala
// Import the package object
import org.apache.spark.sql.execution.debug._

// Every Dataset (incl. DataFrame) now has the debug and debugCodegen methods
val q: DataFrame = ...
q.debug
q.debugCodegen
```
> Tip: Read up on Package Objects in the Scala programming language.
Internally, the debug package object uses the DebugQuery implicit class that "extends" the Dataset[_] Scala type with the debug methods.
```scala
implicit class DebugQuery(query: Dataset[_]) {
  def debug(): Unit = ...
  def debugCodegen(): Unit = ...
}
```
> Tip: Read up on Implicit Classes in the official documentation of the Scala programming language.
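The implicit-class pattern above can be sketched in plain Scala, without Spark. The object and method names below are hypothetical, used only to show how importing an object's members "extends" an existing type with new methods:

```scala
// A minimal sketch of the implicit-class pattern (hypothetical names,
// not part of Spark). Importing debug._ "extends" String with a
// wordCount method, the way DebugQuery adds debug() to Dataset[_].
object debug {
  implicit class DebugString(s: String) {
    // Counts whitespace-separated words in the string
    def wordCount(): Int = s.split("\\s+").count(_.nonEmpty)
  }
}

object Demo extends App {
  import debug._ // brings the DebugString extension into scope

  println("spark sql debug".wordCount()) // prints 3
}
```

Without the import, `"...".wordCount()` does not compile; the import is what makes the implicit conversion available, exactly as with `import org.apache.spark.sql.execution.debug._`.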
Debugging Dataset — debug Method
```scala
debug(): Unit
```
debug requests the QueryExecution (of the structured query) for the optimized physical query plan.
debug transforms the optimized physical query plan to add a new DebugExec physical operator for every physical operator.
debug requests the query plan to execute and then counts the number of rows in the result. It prints out the following message:
```text
Results returned: [count]
```
In the end, debug requests every DebugExec physical operator (in the query plan) to dumpStats.
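The "wrap every operator" transformation that debug performs can be sketched on a hypothetical miniature plan tree (this is not Spark's actual SparkPlan API, only an illustration of the recursive rewrite):

```scala
// Hypothetical mini plan tree (not Spark's SparkPlan) illustrating how
// debug inserts a DebugExec wrapper above every physical operator.
sealed trait Plan
case class Range(n: Long) extends Plan
case class Filter(cond: String, child: Plan) extends Plan
case class DebugExec(child: Plan) extends Plan

// Recursively wrap each operator with DebugExec, rewriting children first
def addDebug(plan: Plan): Plan = plan match {
  case Filter(cond, child) => DebugExec(Filter(cond, addDebug(child)))
  case other               => DebugExec(other)
}

// addDebug(Filter("id = 4", Range(10))) yields
// DebugExec(Filter("id = 4", DebugExec(Range(10))))
```

Each DebugExec wrapper is what later reports per-operator statistics via dumpStats.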
```scala
val q = spark.range(10).where('id === 4)

scala> :type q
org.apache.spark.sql.Dataset[Long]

// Extend Dataset[Long] with debug and debugCodegen methods
import org.apache.spark.sql.execution.debug._

scala> q.debug
Results returned: 1
== WholeStageCodegen ==
Tuples output: 1
 id LongType: {java.lang.Long}
== Filter (id#0L = 4) ==
Tuples output: 0
 id LongType: {}
== Range (0, 10, step=1, splits=8) ==
Tuples output: 0
 id LongType: {}
```
Displaying Java Source Code Generated for Structured Query in Whole-Stage Code Generation (“Debugging” Codegen) — debugCodegen Method
```scala
debugCodegen(): Unit
```
debugCodegen requests the QueryExecution (of the structured query) for the optimized physical query plan.
In the end, debugCodegen simply requests the codegenString of the query plan and prints it out to the standard output.
```scala
import org.apache.spark.sql.execution.debug._

scala> spark.range(10).where('id === 4).debugCodegen
Found 1 WholeStageCodegen subtrees.
== Subtree 1 / 1 ==
*Filter (id#29L = 4)
+- *Range (0, 10, splits=8)

Generated code:
/* 001 */ public Object generate(Object[] references) {
/* 002 */   return new GeneratedIterator(references);
/* 003 */ }
/* 004 */
/* 005 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 006 */   private Object[] references;
...
```