Debugging Query Execution
The debug package object contains tools for debugging query execution, i.e. a full analysis of structured queries (as Datasets).
Method | Description
---|---
debug | Debugging a structured query
debugCodegen | Displays the Java source code generated for a structured query in whole-stage code generation (i.e. the output of each WholeStageCodegen subtree in a query plan).
The debug package object belongs to the org.apache.spark.sql.execution.debug package, which you have to import before you can use the debug and debugCodegen methods.
    // Import the package object
    import org.apache.spark.sql.execution.debug._

    // Every Dataset (incl. DataFrame) now has the debug and debugCodegen methods
    val q: DataFrame = ...
    q.debug
    q.debugCodegen
Tip: Read up on Package Objects in the Scala programming language.
Internally, the debug package object uses the DebugQuery implicit class that “extends” the Dataset[_] Scala type with the debug methods.
    implicit class DebugQuery(query: Dataset[_]) {
      def debug(): Unit = ...
      def debugCodegen(): Unit = ...
    }
Tip: Read up on Implicit Classes in the official documentation of the Scala programming language.
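To make the implicit-class mechanism concrete, here is a minimal sketch in plain Scala, with no Spark dependency. The names Query, QueryDebugSyntax, DebugOps and describe are hypothetical stand-ins invented for illustration; only the overall shape (an implicit class wrapping a value and adding methods to it, brought into scope by an import) mirrors what the text says about DebugQuery. A regular object is used instead of a package object so the snippet fits in one file.

```scala
// Hypothetical stand-in for Dataset[_]; this is not a Spark type.
final case class Query(name: String)

object QueryDebugSyntax {
  // Same shape as DebugQuery: wraps a value and adds methods to it.
  implicit class DebugOps(val query: Query) {
    def describe: String = s"debugging ${query.name}"
  }
}

object ImplicitClassDemo {
  // Without this import, Query has no describe method.
  import QueryDebugSyntax._

  def main(args: Array[String]): Unit = {
    val q = Query("range")
    // The compiler rewrites q.describe to new DebugOps(q).describe
    println(q.describe)
  }
}
```

The import plays the same role as importing org.apache.spark.sql.execution.debug._ does for Datasets: it puts the implicit conversion in scope, and the extra methods become available only from that point on.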
Debugging Dataset: debug Method
    debug(): Unit
debug requests the QueryExecution (of the structured query) for the optimized physical query plan.

debug transforms the optimized physical query plan to add a new DebugExec physical operator for every physical operator.

debug requests the query plan to execute and then counts the number of rows in the result. It prints out the following message:
    Results returned: [count]
In the end, debug requests every DebugExec physical operator (in the query plan) to dumpStats.
    val q = spark.range(10).where('id === 4)

    scala> :type q
    org.apache.spark.sql.Dataset[Long]

    // Extend Dataset[Long] with debug and debugCodegen methods
    import org.apache.spark.sql.execution.debug._

    scala> q.debug
    Results returned: 1
    == WholeStageCodegen ==
    Tuples output: 1
    id LongType: {java.lang.Long}
    == Filter (id#0L = 4) ==
    Tuples output: 0
    id LongType: {}
    == Range (0, 10, step=1, splits=8) ==
    Tuples output: 0
    id LongType: {}
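The steps described for debug (wrap every operator, execute, count the rows, dump per-operator statistics) can be sketched with a toy plan tree in plain Scala. Everything here is an assumption made for illustration: Op, RangeOp, FilterOp, DebugOp, instrument and debugOps are hypothetical names, not Spark's SparkPlan or DebugExec API, and the toy does not model whole-stage code generation, so its per-operator counts will not match the Spark output above.

```scala
object ToyDebug {
  // Toy physical operator (hypothetical; not Spark's SparkPlan).
  sealed trait Op {
    def name: String
    def children: Seq[Op]
    def execute(): Iterator[Long]
  }

  final case class RangeOp(start: Long, end: Long) extends Op {
    val name = "RangeOp"
    val children: Seq[Op] = Nil
    def execute(): Iterator[Long] = (start until end).iterator
  }

  final case class FilterOp(predicate: Long => Boolean, child: Op) extends Op {
    val name = "FilterOp"
    val children: Seq[Op] = Seq(child)
    def execute(): Iterator[Long] = child.execute().filter(predicate)
  }

  // Toy counterpart of DebugExec: passes rows through and counts them.
  final class DebugOp(val child: Op) extends Op {
    private var tuples = 0L
    val name = "DebugOp"
    val children: Seq[Op] = Seq(child)
    def tupleCount: Long = tuples
    def execute(): Iterator[Long] =
      child.execute().map { row => tuples += 1; row }
  }

  // Step 2: put a DebugOp on top of every operator in the plan.
  def instrument(op: Op): DebugOp = op match {
    case FilterOp(p, c) => new DebugOp(FilterOp(p, instrument(c)))
    case other          => new DebugOp(other)
  }

  // Collect the DebugOp nodes top-down.
  def debugOps(op: Op): Seq[DebugOp] = {
    val here = op match {
      case d: DebugOp => Seq(d)
      case _          => Nil
    }
    here ++ op.children.flatMap(debugOps)
  }

  // Steps 3 and 4: execute, count the rows, then dump per-operator stats.
  def debug(plan: Op): Seq[(String, Long)] = {
    val instrumented = instrument(plan)
    val count = instrumented.execute().size // consumes the iterator
    println(s"Results returned: $count")
    val stats = debugOps(instrumented).map(d => d.child.name -> d.tupleCount)
    stats.foreach { case (n, c) => println(s"== $n ==\nTuples output: $c") }
    stats
  }

  def main(args: Array[String]): Unit =
    debug(FilterOp(_ == 4L, RangeOp(0L, 10L)))
}
```

Wrapping rather than modifying the operators keeps the original plan untouched: the counting lives entirely in the extra nodes, which is the design the text attributes to DebugExec.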
Displaying Java Source Code Generated for Structured Query in Whole-Stage Code Generation (“Debugging” Codegen): debugCodegen Method
    debugCodegen(): Unit
debugCodegen requests the QueryExecution (of the structured query) for the optimized physical query plan.

In the end, debugCodegen simply passes the query plan to codegenString and prints the result out to the standard output.
    import org.apache.spark.sql.execution.debug._

    scala> spark.range(10).where('id === 4).debugCodegen
    Found 1 WholeStageCodegen subtrees.
    == Subtree 1 / 1 ==
    *Filter (id#29L = 4)
    +- *Range (0, 10, splits=8)

    Generated code:
    /* 001 */ public Object generate(Object[] references) {
    /* 002 */   return new GeneratedIterator(references);
    /* 003 */ }
    /* 004 */
    /* 005 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
    /* 006 */   private Object[] references;
    ...
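If you want the generated code as a String rather than printed to the standard output (e.g. to save it or search it), the two steps above can be performed by hand. The following is a sketch, assuming Spark SQL is on the classpath and a local master is acceptable; DebugCodegenSketch and generatedCode are hypothetical names, and the exact text returned depends on the Spark version.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.debug.codegenString

object DebugCodegenSketch {
  // Returns the text that debugCodegen would print for the example query.
  def generatedCode(spark: SparkSession): String = {
    val q = spark.range(10).where("id = 4")
    // Step 1: ask the QueryExecution for the optimized physical plan.
    val plan = q.queryExecution.executedPlan
    // Step 2: render the WholeStageCodegen subtrees of the plan as text.
    codegenString(plan)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local mode is fine for a demo
      .appName("debug-codegen-sketch")
      .getOrCreate()
    try println(generatedCode(spark))
    finally spark.stop()
  }
}
```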