LocalTableScanExec Physical Operator
LocalTableScanExec is a leaf physical operator (i.e. it has no children) whose producedAttributes is its outputSet.
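The two properties above can be checked directly on a planned query. The following is a minimal sketch, assuming a local SparkSession and assuming the executed plan of a small local Dataset is a single LocalTableScanExec (as in the example further down); the application name is arbitrary.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.LocalTableScanExec

// In spark-shell, getOrCreate() simply returns the existing session
val spark = SparkSession.builder().master("local[*]").appName("ltse-leaf").getOrCreate()
import spark.implicits._

val names = Seq("Jacek", "Agata").toDF("name")

// Assumption: the whole physical plan is one LocalTableScanExec, so the cast succeeds
val ltse = names.queryExecution.executedPlan.asInstanceOf[LocalTableScanExec]

// A leaf operator has no child operators
val isLeaf = ltse.children.isEmpty

// Every output attribute is produced by the operator itself
val producesOwnOutput = ltse.producedAttributes == ltse.outputSet

println(s"leaf: $isLeaf, producedAttributes == outputSet: $producesOwnOutput")
```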
LocalTableScanExec is created when the BasicOperators execution planning strategy resolves LocalRelation and Spark Structured Streaming’s MemoryPlan logical operators.
Tip
Read up on the MemoryPlan logical operator in the Spark Structured Streaming gitbook.
```scala
val names = Seq("Jacek", "Agata").toDF("name")
val optimizedPlan = names.queryExecution.optimizedPlan

scala> println(optimizedPlan.numberedTreeString)
00 LocalRelation [name#9]

// Physical plan with LocalTableScanExec operator (shown as LocalTableScan)
scala> names.explain
== Physical Plan ==
LocalTableScan [name#9]

// Going fairly low-level...you've been warned

val plan = names.queryExecution.executedPlan
import org.apache.spark.sql.execution.LocalTableScanExec
val ltse = plan.asInstanceOf[LocalTableScanExec]

val ltseRDD = ltse.execute()
scala> :type ltseRDD
org.apache.spark.rdd.RDD[org.apache.spark.sql.catalyst.InternalRow]

scala> println(ltseRDD.toDebugString)
(2) MapPartitionsRDD[1] at execute at <console>:30 []
 |  ParallelCollectionRDD[0] at execute at <console>:30 []

// no computation on the source dataset has really occurred yet
// Let's trigger a RDD action
scala> ltseRDD.first
res6: org.apache.spark.sql.catalyst.InternalRow = [0,1000000005,6b6563614a]

// Low-level "show"
scala> ltseRDD.foreach(println)
[0,1000000005,6b6563614a]
[0,1000000005,6174616741]

// High-level show
scala> names.show
+-----+
| name|
+-----+
|Jacek|
|Agata|
+-----+
```
Key | Name (in web UI) | Description |
---|---|---|
numOutputRows | number of output rows | |
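The metric in the table can also be inspected programmatically through the operator's metrics map. This is a hedged sketch that assumes the metric is registered under the key numOutputRows and that a local SparkSession is available; it deliberately makes no claim about the value printed, since that depends on how the query was executed.

```scala
import org.apache.spark.sql.SparkSession

// In spark-shell, getOrCreate() simply returns the existing session
val spark = SparkSession.builder().master("local[*]").appName("ltse-metrics").getOrCreate()
import spark.implicits._

val names = Seq("Jacek", "Agata").toDF("name")
val plan = names.queryExecution.executedPlan

// Assumption: the metric key is "numOutputRows"; get returns None if it is absent
val numOutputRows = plan.metrics.get("numOutputRows")

// Run the query so the operator has a chance to update its metrics
names.collect()

numOutputRows match {
  case Some(metric) => println(s"number of output rows: ${metric.value}")
  case None         => println("numOutputRows metric not found on this operator")
}
```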
Note
It appears that when no Spark job is used to execute a
When executed, LocalTableScanExec…FIXME
Figure 1. LocalTableScanExec in web UI (Details for Query)
Name | Description |
---|---|
unsafeRows | Internal binary rows for…FIXME |
Executing Physical Operator (Generating RDD[InternalRow]): doExecute Method
```scala
doExecute(): RDD[InternalRow]
```
Note
doExecute is part of the SparkPlan Contract to generate the runtime representation of a structured query as a distributed computation over internal binary rows on Apache Spark (i.e. RDD[InternalRow]).
doExecute…FIXME
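Because the resulting RDD holds internal binary rows, printing them shows raw hex-like values rather than the original strings. The sketch below is one way to read them back, assuming a local SparkSession and a single string column at ordinal 0; it goes through the public execute() method, which is what ends up triggering doExecute.

```scala
import org.apache.spark.sql.SparkSession

// In spark-shell, getOrCreate() simply returns the existing session
val spark = SparkSession.builder().master("local[*]").appName("ltse-execute").getOrCreate()
import spark.implicits._

val names = Seq("Jacek", "Agata").toDF("name")

// execute() returns the operator's RDD[InternalRow]
val rowsRDD = names.queryExecution.executedPlan.execute()

// Column 0 is the string column "name"; convert the internal UTF8String to a String
// inside the map so that only plain JVM strings are collected to the driver
val decoded = rowsRDD.map(row => row.getUTF8String(0).toString).collect()

decoded.foreach(println)
```

Converting inside the map, rather than collecting InternalRow objects first, keeps the example independent of how the binary rows are stored or reused.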
Creating LocalTableScanExec Instance
LocalTableScanExec takes the following when created:
- Output schema attributes