LocalTableScanExec Physical Operator
LocalTableScanExec is a leaf physical operator (i.e. no children) and producedAttributes being outputSet.
LocalTableScanExec is created when BasicOperators execution planning strategy resolves LocalRelation and Spark Structured Streaming’s MemoryPlan logical operators.
|
Tip
|
Read on MemoryPlan logical operator in the Spark Structured Streaming gitbook.
|
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 |
val names = Seq("Jacek", "Agata").toDF("name") val optimizedPlan = names.queryExecution.optimizedPlan scala> println(optimizedPlan.numberedTreeString) 00 LocalRelation [name#9] // Physical plan with LocalTableScanExec operator (shown as LocalTableScan) scala> names.explain == Physical Plan == LocalTableScan [name#9] // Going fairly low-level...you've been warned val plan = names.queryExecution.executedPlan import org.apache.spark.sql.execution.LocalTableScanExec val ltse = plan.asInstanceOf[LocalTableScanExec] val ltseRDD = ltse.execute() scala> :type ltseRDD org.apache.spark.rdd.RDD[org.apache.spark.sql.catalyst.InternalRow] scala> println(ltseRDD.toDebugString) (2) MapPartitionsRDD[1] at execute at <console>:30 [] | ParallelCollectionRDD[0] at execute at <console>:30 [] // no computation on the source dataset has really occurred yet // Let's trigger a RDD action scala> ltseRDD.first res6: org.apache.spark.sql.catalyst.InternalRow = [0,1000000005,6b6563614a] // Low-level "show" scala> ltseRDD.foreach(println) [0,1000000005,6b6563614a] [0,1000000005,6174616741] // High-level show scala> names.show +-----+ | name| +-----+ |Jacek| |Agata| +-----+ |
| Key | Name (in web UI) | Description |
|---|---|---|
|
number of output rows |
|
Note
|
It appears that when no Spark job is used to execute a
|
When executed, LocalTableScanExec…FIXME
Figure 1. LocalTableScanExec in web UI (Details for Query)
| Name | Description |
|---|---|
|
Internal binary rows for…FIXME |
|
Executing Physical Operator (Generating RDD[InternalRow]) — doExecute Method
|
1 2 3 4 5 |
doExecute(): RDD[InternalRow] |
|
Note
|
doExecute is part of SparkPlan Contract to generate the runtime representation of a structured query as a distributed computation over internal binary rows on Apache Spark (i.e. RDD[InternalRow]).
|
doExecute…FIXME
Creating LocalTableScanExec Instance
LocalTableScanExec takes the following when created:
-
Output schema attributes
spark技术分享