ProjectExec Unary Physical Operator
ProjectExec
is a unary physical operator (i.e. with one child physical operator) that…FIXME
ProjectExec
supports Java code generation (aka codegen).
ProjectExec
is created when:
-
InMemoryScans and HiveTableScans execution planning strategies are executed (and request
SparkPlanner
to pruneFilterProject) -
BasicOperators
execution planning strategy is requested to resolve a Project logical operator -
DataSourceStrategy
execution planning strategy is requested to creates a RowDataSourceScanExec -
FileSourceStrategy
execution planning strategy is requested to plan a LogicalRelation with a HadoopFsRelation -
ExtractPythonUDFs
physical optimization is requested to optimize a physical query plan (and extracts Python UDFs)
Note
|
The following is the order of applying the above execution planning strategies to logical query plans when
|
Executing Physical Operator (Generating RDD[InternalRow]) — doExecute
Method
1 2 3 4 5 |
doExecute(): RDD[InternalRow] |
Note
|
doExecute is part of SparkPlan Contract to generate the runtime representation of a structured query as a distributed computation over internal binary rows on Apache Spark (i.e. RDD[InternalRow] ).
|
doExecute
requests the input child physical plan to produce an RDD of internal rows and applies a calculation over indexed partitions (using RDD.mapPartitionsWithIndexInternal
).
1 2 3 4 5 6 7 8 |
RDD.mapPartitionsWithIndexInternal mapPartitionsWithIndexInternal[U]( f: (Int, Iterator[T]) => Iterator[U], preservesPartitioning: Boolean = false) |
Inside doExecute
(RDD.mapPartitionsWithIndexInternal
)
Inside the function (that is part of RDD.mapPartitionsWithIndexInternal
), doExecute
creates an UnsafeProjection with the following:
doExecute
requests the UnsafeProjection
to initialize and maps over the internal rows (of a partition) using the projection.
Creating ProjectExec Instance
ProjectExec
takes the following when created:
-
NamedExpressions for the projection
-
Child physical operator
Generating Java Source Code for Consume Path in Whole-Stage Code Generation — doConsume
Method
1 2 3 4 5 |
doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String |
Note
|
doConsume is part of CodegenSupport Contract to generate the Java source code for consume path in Whole-Stage Code Generation.
|
doConsume
…FIXME