
ProjectExec

ProjectExec Unary Physical Operator

ProjectExec is a unary physical operator (i.e. with one child physical operator) that…​FIXME

ProjectExec supports Java code generation (aka codegen).

ProjectExec is created when:

Note

When SparkPlanner (or the Hive-specific SparkPlanner) is requested to plan a logical query plan into one or more physical query plans, it applies the above execution planning strategies in the following order:

  1. HiveTableScans

  2. FileSourceStrategy

  3. DataSourceStrategy

  4. InMemoryScans

  5. BasicOperators

Executing Physical Operator (Generating RDD[InternalRow]) — doExecute Method

Note
doExecute is part of the SparkPlan Contract. It generates the runtime representation of a structured query as a distributed computation over internal binary rows on Apache Spark (i.e. RDD[InternalRow]).

doExecute requests the input child physical plan to produce an RDD of internal rows and applies a calculation over indexed partitions (using RDD.mapPartitionsWithIndexInternal).
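To make "a calculation over indexed partitions" concrete, here is a minimal plain-Python sketch. It models an RDD as a list of partitions and does not call any Spark API; the helper names `map_partitions_with_index` and `tag_with_partition`, and the sample data, are hypothetical and exist only for illustration.

```python
from typing import Callable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def map_partitions_with_index(
    partitions: List[List[T]],
    f: Callable[[int, Iterator[T]], Iterator[U]],
) -> List[List[U]]:
    """Toy stand-in for mapping over indexed partitions (not Spark's code).

    Calls f once per partition with the partition index and an iterator over
    that partition's rows, and collects whatever f yields.
    """
    return [list(f(index, iter(rows))) for index, rows in enumerate(partitions)]


def tag_with_partition(index: int, rows: Iterator[int]) -> Iterator[Tuple[int, int]]:
    # Any per-partition setup would go here: it runs once per partition,
    # not once per row.
    return ((index, row) for row in rows)


# Hypothetical data: one "RDD" split into two partitions.
tagged = map_partitions_with_index([[10, 20], [30]], tag_with_partition)
```

The point of the shape is that the function receives a whole partition at a time, so expensive setup can be done once and reused for every row in that partition.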

Inside doExecute (RDD.mapPartitionsWithIndexInternal)

Inside the function (that is part of RDD.mapPartitionsWithIndexInternal), doExecute creates an UnsafeProjection with the following:

  1. Named expressions

  2. Output of the child physical operator as the input schema

  3. subexpressionEliminationEnabled flag

doExecute requests the UnsafeProjection to initialize (passing in the partition index) and then maps over the internal rows of the partition using the projection.
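The per-partition steps above (build a projection from expressions and an input schema, initialize it, map it over the rows) can be sketched in plain Python as follows. This is a toy model under stated assumptions, not Spark's UnsafeProjection: `ToyProjection`, `project_partition`, the column names and the sample rows are all hypothetical, and the subexpression-elimination flag is carried along without being acted on.

```python
from typing import Any, Callable, Dict, Iterator, List, Tuple

Row = Tuple[Any, ...]
# A toy "named expression": an output name plus a function of the input columns.
NamedExpr = Tuple[str, Callable[[Dict[str, Any]], Any]]


class ToyProjection:
    """Hypothetical stand-in for a projection; not Spark's UnsafeProjection."""

    def __init__(self, exprs: List[NamedExpr], input_schema: List[str],
                 subexpression_elimination: bool = False) -> None:
        self.exprs = exprs
        self.input_schema = input_schema
        # Kept only to mirror the three inputs; this toy never deduplicates work.
        self.subexpression_elimination = subexpression_elimination
        self.partition_index = -1

    def initialize(self, partition_index: int) -> None:
        # Per-partition setup, done once before any row is projected.
        self.partition_index = partition_index

    def __call__(self, row: Row) -> Row:
        # Bind input values to column names, then evaluate each expression.
        columns = dict(zip(self.input_schema, row))
        return tuple(fn(columns) for _, fn in self.exprs)


def project_partition(index: int, rows: Iterator[Row]) -> Iterator[Row]:
    # One projection per partition, built from expressions and the input schema.
    projection = ToyProjection(
        exprs=[("id", lambda c: c["id"]), ("doubled", lambda c: c["id"] * 2)],
        input_schema=["id", "name"],
    )
    projection.initialize(index)
    return map(projection, rows)


# Hypothetical partition 0 with two input rows of schema (id, name).
projected = list(project_partition(0, iter([(1, "a"), (2, "b")])))
```

Building the projection inside the per-partition function, rather than once up front, means each partition gets its own instance and its own initialization with that partition's index.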

Creating ProjectExec Instance

ProjectExec takes the following when created:

Generating Java Source Code for Consume Path in Whole-Stage Code Generation — doConsume Method

Note
doConsume is part of the CodegenSupport Contract. It generates the Java source code for the consume path in Whole-Stage Code Generation.

doConsume…​FIXME
