
GenerateExec

GenerateExec Unary Physical Operator

GenerateExec is a unary physical operator (i.e. with one child physical operator) that is created exclusively when the BasicOperators execution planning strategy is requested to plan a Generate logical operator.
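One way to observe this is to build a query with a generator function and look for GenerateExec in the physical plan. The following is a minimal sketch, assuming Spark SQL is on the classpath; the object name, the column names and the sample data are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.GenerateExec
import org.apache.spark.sql.functions.explode

// Hypothetical demo object; only the Spark classes and functions are real.
object GenerateExecInPlan {

  // Returns the first GenerateExec found in the physical plan, if any.
  def findGenerateExec(spark: SparkSession): Option[GenerateExec] = {
    import spark.implicits._
    // explode is a generator function, so the logical plan contains Generate
    val nums = Seq((0, Seq(1, 2, 3))).toDF("id", "nums")
    val q = nums.select($"id", explode($"nums").as("num"))
    // sparkPlan is the physical plan produced by the planning strategies
    q.queryExecution.sparkPlan.collectFirst { case g: GenerateExec => g }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("GenerateExecInPlan")
      .getOrCreate()
    try println(findGenerateExec(spark).map(_.nodeName))
    finally spark.stop()
  }
}
```

The sketch inspects sparkPlan rather than executedPlan so that the operator is seen as the planner created it, before any preparation rules wrap it.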

When executed, GenerateExec executes (aka evaluates) the Generator expression on every row in an RDD partition.

Figure 1. GenerateExec’s Execution — doExecute Method
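The per-partition evaluation can be pictured with a toy model in plain Scala. This is not Spark code: Row, Generator, Counter and explodeColumn are hypothetical stand-ins, and appending the generated values to the input row is an assumption made for the sake of the example.

```scala
// A toy model of per-row generation, NOT the Spark implementation.
object GenerateModel {
  // Hypothetical stand-in for an internal row
  type Row = Seq[Any]
  // A generator turns one input row into zero or more output rows
  type Generator = Row => Iterator[Row]

  // Hypothetical stand-in for the numOutputRows metric
  final class Counter { var value: Long = 0L }

  // Evaluates the generator on every row of a partition, lazily.
  // Assumption: each generated row is appended to its input row.
  def generatePartition(
      rows: Iterator[Row],
      generator: Generator,
      numOutputRows: Counter): Iterator[Row] =
    rows.flatMap { input =>
      generator(input).map { generated =>
        numOutputRows.value += 1
        input ++ generated
      }
    }

  // An explode-like generator over the collection stored at `ordinal`
  def explodeColumn(ordinal: Int): Generator =
    row => row(ordinal).asInstanceOf[Seq[Any]].iterator.map(elem => Seq(elem))
}
```

In this model a row whose collection is empty produces no output rows at all, and the counter is only updated as the resulting iterator is consumed.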
Note
The child physical operator has to be a CodegenSupport operator.

GenerateExec implements the CodegenSupport contract, i.e. it can generate Java code (aka codegen) through doProduce and doConsume.

However, GenerateExec does not take part in whole-stage code generation, i.e. its supportCodegen flag is turned off.

The output schema of a GenerateExec is…​FIXME

Table 1. GenerateExec’s Performance Metrics

Key | Name (in web UI) | Description
numOutputRows | number of output rows |
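The metric can also be read programmatically once a query has run. The following is a sketch only, assuming Spark SQL is on the classpath and that the executed plan is not wrapped by adaptive query execution; the object name and the sample data are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.GenerateExec
import org.apache.spark.sql.functions.explode

// Hypothetical demo object; only the Spark classes and functions are real.
object GenerateExecMetric {

  // Runs a small explode query and reads numOutputRows off GenerateExec.
  def numOutputRows(spark: SparkSession): Option[Long] = {
    import spark.implicits._
    val q = Seq((0, Seq(1, 2, 3))).toDF("id", "nums")
      .select($"id", explode($"nums").as("num"))
    // Metrics are only populated after the query has been executed
    q.collect()
    q.queryExecution.executedPlan.collectFirst {
      case g: GenerateExec => g.metrics("numOutputRows").value
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("GenerateExecMetric")
      .getOrCreate()
    try println(numOutputRows(spark))
    finally spark.stop()
  }
}
```

The same value is what the web UI shows under number of output rows for the Generate node.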

Figure 2. GenerateExec in web UI (Details for Query)

producedAttributes…​FIXME

outputPartitioning…​FIXME

boundGenerator…​FIXME

GenerateExec uses the input RDDs of the child physical operator as its own input RDDs (when WholeStageCodegenExec is executed).

GenerateExec requires that…​FIXME

Generating Java Source Code for Produce Path in Whole-Stage Code Generation — doProduce Method

Note
doProduce is part of CodegenSupport Contract to generate the Java source code for produce path in Whole-Stage Code Generation.

doProduce…​FIXME

Generating Java Source Code for Consume Path in Whole-Stage Code Generation — doConsume Method

Note
doConsume is part of CodegenSupport Contract to generate the Java source code for consume path in Whole-Stage Code Generation.

doConsume…​FIXME

codeGenCollection Internal Method

codeGenCollection…​FIXME

Note
codeGenCollection is used exclusively when GenerateExec is requested to generate the Java code for the consume path in whole-stage code generation (when Generator is a CollectionGenerator).

codeGenTraversableOnce Internal Method

codeGenTraversableOnce…​FIXME

Note
codeGenTraversableOnce is used exclusively when GenerateExec is requested to generate the Java code for the consume path in whole-stage code generation (when Generator is not a CollectionGenerator).

codeGenAccessor Internal Method

codeGenAccessor…​FIXME

Note
codeGenAccessor is used…​FIXME

Creating GenerateExec Instance

GenerateExec takes the following when created:

Executing Physical Operator (Generating RDD[InternalRow]) — doExecute Method

Note
doExecute is part of SparkPlan Contract to generate the runtime representation of a structured query as a distributed computation over internal binary rows on Apache Spark (i.e. RDD[InternalRow]).

doExecute…​FIXME
