Whole-Stage Java Code Generation (Whole-Stage CodeGen)

Whole-Stage Java Code Generation (aka Whole-Stage CodeGen) is a physical query optimization in Spark SQL that fuses multiple physical operators (as a subtree of plans that support code generation) together into a single Java function.

Whole-Stage Java Code Generation improves the execution performance of a query by collapsing a query tree into a single optimized function that eliminates virtual function calls and leverages CPU registers for intermediate data.
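The difference between the two execution styles can be illustrated with a small, self-contained sketch. This is not Spark's generated code: the class `FusionSketch`, the `RowIterator` interface and the three toy operators (scan, keep even values, add one) are hypothetical names chosen for illustration. The first method chains iterator-style operators, paying an interface call per row per operator; the second is the kind of single loop that a fused stage would roughly correspond to, with intermediate values kept in local variables.

```java
import java.util.NoSuchElementException;

// Toy illustration only: hypothetical classes, not Spark's physical operators.
final class FusionSketch {

    // Iterator-style operator: every row costs one interface call per operator.
    interface RowIterator {
        boolean hasNext();
        long next();
    }

    static RowIterator scan(long[] data) {
        return new RowIterator() {
            private int i = 0;
            public boolean hasNext() { return i < data.length; }
            public long next() { return data[i++]; }
        };
    }

    static RowIterator filterEven(RowIterator child) {
        return new RowIterator() {
            private long buffered;
            private boolean ready = false;
            public boolean hasNext() {
                // Pull from the child until an even value is found.
                while (!ready && child.hasNext()) {
                    long v = child.next();
                    if (v % 2 == 0) { buffered = v; ready = true; }
                }
                return ready;
            }
            public long next() {
                if (!hasNext()) throw new NoSuchElementException();
                ready = false;
                return buffered;
            }
        };
    }

    static RowIterator plusOne(RowIterator child) {
        return new RowIterator() {
            public boolean hasNext() { return child.hasNext(); }
            public long next() { return child.next() + 1; }
        };
    }

    // Operator-at-a-time execution: sum(plusOne(filterEven(scan(data)))).
    static long sumVolcano(long[] data) {
        RowIterator it = plusOne(filterEven(scan(data)));
        long sum = 0;
        while (it.hasNext()) sum += it.next();
        return sum;
    }

    // The same query written as one fused loop: no per-row interface calls,
    // and the intermediate value lives in a local variable.
    static long sumFused(long[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            long v = data[i];
            if (v % 2 != 0) continue; // filter
            sum += v + 1;             // project + aggregate
        }
        return sum;
    }
}
```

Both methods compute the same result; only the shape of the code differs, and that shape is what whole-stage code generation changes.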

Note

Whole-Stage Code Generation is controlled by the spark.sql.codegen.wholeStage Spark internal property.

Whole-Stage Code Generation is enabled by default.

Use the SQLConf.wholeStageEnabled method to access the current value.

Note

Whole-Stage Code Generation is used by some modern massively parallel processing (MPP) databases to achieve better query execution performance.

Note
Janino is used to compile Java source code into a Java class at runtime.

Before a query is executed, the CollapseCodegenStages physical preparation rule finds the parts of the physical query plan that support code generation and collapses them together into a WholeStageCodegen operator (possibly with an InputAdapter in between for physical operators that do not support Java code generation).

Note
CollapseCodegenStages is one of the physical preparation rules in QueryExecution.preparations, which are applied in order to the physical plan before execution.
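A minimal sketch of the collapsing idea, assuming a toy plan tree: `CollapseSketch`, `Plan` and the `supportsCodegen` flag are hypothetical, and the operator names and flags in the usage example below are chosen only for illustration. The two mutually recursive methods follow the description above: a subtree that supports code generation is wrapped in a WholeStageCodegen node, and an InputAdapter node is placed above any operator inside it that does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Toy plan tree; hypothetical types, not Spark's SparkPlan.
final class CollapseSketch {

    static final class Plan {
        final String name;
        final boolean supportsCodegen;
        final List<Plan> children;

        Plan(String name, boolean supportsCodegen, List<Plan> children) {
            this.name = name;
            this.supportsCodegen = supportsCodegen;
            this.children = children;
        }

        static Plan of(String name, boolean supportsCodegen, Plan... children) {
            return new Plan(name, supportsCodegen, Arrays.asList(children));
        }

        // Returns a copy of this node with f applied to every child.
        Plan mapChildren(UnaryOperator<Plan> f) {
            List<Plan> mapped = new ArrayList<>();
            for (Plan c : children) mapped.add(f.apply(c));
            return new Plan(name, supportsCodegen, mapped);
        }

        String render() {
            if (children.isEmpty()) return name;
            return name + "(" + children.stream()
                .map(Plan::render)
                .collect(Collectors.joining(", ")) + ")";
        }
    }

    // Outside a stage: start a new stage at the first operator that supports codegen.
    static Plan insertWholeStage(Plan plan) {
        if (plan.supportsCodegen) {
            return Plan.of("WholeStageCodegen", false, insertInputAdapter(plan));
        }
        return plan.mapChildren(CollapseSketch::insertWholeStage);
    }

    // Inside a stage: cut the stage at the first operator that does not support codegen.
    static Plan insertInputAdapter(Plan plan) {
        if (!plan.supportsCodegen) {
            return Plan.of("InputAdapter", false, insertWholeStage(plan));
        }
        return plan.mapChildren(CollapseSketch::insertInputAdapter);
    }
}
```

For example, with a toy plan `Project(Filter(Exchange(Scan)))` in which only `Exchange` is flagged as not supporting codegen, `insertWholeStage` yields two stages separated by an InputAdapter: `WholeStageCodegen(Project(Filter(InputAdapter(Exchange(WholeStageCodegen(Scan))))))`.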

There are the following code generation paths (as coined in this commit):

  1. Non-whole-stage-codegen path

  2. Whole-stage-codegen “produce” path

  3. Whole-stage-codegen “consume” path
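The “produce” and “consume” paths can be pictured with a toy code generator. The sketch below is an assumption-laden illustration, not Spark's API: `ProduceConsumeSketch`, `Op`, `Range`, `Filter`, `Project` and `Sink` are hypothetical names. A produce request travels from the root down to the leaf, which emits the loop; the leaf then calls consume on its parent, and each operator appends its per-row code and passes the row variable upward, so all operators end up inside one loop body.

```java
import java.util.function.UnaryOperator;

// Toy code generator; all types here are hypothetical.
final class ProduceConsumeSketch {

    abstract static class Op {
        protected Op parent;
        // "produce": emit the code that generates this operator's rows.
        abstract String produce(Op parent);
        // "consume": emit the code that processes one row held in variable `var`.
        abstract String consume(String var);
    }

    // Leaf: emits the loop and asks its parent to consume each row.
    static final class Range extends Op {
        private final int n;
        Range(int n) { this.n = n; }
        @Override String produce(Op parent) {
            return "for (long i = 0; i < " + n + "; i++) {\n" + parent.consume("i") + "}\n";
        }
        @Override String consume(String var) {
            throw new UnsupportedOperationException("leaf operator has no child");
        }
    }

    static final class Filter extends Op {
        private final Op child;
        private final UnaryOperator<String> predicate;
        Filter(Op child, UnaryOperator<String> predicate) {
            this.child = child;
            this.predicate = predicate;
        }
        @Override String produce(Op parent) {
            this.parent = parent;
            return child.produce(this); // delegate row production to the child
        }
        @Override String consume(String var) {
            return "if (!(" + predicate.apply(var) + ")) continue;\n" + parent.consume(var);
        }
    }

    static final class Project extends Op {
        private final Op child;
        private final String outVar;
        private final UnaryOperator<String> expr;
        Project(Op child, String outVar, UnaryOperator<String> expr) {
            this.child = child;
            this.outVar = outVar;
            this.expr = expr;
        }
        @Override String produce(Op parent) {
            this.parent = parent;
            return child.produce(this);
        }
        @Override String consume(String var) {
            return "long " + outVar + " = " + expr.apply(var) + ";\n" + parent.consume(outVar);
        }
    }

    // Root of the stage: wraps the generated loop and sums the rows it receives.
    static final class Sink extends Op {
        private final Op child;
        Sink(Op child) { this.child = child; }
        @Override String produce(Op ignored) {
            return "long sum = 0;\n" + child.produce(this) + "return sum;\n";
        }
        @Override String consume(String var) {
            return "sum += " + var + ";\n";
        }
    }

    // Generates the body of a single function for sum(project(filter(range(10)))).
    static String generate() {
        Op plan = new Sink(
            new Project(
                new Filter(new Range(10), v -> v + " % 2 == 0"),
                "p",
                v -> v + " + 1"));
        return plan.produce(null);
    }
}
```

Calling `ProduceConsumeSketch.generate()` returns the source text of one loop containing the filter, the projection and the aggregation, which is the shape that the whole-stage paths aim for. The non-whole-stage path is not modeled here.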

Tip
Review SPARK-12795 Whole stage codegen to learn about the work to support it.

BenchmarkWholeStageCodegen — Performance Benchmark

The BenchmarkWholeStageCodegen class provides a benchmark to measure whole-stage codegen performance.

You can execute it using the command:

Note
You need to un-ignore tests in BenchmarkWholeStageCodegen by replacing ignore with test.
