关注 spark技术分享,
撸spark源码 玩spark最佳实践

BroadcastHashJoinExec

BroadcastHashJoinExec Binary Physical Operator for Broadcast Hash Join

BroadcastHashJoinExec is a binary physical operator to perform a broadcast hash join.

BroadcastHashJoinExec is created after applying JoinSelection execution planning strategy to ExtractEquiJoinKeys-destructurable logical query plans (i.e. INNER, CROSS, LEFT OUTER, LEFT SEMI, LEFT ANTI) of which the right physical operator can be broadcast.

BroadcastHashJoinExec supports Java code generation (aka codegen).

BroadcastHashJoinExec requires that partition requirements for the two children physical operators match BroadcastDistribution (with a HashedRelationBroadcastMode) and UnspecifiedDistribution (for left and right sides of a join or vice versa).

Table 1. BroadcastHashJoinExec’s Performance Metrics
Key Name (in web UI) Description

numOutputRows

number of output rows

avgHashProbe

avg hash probe

spark sql BroadcastHashJoinExec webui query details.png
Figure 1. BroadcastHashJoinExec in web UI (Details for Query)
Note
The prefix for variable names for BroadcastHashJoinExec operators in CodegenSupport-generated code is bhj.

Table 2. BroadcastHashJoinExec’s Required Child Output Distributions
BuildSide Left Child Right Child

BuildLeft

BroadcastDistribution with HashedRelationBroadcastMode broadcast mode of build join keys

UnspecifiedDistribution

BuildRight

UnspecifiedDistribution

BroadcastDistribution with HashedRelationBroadcastMode broadcast mode of build join keys

Executing Physical Operator (Generating RDD[InternalRow]) — doExecute Method

Note
doExecute is part of SparkPlan Contract to generate the runtime representation of a structured query as a distributed computation over internal binary rows on Apache Spark (i.e. RDD[InternalRow]).

doExecute…​FIXME

Generating Java Source Code for Inner Join — codegenInner Internal Method

codegenInner…​FIXME

Note
codegenInner is used exclusively when BroadcastHashJoinExec is requested to generate the Java code for the “consume” path in whole-stage code generation.

Generating Java Source Code for Left or Right Outer Join — codegenOuter Internal Method

codegenOuter…​FIXME

Note
codegenOuter is used exclusively when BroadcastHashJoinExec is requested to generate the Java code for the “consume” path in whole-stage code generation.

Generating Java Source Code for Left Semi Join — codegenSemi Internal Method

codegenSemi…​FIXME

Note
codegenSemi is used exclusively when BroadcastHashJoinExec is requested to generate the Java code for the “consume” path in whole-stage code generation.

Generating Java Source Code for Anti Join — codegenAnti Internal Method

codegenAnti…​FIXME

Note
codegenAnti is used exclusively when BroadcastHashJoinExec is requested to generate the Java code for the “consume” path in whole-stage code generation.

codegenExistence Internal Method

codegenExistence…​FIXME

Note
codegenExistence is used exclusively when BroadcastHashJoinExec is requested to generate the Java code for the “consume” path in whole-stage code generation.

genStreamSideJoinKey Internal Method

genStreamSideJoinKey…​FIXME

Note
genStreamSideJoinKey is used when BroadcastHashJoinExec is requested to generate the Java source code for inner, outer, left semi, anti and existence joins (for the “consume” path in whole-stage code generation).

Creating BroadcastHashJoinExec Instance

BroadcastHashJoinExec takes the following when created:

赞(0) 打赏
未经允许不得转载:spark技术分享 » BroadcastHashJoinExec
分享到: 更多 (0)

关注公众号:spark技术分享

联系我们联系我们

觉得文章有用就打赏一下文章作者

支付宝扫一扫打赏

微信扫一扫打赏