HashJoin — Contract for Hash-based Join Physical Operators
HashJoin
is the contract for hash-based join physical operators (e.g. BroadcastHashJoinExec and ShuffledHashJoinExec).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
package org.apache.spark.sql.execution.joins trait HashJoin { // only required methods that have no implementation // the others follow val leftKeys: Seq[Expression] val rightKeys: Seq[Expression] val joinType: JoinType val buildSide: BuildSide val condition: Option[Expression] val left: SparkPlan val right: SparkPlan } |
Method | Description |
---|---|
Left or right build side Used when:
|
|
Name | Description |
---|---|
Build join keys (as Catalyst expressions) |
|
Streamed join keys (as Catalyst expressions) |
|
join
Method
1 2 3 4 5 6 7 8 9 |
join( streamedIter: Iterator[InternalRow], hashed: HashedRelation, numOutputRows: SQLMetric, avgHashProbe: SQLMetric): Iterator[InternalRow] |
join
branches off per joinType to create a join iterator of internal rows (i.e. Iterator[InternalRow]
) for the input streamedIter
and hashed
:
-
outerJoin for a LeftOuter or a RightOuter join
-
existenceJoin for a ExistenceJoin join
join
requests TaskContext
to add a TaskCompletionListener
to update the input avg hash probe SQL metric. The TaskCompletionListener
is executed on a task completion (regardless of the task status: success, failure, or cancellation) and uses getAverageProbesPerLookup from the input hashed
to set the input avg hash probe.
join
createResultProjection.
In the end, for every row in the join iterator of internal rows join
increments the input numOutputRows
SQL metric and applies the result projection.
join
reports a IllegalArgumentException
when the joinType is incorrect.
1 2 3 4 5 |
[x] JoinType is not supported |
Note
|
join is used when BroadcastHashJoinExec and ShuffledHashJoinExec are executed.
|
innerJoin
Internal Method
1 2 3 4 5 6 7 |
innerJoin( streamIter: Iterator[InternalRow], hashedRelation: HashedRelation): Iterator[InternalRow] |
innerJoin
…FIXME
Note
|
innerJoin is used when…FIXME
|
outerJoin
Internal Method
1 2 3 4 5 6 7 |
outerJoin( streamedIter: Iterator[InternalRow], hashedRelation: HashedRelation): Iterator[InternalRow] |
outerJoin
…FIXME
Note
|
outerJoin is used when…FIXME
|
semiJoin
Internal Method
1 2 3 4 5 6 7 |
semiJoin( streamIter: Iterator[InternalRow], hashedRelation: HashedRelation): Iterator[InternalRow] |
semiJoin
…FIXME
Note
|
semiJoin is used when…FIXME
|
antiJoin
Internal Method
1 2 3 4 5 6 7 |
antiJoin( streamIter: Iterator[InternalRow], hashedRelation: HashedRelation): Iterator[InternalRow] |
antiJoin
…FIXME
Note
|
antiJoin is used when…FIXME
|