HashJoin — Contract for Hash-based Join Physical Operators
HashJoin is the contract for hash-based join physical operators (e.g. BroadcastHashJoinExec and ShuffledHashJoinExec).
```scala
package org.apache.spark.sql.execution.joins

trait HashJoin {
  // only required methods that have no implementation
  // the others follow
  val leftKeys: Seq[Expression]
  val rightKeys: Seq[Expression]
  val joinType: JoinType
  val buildSide: BuildSide
  val condition: Option[Expression]
  val left: SparkPlan
  val right: SparkPlan
}
```
| Method | Description |
|---|---|
| | Left or right build side. Used when: |
| Name | Description |
|---|---|
| | Build join keys (as Catalyst expressions) |
| | Streamed join keys (as Catalyst expressions) |
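How the build and streamed join keys relate to buildSide can be sketched with a small, self-contained model. This is not the Spark source: `Side`, `BuildLeft`, `BuildRight` and `JoinSides` are hypothetical stand-ins, keys are plain strings rather than Catalyst expressions, and the sketch assumes that the build keys are the keys of the side named by buildSide while the streamed keys are the keys of the other side.

```scala
// Hypothetical stand-ins for the build side (not Spark's own classes).
sealed trait Side
case object BuildLeft extends Side
case object BuildRight extends Side

// Keys are modelled as plain column names instead of Catalyst expressions.
final case class JoinSides(
    leftKeys: Seq[String],
    rightKeys: Seq[String],
    buildSide: Side) {

  // Assumption: build keys come from the build side, streamed keys from the other side.
  val (buildKeys, streamedKeys) = buildSide match {
    case BuildLeft  => (leftKeys, rightKeys)
    case BuildRight => (rightKeys, leftKeys)
  }
}

object JoinSidesDemo {
  def main(args: Array[String]): Unit = {
    val sides = JoinSides(Seq("l.id"), Seq("r.id"), BuildRight)
    println(s"build=${sides.buildKeys}, streamed=${sides.streamedKeys}")
  }
}
```

Deriving both key sequences from a single pattern match keeps them consistent: swapping the build side swaps both at once.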
join Method
```scala
join(
  streamedIter: Iterator[InternalRow],
  hashed: HashedRelation,
  numOutputRows: SQLMetric,
  avgHashProbe: SQLMetric): Iterator[InternalRow]
```
join branches off per joinType to create a join iterator of internal rows (i.e. Iterator[InternalRow]) for the input streamedIter and hashed:
- outerJoin for a LeftOuter or a RightOuter join
- existenceJoin for an ExistenceJoin join
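The branching can be pictured as a pattern match on the join type. The sketch below is a simplified model only: `JoinKind` and its cases are hypothetical stand-ins for Spark's JoinType hierarchy, and `methodFor` returns the name of the internal method rather than calling it. The inner, semi and anti branches are an assumption based on the internal methods documented on this page, and the error message is modelled on the one quoted in this section.

```scala
// Hypothetical, simplified join types (Spark's JoinType hierarchy is richer).
sealed trait JoinKind
case object Inner extends JoinKind
case object LeftOuter extends JoinKind
case object RightOuter extends JoinKind
case object LeftSemi extends JoinKind
case object LeftAnti extends JoinKind
case object FullOuter extends JoinKind
final case class Existence(existsColumn: String) extends JoinKind

object JoinDispatch {
  // Returns the name of the internal method that would build the join iterator.
  def methodFor(kind: JoinKind): String = kind match {
    case LeftOuter | RightOuter => "outerJoin"
    case _: Existence           => "existenceJoin"
    // Assumed branches, inferred from the method names documented on this page.
    case Inner                  => "innerJoin"
    case LeftSemi               => "semiJoin"
    case LeftAnti               => "antiJoin"
    // Anything else is rejected, as in the description of join.
    case other =>
      throw new IllegalArgumentException(s"[$other] JoinType is not supported")
  }
}

object JoinDispatchDemo {
  def main(args: Array[String]): Unit = {
    println(JoinDispatch.methodFor(RightOuter))
    println(JoinDispatch.methodFor(Existence("exists")))
  }
}
```

Keeping the fallback case last means every join type without a dedicated hash-join method ends in the same exception.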
join requests the TaskContext to add a TaskCompletionListener that updates the input avg hash probe SQL metric. The TaskCompletionListener is executed on task completion (regardless of the task status: success, failure, or cancellation) and uses getAverageProbesPerLookup from the input hashed to set the input avg hash probe.
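The listener pattern described above can be illustrated without Spark on the classpath. Everything in this sketch is a hypothetical stand-in: `MiniTaskContext` models only the registration and firing of completion callbacks, `Relation` exposes a fixed average probe count, and `Metric` holds a plain Double. The point is that the metric is set when the task ends, even if the task body throws.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a SQL metric that stores a single value.
final class Metric {
  var value: Double = 0.0
  def set(v: Double): Unit = value = v
}

// Hypothetical stand-in for the hashed relation and its probe statistic.
final case class Relation(averageProbesPerLookup: Double)

// Hypothetical stand-in for a task context with completion callbacks.
final class MiniTaskContext {
  private val listeners = ArrayBuffer.empty[() => Unit]
  def addCompletionListener(f: () => Unit): Unit = listeners += f
  // Invoked once when the task ends, whatever the outcome.
  def markCompleted(): Unit = listeners.foreach(_.apply())
}

object ProbeMetricDemo {
  // Registers the listener first, then runs the task body.
  def runTask(ctx: MiniTaskContext, hashed: Relation, avgHashProbe: Metric)(body: => Unit): Unit = {
    ctx.addCompletionListener(() => avgHashProbe.set(hashed.averageProbesPerLookup))
    try body finally ctx.markCompleted()
  }

  def main(args: Array[String]): Unit = {
    val metric = new Metric
    runTask(new MiniTaskContext, Relation(1.25), metric) { () }
    println(metric.value)
  }
}
```

Registering the callback before the rows are consumed is what makes the update independent of the task status.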
join creates a result projection (createResultProjection).
In the end, for every row in the join iterator of internal rows, join increments the input numOutputRows SQL metric and applies the result projection.
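This final step amounts to a lazy map over the join iterator. The sketch below assumes nothing about Spark's classes: `Counter` is a hypothetical stand-in for the numOutputRows SQL metric and `project` stands in for the result projection.

```scala
// Hypothetical counter standing in for the numOutputRows SQL metric.
final class Counter {
  private var n = 0L
  def add(delta: Long): Unit = n += delta
  def value: Long = n
}

object ProjectOutput {
  // Wraps a join iterator so that each row is counted and then projected.
  // Iterator.map is lazy: the counter only moves as rows are consumed.
  def finish[A, B](joined: Iterator[A], numOutputRows: Counter)(project: A => B): Iterator[B] =
    joined.map { row =>
      numOutputRows.add(1)
      project(row)
    }

  def main(args: Array[String]): Unit = {
    val counter = new Counter
    val rows = finish(Iterator((1, "a"), (2, "b")), counter)(_._2)
    rows.foreach(println)
    println(counter.value)
  }
}
```

Because the iterator is lazy, the counter reflects only the rows that downstream operators actually pull.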
join reports an IllegalArgumentException when the joinType is not supported.
```
[x] JoinType is not supported
```
Note: join is used when BroadcastHashJoinExec and ShuffledHashJoinExec are executed.
innerJoin Internal Method
```scala
innerJoin(
  streamIter: Iterator[InternalRow],
  hashedRelation: HashedRelation): Iterator[InternalRow]
```
innerJoin…FIXME

Note: innerJoin is used when…FIXME
outerJoin Internal Method
```scala
outerJoin(
  streamedIter: Iterator[InternalRow],
  hashedRelation: HashedRelation): Iterator[InternalRow]
```
outerJoin…FIXME

Note: outerJoin is used when…FIXME
semiJoin Internal Method
```scala
semiJoin(
  streamIter: Iterator[InternalRow],
  hashedRelation: HashedRelation): Iterator[InternalRow]
```
semiJoin…FIXME

Note: semiJoin is used when…FIXME
antiJoin Internal Method
```scala
antiJoin(
  streamIter: Iterator[InternalRow],
  hashedRelation: HashedRelation): Iterator[InternalRow]
```
antiJoin…FIXME

Note: antiJoin is used when…FIXME