HashedRelation

HashedRelation is the contract for “relations” with values hashed by some key.

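HashedRelation is a KnownSizeEstimation, i.e. it reports its own estimatedSize, which Spark's SizeEstimator uses instead of computing the object's size itself.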
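The following Scala sketch shows the idea behind a size-reporting relation. A size estimator asks a KnownSizeEstimation-like object for its estimatedSize and only falls back to its own guess otherwise. KnownSizeEstimationLike, ToyRelation, ToySizeEstimator and all byte counts are hypothetical and deliberately crude. Spark's real SizeEstimator is far more thorough.

```scala
// Hypothetical stand-in for Spark's KnownSizeEstimation trait.
trait KnownSizeEstimationLike {
  def estimatedSize: Long // size in bytes, as reported by the object itself
}

// Hypothetical relation: values (plain strings here) grouped, i.e. "hashed", by a Long key.
final class ToyRelation(rows: Seq[(Long, String)]) extends KnownSizeEstimationLike {
  private val byKey: Map[Long, Seq[String]] =
    rows.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }

  def get(key: Long): Seq[String] = byKey.getOrElse(key, Seq.empty)

  // Made-up accounting: 8 bytes per key plus 2 bytes per character of every value.
  override def estimatedSize: Long =
    byKey.map { case (_, vs) => 8L + vs.map(_.length * 2L).sum }.sum
}

object ToySizeEstimator {
  // Prefer the self-reported size; otherwise fall back to a fixed (hypothetical) guess.
  def estimate(obj: AnyRef): Long = obj match {
    case s: KnownSizeEstimationLike => s.estimatedSize
    case _                          => 16L
  }
}
```

For example, `ToySizeEstimator.estimate(new ToyRelation(Seq(1L -> "ab", 1L -> "c", 2L -> "xyz")))` uses the relation's own accounting rather than the fallback.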

Note
HashedRelation is a private[execution] contract.
Table 1. HashedRelation Contract
Method Description

asReadOnlyCopy

Gives a read-only copy of this HashedRelation to be safely used in a separate thread.

Used exclusively when BroadcastHashJoinExec is requested to execute (to transform every partition of the streamedPlan physical operator using the broadcast variable of the buildPlan physical operator).

get

Gives the internal rows for the given key, or null if there are none

Used when HashJoin is requested to innerJoin, outerJoin, semiJoin, existenceJoin and antiJoin.

getValue

Gives the single matching internal row for the given key

Note
HashedRelation has two variants of getValue: one that accepts an InternalRow and another that accepts a Long. getValue with an InternalRow does not seem to be used at all.

getAverageProbesPerLookup

Gives the average number of probes per key lookup

Used when…FIXME
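The contract in the table can be sketched with a toy, Long-keyed relation that uses open addressing with linear probing. Everything here is hypothetical and simplified. Row stands in for InternalRow, keys are plain Longs rather than key expressions, and the table never resizes. The sketch only shows how the methods relate: get (the rows, or null), getValue (a single row, or null), asReadOnlyCopy and getAverageProbesPerLookup. It also includes an inner-join style probe loop in the spirit of HashJoin.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for Spark's InternalRow.
final case class Row(values: Vector[Any])

// Hypothetical, simplified mirror of the HashedRelation contract (Long keys only).
trait ToyHashedRelation {
  def get(key: Long): Iterator[Row]       // all rows for the key, or null
  def getValue(key: Long): Row            // a single row for the key, or null
  def asReadOnlyCopy(): ToyHashedRelation // copy meant for use from another thread
  def getAverageProbesPerLookup: Double
}

// Open addressing with linear probing. The arrays are never mutated after build,
// so copies can share them; the lookup counters are per instance.
final class ToyLongRelation private (
    keys: Array[Long],
    used: Array[Boolean],
    rows: Array[ArrayBuffer[Row]]) extends ToyHashedRelation {

  private val mask = keys.length - 1 // capacity is a power of two
  private var lookups = 0L
  private var probes = 0L

  // Returns the slot holding `key`, or the first empty slot if the key is absent.
  private def slotOf(key: Long): Int = {
    lookups += 1
    var i = java.lang.Long.hashCode(key) & mask
    probes += 1
    while (used(i) && keys(i) != key) {
      i = (i + 1) & mask
      probes += 1
    }
    i
  }

  override def get(key: Long): Iterator[Row] = {
    val i = slotOf(key)
    if (used(i)) rows(i).iterator else null
  }

  override def getValue(key: Long): Row = {
    val i = slotOf(key)
    if (used(i)) rows(i).head else null
  }

  // Shares the read-only arrays; the new instance starts with fresh counters.
  override def asReadOnlyCopy(): ToyHashedRelation = new ToyLongRelation(keys, used, rows)

  override def getAverageProbesPerLookup: Double =
    if (lookups == 0) 0.0 else probes.toDouble / lookups
}

object ToyLongRelation {
  def build(input: Seq[(Long, Row)], capacity: Int = 16): ToyLongRelation = {
    // At least one slot must stay empty so that probing always terminates.
    require(Integer.bitCount(capacity) == 1 && capacity > input.size)
    val mask = capacity - 1
    val keys = new Array[Long](capacity)
    val used = new Array[Boolean](capacity)
    val rows = Array.fill(capacity)(ArrayBuffer.empty[Row])
    for ((k, row) <- input) {
      var i = java.lang.Long.hashCode(k) & mask
      while (used(i) && keys(i) != k) i = (i + 1) & mask
      keys(i) = k
      used(i) = true
      rows(i) += row
    }
    new ToyLongRelation(keys, used, rows)
  }
}

object ToyJoin {
  // Inner join in the spirit of HashJoin: probe the relation for every streamed row
  // and treat a null result from get as "no matches".
  def innerJoin(streamed: Iterator[(Long, Row)], rel: ToyHashedRelation): Iterator[(Row, Row)] =
    streamed.flatMap { case (k, s) =>
      val matches = rel.get(k)
      if (matches == null) Iterator.empty else matches.map(b => (s, b))
    }
}
```

Sharing the built arrays while giving each copy its own counters is one way to keep per-thread mutable state apart. It is an illustration only, not a description of how Spark's concrete relations implement asReadOnlyCopy.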

getValue Method

Note
This section describes the getValue variant that takes a Long key. The more generic getValue takes an InternalRow key instead.

getValue simply throws an UnsupportedOperationException; concrete HashedRelations are expected to override it with a meaningful implementation.

Note
getValue is used exclusively when LongHashedRelation is requested to get the value for a given key.
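A minimal Scala sketch of this arrangement follows. All names are hypothetical: KeyRow stands in for a one-column InternalRow key, and BaseRelation, LongKeyedRelation and RowKeyedRelation are illustrative only. The base trait's Long-keyed getValue throws UnsupportedOperationException. The Long-keyed subclass overrides it and routes its row-keyed getValue through it. Treating a NULL key (None) as "no match" is an assumption of the sketch.

```scala
// Hypothetical stand-in for a one-column InternalRow key; None models SQL NULL.
final case class KeyRow(value: Option[Long])

// Hypothetical base contract with both getValue variants.
trait BaseRelation {
  def getValue(key: KeyRow): String

  // Default for the Long-keyed variant: unsupported unless a subclass overrides it.
  def getValue(key: Long): String =
    throw new UnsupportedOperationException("getValue(Long) is not supported")
}

// Hypothetical Long-keyed relation holding at most one value per key.
final class LongKeyedRelation(values: Map[Long, String]) extends BaseRelation {
  // The row-keyed variant unwraps the key and delegates to getValue(Long).
  override def getValue(key: KeyRow): String = key.value match {
    case Some(k) => getValue(k)
    case None    => null // a NULL key never matches
  }

  override def getValue(key: Long): String = values.getOrElse(key, null)
}

// Hypothetical relation that only supports row keys, so getValue(Long) keeps the default.
final class RowKeyedRelation(values: Map[KeyRow, String]) extends BaseRelation {
  override def getValue(key: KeyRow): String = values.getOrElse(key, null)
}
```

Calling getValue(1L) on a RowKeyedRelation therefore fails with UnsupportedOperationException, while a LongKeyedRelation answers both variants.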

Creating Concrete HashedRelation Instance (for Build Side of Hash-based Join) — apply Factory Method

apply creates a LongHashedRelation when the input key collection has exactly one expression of type long (LongType), and an UnsafeHashedRelation otherwise.
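The selection rule can be sketched as a small factory. The data-type ADT, KeyExpr and the two relation classes are hypothetical stand-ins for Spark's data types, key Expressions, LongHashedRelation and UnsafeHashedRelation. Only the branching on "exactly one key expression of type long" reflects the text above. The real apply also builds the relation from input rows, and that part is not modelled here.

```scala
// Hypothetical, minimal stand-ins for Spark SQL data types and key expressions.
sealed trait DataType
case object LongType extends DataType
case object IntegerType extends DataType

final case class KeyExpr(name: String, dataType: DataType)

sealed trait ToyBuildRelation { def keys: Seq[KeyExpr] }
// Specialised variant for a single Long key (modelled on LongHashedRelation).
final case class ToyLongHashedRelation(keys: Seq[KeyExpr]) extends ToyBuildRelation
// General-purpose variant for any other key shape (modelled on UnsafeHashedRelation).
final case class ToyUnsafeHashedRelation(keys: Seq[KeyExpr]) extends ToyBuildRelation

object ToyHashedRelationFactory {
  // Exactly one key expression of type long selects the specialised relation;
  // every other key shape gets the general-purpose one.
  def apply(keys: Seq[KeyExpr]): ToyBuildRelation =
    if (keys.length == 1 && keys.head.dataType == LongType) ToyLongHashedRelation(keys)
    else ToyUnsafeHashedRelation(keys)
}
```

Note that two Long keys still fall through to the general-purpose relation, because the rule requires a single key expression.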

Note

The input key expressions are:

Note

apply is used when:

Reproduction without permission is prohibited: spark技术分享 » HashedRelation