ShuffledRowRDD


ShuffledRowRDD is an RDD of internal binary rows (i.e. RDD[InternalRow]).

Note
ShuffledRowRDD is similar to ShuffledRDD. They differ in the type of the values they process: InternalRow for ShuffledRowRDD and (K, C) key-value pairs for ShuffledRDD.

ShuffledRowRDD takes a ShuffleDependency (of integer keys and InternalRow values).

Note
The dependency property is mutable and is of type ShuffleDependency[Int, InternalRow, InternalRow].

ShuffledRowRDD takes an optional specifiedPartitionStartIndices collection of integers with the start indices of the post-shuffle partitions, so the size of the collection is the number of post-shuffle partitions. When not specified, the number of post-shuffle partitions is the number of partitions of the Partitioner of the input ShuffleDependency.
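
The relationship between start indices and post-shuffle partitions can be illustrated with a minimal, Spark-free sketch. It assumes that each start index marks the first pre-shuffle partition of a post-shuffle partition and that the last range runs up to the number of pre-shuffle partitions. PostShufflePartitions, PartitionRange and ranges are hypothetical names used for illustration only; they are not part of Spark.

```scala
object PostShufflePartitions {
  // Hypothetical stand-in for a post-shuffle partition: it covers the
  // pre-shuffle partitions in the half-open range [start, end).
  final case class PartitionRange(index: Int, start: Int, end: Int)

  def ranges(
      numPreShufflePartitions: Int,
      specifiedPartitionStartIndices: Option[Array[Int]] = None): Seq[PartitionRange] = {
    // Assumption: with no start indices given, every pre-shuffle partition
    // becomes its own post-shuffle partition.
    val startIndices = specifiedPartitionStartIndices
      .getOrElse(Array.range(0, numPreShufflePartitions))

    startIndices.indices.map { i =>
      // A range ends where the next one starts; the last range ends at
      // the number of pre-shuffle partitions.
      val end =
        if (i < startIndices.length - 1) startIndices(i + 1)
        else numPreShufflePartitions
      PartitionRange(i, startIndices(i), end)
    }
  }
}
```

Under these assumptions, 5 pre-shuffle partitions with start indices 0, 2 and 3 would give three post-shuffle partitions that cover the pre-shuffle ranges [0, 2), [2, 3) and [3, 5).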

Note
Post-shuffle partition is…FIXME

Table 1. ShuffledRowRDD and RDD Contract

Name | Description
getDependencies | A single-element collection with ShuffleDependency[Int, InternalRow, InternalRow].
partitioner | CoalescedPartitioner (with the Partitioner of the dependency)
getPreferredLocations | See Getting Placement Preferences of Partition below.
compute | See Computing Partition (in TaskContext) below.
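
As a rough illustration of the partitioner entry above, the following Spark-free sketch shows one way a coalescing partitioner could translate parent (pre-shuffle) partition ids into coalesced (post-shuffle) partition ids. It is an assumption about the general technique, not a copy of Spark's CoalescedPartitioner. CoalescingSketch, parentToCoalesced and coalescedPartition are hypothetical names, and the modulo step merely stands in for the Partitioner of the dependency.

```scala
object CoalescingSketch {
  // Hypothetical helper: maps every parent (pre-shuffle) partition id to the
  // id of the coalesced (post-shuffle) partition whose range contains it.
  def parentToCoalesced(numParentPartitions: Int, startIndices: Array[Int]): Array[Int] = {
    val mapping = new Array[Int](numParentPartitions)
    for (i <- startIndices.indices) {
      val end =
        if (i < startIndices.length - 1) startIndices(i + 1)
        else numParentPartitions
      for (parentId <- startIndices(i) until end) mapping(parentId) = i
    }
    mapping
  }

  // A key is first placed by the parent partitioning function and the
  // resulting parent partition id is then translated to a coalesced id.
  def coalescedPartition(key: Int, numParentPartitions: Int, startIndices: Array[Int]): Int = {
    // Assumption: a simple modulo stands in for the parent Partitioner.
    val parentId = Math.floorMod(key, numParentPartitions)
    parentToCoalesced(numParentPartitions, startIndices)(parentId)
  }
}
```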

numPreShufflePartitions Property

Caution
FIXME

Computing Partition (in TaskContext) — compute Method

Note
compute is part of Spark Core’s RDD Contract to compute a partition (in a TaskContext).

Internally, compute makes sure that the input split is a ShuffledRowRDDPartition. It then requests ShuffleManager for a ShuffleReader to read InternalRows for the split.

Note
compute uses ShuffleHandle (of ShuffleDependency dependency) and the pre-shuffle start and end partition offsets.
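
The two steps of compute (checking the type of the split and then reading a range of pre-shuffle partitions) can be sketched without Spark as follows. Every type here is a hypothetical stand-in: ShuffledRowPartition plays the role of ShuffledRowRDDPartition, a plain Map plays the role of the shuffle output behind a ShuffleHandle, read plays the role of a ShuffleReader, and rows are simple strings rather than InternalRows. The choice of IllegalArgumentException for an unexpected split is this sketch's own.

```scala
object ComputeSketch {
  // Hypothetical stand-ins for Spark's Partition and ShuffledRowRDDPartition.
  trait Partition { def index: Int }
  final case class ShuffledRowPartition(
      index: Int,
      startPreShuffleIndex: Int,
      endPreShuffleIndex: Int) extends Partition
  final case class OtherPartition(index: Int) extends Partition

  // Stand-in for shuffle output: rows grouped by pre-shuffle partition id.
  type ShuffleOutput = Map[Int, Seq[String]]

  // Stand-in for a shuffle reader over the pre-shuffle range [start, end).
  def read(output: ShuffleOutput, start: Int, end: Int): Iterator[String] =
    (start until end).iterator.flatMap(id => output.getOrElse(id, Seq.empty))

  def compute(split: Partition, output: ShuffleOutput): Iterator[String] =
    split match {
      // Step 1: make sure the split is of the expected partition type.
      case p: ShuffledRowPartition =>
        // Step 2: read the rows for the split's pre-shuffle offsets.
        read(output, p.startPreShuffleIndex, p.endPreShuffleIndex)
      case other =>
        throw new IllegalArgumentException(
          s"Unexpected partition type: ${other.getClass.getName}")
    }
}
```

With a shuffle output of Map(0 -> Seq("a"), 1 -> Seq("b", "c"), 2 -> Seq("d")), computing ShuffledRowPartition(0, 1, 3) in this sketch reads the rows of pre-shuffle partitions 1 and 2 only.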

Getting Placement Preferences of Partition — getPreferredLocations Method

Note
getPreferredLocations is part of RDD contract to specify placement preferences (aka preferred task locations), i.e. where tasks should be executed to be as close to the data as possible.

Internally, getPreferredLocations requests MapOutputTrackerMaster for the preferred locations of the input partition (for the single ShuffleDependency).

Note
getPreferredLocations uses SparkEnv to access the current MapOutputTrackerMaster (which runs on the driver).

CoalescedPartitioner

Caution
FIXME

ShuffledRowRDDPartition

Caution
FIXME