ResultTask

ResultTask is created with a broadcast variable that holds the serialized RDD and the function to execute on it, together with the partition to compute.

Table 1. ResultTask’s Internal Registries and Counters
Name: preferredLocs

Description: Collection of TaskLocations.

Holds the unique entries of locs. When locs is not defined, preferredLocs is empty, i.e. no task location preferences are defined.

Initialized when ResultTask is created.

Used exclusively when ResultTask is requested for preferred locations.
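
The rule for preferredLocs can be sketched without a Spark dependency. This is a minimal illustration, not Spark's code: TaskLocation here is a hypothetical stand-in case class, "not defined" is assumed to mean null, and distinct is used to drop duplicates while keeping the order of first occurrence.

```scala
object PreferredLocsSketch {
  // Hypothetical stand-in for Spark's TaskLocation; only a host name is modeled.
  final case class TaskLocation(host: String)

  // Unique entries of locs, or an empty collection when locs is not defined.
  // Assumption: "not defined" is modeled as null.
  def preferredLocs(locs: Seq[TaskLocation]): Seq[TaskLocation] =
    if (locs == null) Nil else locs.distinct

  def main(args: Array[String]): Unit = {
    val locs = Seq(TaskLocation("host1"), TaskLocation("host2"), TaskLocation("host1"))
    println(preferredLocs(locs)) // the duplicate host1 entry collapses into one
    println(preferredLocs(null)) // no task location preferences
  }
}
```

Under this sketch, a preferredLocations method would only need to return the value computed once at construction time.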

Creating ResultTask Instance

ResultTask takes the following when created:

  • stageId — the stage the task is executed for

  • stageAttemptId — the stage attempt id

  • Broadcast variable with the serialized task (as Array[Byte]). The broadcast consists of a serialized pair of the RDD and the function to execute.

  • Partition to compute

  • Collection of TaskLocations, i.e. preferred locations (executors) to execute the task on

  • outputId

  • Local properties

  • The stage’s serialized TaskMetrics (as Array[Byte])

  • (optional) Job id

  • (optional) Application id

  • (optional) Application attempt id

ResultTask initializes the internal registries and counters.
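
The list of inputs above can be pictured as a plain data structure. The following is only a sketch under assumptions: Partition and TaskLocation are hypothetical stand-ins, the broadcast is reduced to the raw Array[Byte] it carries, and all field types (for example Option[Int] for the job id) are assumptions that the text above does not state.

```scala
import java.util.Properties

object ResultTaskParamsSketch {
  // Hypothetical stand-ins, not Spark's own classes.
  final case class Partition(index: Int)
  final case class TaskLocation(host: String)

  // One field per input listed above; the last three are optional.
  final case class ResultTaskParams(
      stageId: Int,
      stageAttemptId: Int,
      taskBinary: Array[Byte],            // stands in for the broadcast variable
      partition: Partition,
      locs: Seq[TaskLocation],
      outputId: Int,
      localProperties: Properties,
      serializedTaskMetrics: Array[Byte],
      jobId: Option[Int] = None,          // assumed type
      appId: Option[String] = None,       // assumed type
      appAttemptId: Option[String] = None) // assumed type

  def main(args: Array[String]): Unit = {
    // Only the required inputs are given; the optional ones default to None.
    val params = ResultTaskParams(
      stageId = 0,
      stageAttemptId = 0,
      taskBinary = Array.emptyByteArray,
      partition = Partition(0),
      locs = Seq(TaskLocation("host1")),
      outputId = 0,
      localProperties = new Properties(),
      serializedTaskMetrics = Array.emptyByteArray)
    println(params.jobId.isEmpty)
  }
}
```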

preferredLocations Method

Note
preferredLocations is part of the Task contract.

preferredLocations returns the preferredLocs internal property.

Deserialize RDD and Function (From Broadcast) and Execute Function (on RDD Partition) — runTask Method

Note
U is the type of the result, as defined when ResultTask is created.

runTask deserializes an RDD and a function from the broadcast and then executes the function on the records of the RDD partition.

Note
runTask is part of the Task contract to run a task.

Internally, runTask starts by tracking the time required to deserialize an RDD and the function to execute.

Note
The taskBinary broadcast is defined when ResultTask is created.

runTask records the deserialization time and the deserialization CPU time in the _executorDeserializeTime and _executorDeserializeCpuTime properties.

In the end, runTask executes the function, passing in the input context and the records from the partition of the RDD.

Note
The partition used to access the records in the deserialized RDD is defined when ResultTask is created.
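
The runTask flow described above (deserialize the pair, record the timings, run the function on one partition) can be modeled in plain Scala. This is a simplified sketch, not Spark's implementation: FakeContext, FakeRDD, SumFunc and Outcome are hypothetical names, plain Java serialization stands in for whatever serializer Spark uses, and the task binary is passed as raw bytes rather than read from a broadcast.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import java.lang.management.ManagementFactory

object RunTaskSketch {
  // Hypothetical stand-ins for Spark's TaskContext and RDD (not the real classes).
  final case class FakeContext(partitionId: Int)
  final case class FakeRDD[T](partitions: Vector[Vector[T]]) {
    def iterator(partition: Int): Iterator[T] = partitions(partition).iterator
  }

  // A named serializable function, so plain Java serialization can round-trip it.
  class SumFunc extends ((FakeContext, Iterator[Int]) => Int) with Serializable {
    def apply(context: FakeContext, records: Iterator[Int]): Int = records.sum
  }

  // Driver side of the model: serialize the (RDD, function) pair into the task binary.
  def serialize(value: AnyRef): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(value) finally out.close()
    buffer.toByteArray
  }

  // The result plus the two timings that runTask is described as recording.
  final case class Outcome[U](result: U, deserializeTimeMs: Long, deserializeCpuTimeNs: Long)

  def runTask[T, U](taskBinary: Array[Byte], partition: Int, context: FakeContext): Outcome[U] = {
    val threadMXBean = ManagementFactory.getThreadMXBean
    // CPU time of the current thread, or 0 when the JVM cannot measure it.
    def cpuTime(): Long =
      if (threadMXBean.isCurrentThreadCpuTimeSupported) threadMXBean.getCurrentThreadCpuTime else 0L

    val startTime = System.currentTimeMillis()
    val startCpuTime = cpuTime()

    // Step 1: deserialize the (RDD, function) pair from the task binary.
    val in = new ObjectInputStream(new ByteArrayInputStream(taskBinary))
    val pair =
      try in.readObject().asInstanceOf[(FakeRDD[T], (FakeContext, Iterator[T]) => U)]
      finally in.close()
    val (rdd, func) = pair

    // Step 2: record how long deserialization took (wall clock and CPU).
    val deserializeTimeMs = System.currentTimeMillis() - startTime
    val deserializeCpuTimeNs = cpuTime() - startCpuTime

    // Step 3: execute the function on the records of the given partition.
    Outcome(func(context, rdd.iterator(partition)), deserializeTimeMs, deserializeCpuTimeNs)
  }

  def main(args: Array[String]): Unit = {
    val rdd = FakeRDD(Vector(Vector(1, 2, 3), Vector(4, 5)))
    val taskBinary = serialize((rdd, new SumFunc))
    val outcome = runTask[Int, Int](taskBinary, 1, FakeContext(1))
    println(outcome.result) // prints 9, the sum of partition 1
  }
}
```

The function is a named class rather than a lambda only to keep the serialization round trip simple in this sketch. The timings are returned as values here, whereas the text describes them as being stored in properties of the task.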