
Learning Jobs and Partitions Using take Action

Exercise: Learning Jobs and Partitions Using take Action

This exercise introduces the take action using spark-shell and the web UI. Along the way it should familiarize you with the concepts of partitions and jobs.

The following snippet creates an RDD of 16 elements with 16 partitions.
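
A minimal sketch of that snippet, assuming it is typed into spark-shell, where sc is the predefined SparkContext. The name r1 matches the calls used later in this exercise; the printed label is only illustrative.

```scala
// Run inside spark-shell, where `sc` (the SparkContext) is already defined.

// Distribute the 16 numbers 0..15 over 16 partitions (the second argument).
val r1 = sc.parallelize(0 to 15, 16)

// Number of partitions of the RDD.
r1.partitions.size

// Print the number of elements held by each partition
// (in local mode the output appears in the shell itself).
r1.foreachPartition(it => println(s">>> partition size: ${it.size}"))
```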

Each of the 16 partitions holds exactly one element.

When you execute r1.take(1), only one job runs, since computing a single task on one partition is enough to return one element.

Caution
FIXME Snapshot from web UI – note the number of tasks

However, when you execute r1.take(2), two jobs run. The implementation first runs one job on a single partition; if that yields fewer elements than take requested, it runs follow-up jobs over more partitions, growing the number of partitions scanned by up to a factor of four each time.

Caution
FIXME Snapshot from web UI – note the number of tasks

Can you guess how many jobs are run for r1.take(15)? How many tasks per job?

Caution
FIXME Snapshot from web UI – note the number of tasks

Answer: 3 jobs.
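
To see where the job and task counts come from, here is a simplified, Spark-free model of the partition-scanning loop behind take. It is a sketch under stated assumptions, not Spark's actual code: the names TakeJobsSketch and tasksPerJob are hypothetical, the growth rule (an estimate of 1.5 times the partitions still needed, capped at four times the partitions already scanned) is modelled on the open-source RDD.take logic, and details may differ between Spark versions.

```scala
// Simplified model of how take(num) picks partitions job by job.
// Assumption: growth is capped at scaleUpFactor (4) times the partitions scanned so far.
object TakeJobsSketch {

  /** Number of tasks (partitions scanned) in each job that take(num) would run,
    * given the element count of every partition in order. */
  def tasksPerJob(partitionSizes: Seq[Int], num: Int, scaleUpFactor: Int = 4): List[Int] = {
    val totalParts = partitionSizes.length
    var collected = 0     // elements gathered so far
    var partsScanned = 0  // partitions already visited
    val jobs = scala.collection.mutable.ListBuffer.empty[Int]

    while (collected < num && partsScanned < totalParts) {
      val left = num - collected
      var numPartsToTry = 1 // the first job always looks at a single partition
      if (partsScanned > 0) {
        if (collected == 0) {
          // Nothing found yet: grow by the full scale-up factor.
          numPartsToTry = partsScanned * scaleUpFactor
        } else {
          // Extrapolate from the hit rate so far, overshoot by 50%, then cap the growth.
          val estimate = math.ceil(1.5 * left * partsScanned / collected).toInt
          numPartsToTry = math.min(math.max(estimate, 1), partsScanned * scaleUpFactor)
        }
      }
      val until = math.min(partsScanned + numPartsToTry, totalParts)
      val scanned = partitionSizes.slice(partsScanned, until)
      jobs += scanned.length // one task per scanned partition
      // Each task returns at most `left` elements; the result is truncated to `num`.
      collected = math.min(num, collected + scanned.map(s => math.min(s, left)).sum)
      partsScanned = until
    }
    jobs.toList
  }
}
```

Under these assumptions, with 16 partitions of one element each, TakeJobsSketch.tasksPerJob(Seq.fill(16)(1), 1) gives List(1), take of 2 gives List(1, 2), and take of 15 gives List(1, 4, 11): three jobs with 1, 4, and 11 tasks. Compare these numbers with what the web UI reports for your Spark version.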
