Follow spark技术分享:
dig into the Spark source code and try out Spark best practices

MapStatus — Shuffle Map Output Status


There are two types of MapStatus:

  • CompressedMapStatus, which compresses the estimated size of each map output block to 8 bits (a single Byte) for efficient reporting.

  • HighlyCompressedMapStatus, which stores the average size of the non-empty blocks together with a compressed bitmap that tracks which blocks are empty.
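To make the two representations concrete, here is a minimal Python model of both. It assumes the logarithmic scheme found in Spark's MapStatus.compressSize/decompressSize (log base 1.1, capped at 255 so the value fits in one unsigned byte); treat those constants as an assumption and check them against your Spark version. A plain Python set stands in for the compressed bitmap, and all function names are hypothetical; this is an illustration, not Spark's code.

```python
import math

# Assumption: base of the logarithmic byte encoding (as in Spark's
# MapStatus.compressSize in the versions we know of).
LOG_BASE = 1.1


def compress_size(size: int) -> int:
    """Encode a block size in bytes as a value in 0..255 (one unsigned byte)."""
    if size == 0:
        return 0
    if size <= 1:
        return 1
    # Rounding up means decoding errs on the high side by under 10%
    # (until the 255 cap, where very large blocks get underestimated).
    return min(255, math.ceil(math.log(size) / math.log(LOG_BASE)))


def decompress_size(compressed: int) -> int:
    """Approximate the original size in bytes from its one-byte encoding."""
    if compressed == 0:
        return 0
    return int(LOG_BASE ** (compressed & 0xFF))


def highly_compress(sizes: list[int]) -> tuple[int, set[int]]:
    """Summarize block sizes as (average non-empty size, ids of empty blocks).

    A plain Python set stands in for the compressed bitmap of empty blocks.
    """
    empty = {reduce_id for reduce_id, s in enumerate(sizes) if s == 0}
    non_empty = [s for s in sizes if s > 0]
    avg = sum(non_empty) // len(non_empty) if non_empty else 0
    return avg, empty


def highly_compressed_size(avg: int, empty: set[int], reduce_id: int) -> int:
    """Estimated size of one reduce block: 0 if it is empty, else the average."""
    return 0 if reduce_id in empty else avg


sizes = [0, 1000, 0, 3000]
print([compress_size(s) for s in sizes])  # [0, 73, 0, 85]
avg, empty = highly_compress(sizes)
print(avg, sorted(empty))  # 2000 [0, 2]
```

The trade-off is visible in the example: the byte encoding keeps a per-block estimate (one byte per block), while the highly compressed form reports block 1 as 2000 bytes even though it really holds 1000, in exchange for a summary whose size barely depends on the number of blocks.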

HighlyCompressedMapStatus is chosen when the number of blocks (the length of uncompressedSizes) is greater than 2000; otherwise CompressedMapStatus is used.
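The selection rule can be sketched in a few lines of Python. The constant and function names are hypothetical; the sketch only encodes the "greater than 2000" comparison stated above.

```python
# Hypothetical constant mirroring the 2000-block threshold.
MIN_BLOCKS_FOR_HIGHLY_COMPRESSED = 2000


def map_status_kind(uncompressed_sizes: list[int]) -> str:
    """Name the MapStatus representation chosen for these block sizes."""
    if len(uncompressed_sizes) > MIN_BLOCKS_FOR_HIGHLY_COMPRESSED:
        return "HighlyCompressedMapStatus"
    return "CompressedMapStatus"


print(map_status_kind([0] * 2000))  # CompressedMapStatus
print(map_status_kind([0] * 2001))  # HighlyCompressedMapStatus
```

Note the boundary: the comparison is strict, so exactly 2000 blocks still uses the one-byte-per-block CompressedMapStatus.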

Caution
FIXME What exactly is 2000? Is this the number of tasks in a job?

MapStatus Contract

Note
MapStatus is a private[spark] contract.
Table 1. MapStatus Contract

Method            Description
location          The location (BlockManagerId) of the BlockManager where the ShuffleMapTask ran and where its output is stored.
getSizeForBlock   The estimated size (in bytes) of the block for a given reduce partition (reduceId).
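As a rough illustration, the contract in Table 1 can be rendered as a Python abstract base class. Since MapStatus is private[spark], user code never implements it; this is only a model. BlockManagerId here is a hypothetical stand-in holding a subset of the real class's fields, and ExactMapStatus is a hypothetical uncompressed implementation used purely to show the two methods in action.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockManagerId:
    """Hypothetical stand-in for Spark's BlockManagerId (subset of fields)."""
    executor_id: str
    host: str
    port: int


class MapStatus(ABC):
    """Python model of the MapStatus contract."""

    @property
    @abstractmethod
    def location(self) -> BlockManagerId:
        """BlockManager where the ShuffleMapTask ran and its output is stored."""

    @abstractmethod
    def get_size_for_block(self, reduce_id: int) -> int:
        """Estimated size in bytes of the block for reduce partition reduce_id."""


class ExactMapStatus(MapStatus):
    """Hypothetical implementation that keeps exact sizes (no compression)."""

    def __init__(self, loc: BlockManagerId, sizes: list[int]):
        self._loc = loc
        self._sizes = list(sizes)

    @property
    def location(self) -> BlockManagerId:
        return self._loc

    def get_size_for_block(self, reduce_id: int) -> int:
        return self._sizes[reduce_id]


status = ExactMapStatus(BlockManagerId("exec-1", "host-a", 40000), [0, 512, 2048])
print(status.location.host, status.get_size_for_block(2))  # host-a 2048
```

The real implementations answer getSizeForBlock from their compressed state instead of an exact array, which is why the contract speaks of an estimated size.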
