
StorageLevel

StorageLevel describes how an RDD is persisted. It answers the following questions:

  • Does the RDD use disk?

  • Does the RDD use memory to store data?

  • How much of the RDD is in memory?

  • Does the RDD use off-heap memory?

  • Should the RDD be serialized or not (while storing the data)?

  • How many replicas (default: 1) should be used (the number must be less than 40)?
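The questions above map naturally onto a handful of boolean flags plus a replica count. The following is a minimal sketch in plain Python, assuming a hypothetical `StorageLevelSketch` class and snake_case field names chosen for illustration. It is not Spark's own class; it only models the default of one replica and the "less than 40" limit stated in the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageLevelSketch:
    """Hypothetical stand-in for Spark's StorageLevel, for illustration only."""

    use_disk: bool        # does the RDD use disk?
    use_memory: bool      # does the RDD use memory?
    use_off_heap: bool    # does the RDD use off-heap memory?
    deserialized: bool    # True: stored as objects; False: stored serialized
    replication: int = 1  # number of replicas (default: 1)

    def __post_init__(self):
        # The replica count has to stay below 40, as noted in the list above.
        if self.replication >= 40:
            raise ValueError("replication must be less than 40")


memory_and_disk = StorageLevelSketch(
    use_disk=True, use_memory=True, use_off_heap=False, deserialized=True
)
print(memory_and_disk.replication)  # 1

try:
    StorageLevelSketch(True, False, False, False, replication=40)
except ValueError as err:
    print(err)  # replication must be less than 40
```

Making the record immutable (`frozen=True`) reflects that a storage level is a fixed description of how data is kept, not something that changes while an RDD is persisted.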

The following StorageLevels are available (the _2 suffix in a name denotes 2 replicas):

  • NONE (default)

  • DISK_ONLY

  • DISK_ONLY_2

  • MEMORY_ONLY (default for cache operation for RDDs)

  • MEMORY_ONLY_2

  • MEMORY_ONLY_SER

  • MEMORY_ONLY_SER_2

  • MEMORY_AND_DISK

  • MEMORY_AND_DISK_2

  • MEMORY_AND_DISK_SER

  • MEMORY_AND_DISK_SER_2

  • OFF_HEAP
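Each name in the list is shorthand for one combination of flags. The sketch below spells out those combinations in plain Python, with no Spark dependency. The `Level` record and the `LEVELS` dictionary are hypothetical names used for illustration. The flag values follow my reading of the Scala `StorageLevel` companion object in Spark 2.x; `OFF_HEAP` in particular has differed between Spark versions, so treat the table as an assumption and check it against the version you run.

```python
from collections import namedtuple

# Hypothetical record type; field order follows the Scala constructor
# (useDisk, useMemory, useOffHeap, deserialized, replication).
Level = namedtuple(
    "Level", "use_disk use_memory use_off_heap deserialized replication"
)

BASE_LEVELS = {
    "NONE":                Level(False, False, False, False, 1),
    "DISK_ONLY":           Level(True,  False, False, False, 1),
    "MEMORY_ONLY":         Level(False, True,  False, True,  1),
    "MEMORY_ONLY_SER":     Level(False, True,  False, False, 1),
    "MEMORY_AND_DISK":     Level(True,  True,  False, True,  1),
    "MEMORY_AND_DISK_SER": Level(True,  True,  False, False, 1),
    "OFF_HEAP":            Level(True,  True,  True,  False, 1),
}

# Every base level except NONE and OFF_HEAP has a "_2" twin with 2 replicas.
LEVELS = dict(BASE_LEVELS)
for name, level in BASE_LEVELS.items():
    if name not in ("NONE", "OFF_HEAP"):
        LEVELS[name + "_2"] = level._replace(replication=2)

for name in sorted(LEVELS):
    print(f"{name:24}{LEVELS[name]}")
```

Read this way, the `_SER` variants differ from their plain counterparts only in the `deserialized` flag, and the `_2` variants only in the replica count.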

You can check the storage level of an RDD using the getStorageLevel() operation.

The useMemory flag indicates whether memory is used for data storage.

The useDisk flag indicates whether disk is used for data storage.

The deserialized flag indicates whether data is stored in deserialized format.

The replication property gives the number of replicas, that is, whether the data is replicated to other block managers.
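To make the role of these properties concrete, here is a small sketch that reads them off an object and turns them into a one-line summary. The `describe` helper and the `SimpleNamespace` stand-in are hypothetical and exist only for this illustration; they are not part of Spark's API, and the attribute names are snake_case renderings of the properties named above.

```python
from types import SimpleNamespace


def describe(level):
    """Hypothetical helper: summarize a level from its flags in plain words."""
    media = []
    if level.use_memory:
        media.append("memory")
    if level.use_disk:
        media.append("disk")
    where = " and ".join(media) if media else "nowhere"
    form = "deserialized" if level.deserialized else "serialized"
    return f"{where}, {form}, {level.replication} replica(s)"


# Stand-in object carrying the properties named above (not Spark's class).
memory_only_ser_2 = SimpleNamespace(
    use_memory=True, use_disk=False, deserialized=False, replication=2
)
print(describe(memory_only_ser_2))  # memory, serialized, 2 replica(s)
```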
