Dataset Caching and Persistence

Table 1. Caching Operators (Basic Actions)

Operator     Description
cache        Basic action to cache a Dataset
persist      Basic action to persist a Dataset
unpersist    Basic action to unpersist a cached Dataset
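The three operators in Table 1 can be tried out with a short sketch. It assumes a local SparkSession (in spark-shell, spark already exists) and a small illustrative Dataset; the names ds, n and the level* values are made up for this example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Assumption: a local session for illustration; spark-shell provides `spark` already.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("caching-operators-demo")
  .getOrCreate()

// A small illustrative Dataset (hypothetical data).
val ds = spark.range(5)

// cache only marks the Dataset for caching; no job runs yet.
ds.cache()

// The first action computes the Dataset and fills the cache.
val n = ds.count()

// unpersist removes the Dataset from the cache.
ds.unpersist()
val levelAfterUnpersist = ds.storageLevel

// persist accepts an explicit storage level.
ds.persist(StorageLevel.MEMORY_ONLY)
val levelAfterPersist = ds.storageLevel

ds.unpersist()
```

Because cache and persist are lazy, nothing is stored until an action such as count runs.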

Note

You can also use SQL’s CACHE TABLE [tableName] to cache the tableName table in memory. Unlike the cache and persist operators, CACHE TABLE is eager: the table is cached as soon as the statement is executed.

Use the LAZY keyword, i.e. CACHE LAZY TABLE [tableName], to make caching lazy.

Use SQL’s REFRESH TABLE [tableName] to refresh a cached table.

Use SQL’s UNCACHE TABLE (IF EXISTS)? [tableName] to remove a table from cache.

Use SQL’s CLEAR CACHE to remove all tables from cache.
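The SQL statements above can be run through spark.sql. The following is a minimal sketch, assuming a local SparkSession and a hypothetical temporary view called numbers; spark.catalog.isCached is used only to observe the effect of each statement.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; spark-shell provides `spark` already.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("cache-table-demo")
  .getOrCreate()

// `numbers` is a hypothetical temporary view used only in this example.
spark.range(10).createOrReplaceTempView("numbers")

// Eager: the table is cached as part of executing the statement.
spark.sql("CACHE TABLE numbers")
val cachedEagerly = spark.catalog.isCached("numbers")

// Remove the table from the cache (no error if it is not cached).
spark.sql("UNCACHE TABLE IF EXISTS numbers")
val cachedAfterUncache = spark.catalog.isCached("numbers")

// Lazy: the table is registered for caching and filled on first use.
spark.sql("CACHE LAZY TABLE numbers")
val cachedLazily = spark.catalog.isCached("numbers")

// Remove every cached table.
spark.sql("CLEAR CACHE")
val cachedAfterClear = spark.catalog.isCached("numbers")
```

Note that isCached reports whether the table is registered in the cache, so it cannot tell a lazily cached table that has not been materialized yet from a fully cached one.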

Note

Be careful about what you cache, i.e. which Dataset you cache, since caching different Datasets results in different queries being cached.

Tip

You can check whether a Dataset is cached, for example by inspecting its storage level (Dataset.storageLevel).
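A minimal sketch of such a check, assuming a local SparkSession. It relies on Dataset.storageLevel, which returns StorageLevel.NONE for a Dataset that is not persisted. The helper isCached and the Dataset q are hypothetical names introduced for illustration, and this is not necessarily the snippet the original text had in mind.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.storage.StorageLevel

// Assumption: a local session for illustration; spark-shell provides `spark` already.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("is-cached-demo")
  .getOrCreate()

// Hypothetical helper: a Dataset counts as cached when its storage level is not NONE.
def isCached(ds: Dataset[_]): Boolean = ds.storageLevel != StorageLevel.NONE

// Illustrative query.
val q = spark.range(3).toDF("id")

val cachedBefore = isCached(q)

q.cache()
val cachedAfter = isCached(q)

q.unpersist()
val cachedAtEnd = isCached(q)
```

The check reflects whether the Dataset is registered for caching, not whether the cached data has already been computed by an action.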

SQL’s CACHE TABLE

SQL’s CACHE TABLE corresponds to requesting the session-specific Catalog to cache the table.
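The Catalog route can be sketched directly with the public spark.catalog methods. The example assumes a local SparkSession and a hypothetical temporary view called numbers. It illustrates the Catalog API only and is not a trace of what the SQL command does internally.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; spark-shell provides `spark` already.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("catalog-cache-demo")
  .getOrCreate()

// `numbers` is a hypothetical temporary view used only in this example.
spark.range(10).createOrReplaceTempView("numbers")

// Ask the session's Catalog to cache the table.
spark.catalog.cacheTable("numbers")
val cachedViaCatalog = spark.catalog.isCached("numbers")

// Running an action on the table materializes the cached data.
val rows = spark.table("numbers").count()

// Ask the Catalog to remove the table from the cache again.
spark.catalog.uncacheTable("numbers")
val cachedAfterUncache = spark.catalog.isCached("numbers")
```

As far as the public API shows, cacheTable on its own behaves lazily, so an action (here count) is still needed before the data is actually held in the cache.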

Internally, CACHE TABLE becomes CacheTableCommand runnable command that…​FIXME
