
Dataset API vs SQL


Spark SQL supports two “modes” for writing structured queries: the Dataset API and SQL.

It turns out that some structured queries are easier to express using the Dataset API, while others are only possible in SQL. In other words, you may find mixing the Dataset API and SQL modes challenging yet rewarding.
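As a minimal sketch of what mixing the two modes can look like, the following runs the same query through the Dataset API and through SQL against a temporary view. The object name, the `people` view and the sample rows are made up for illustration, and a local `SparkSession` is assumed.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object DatasetVsSql {
  // Local session for illustration only; the app name is arbitrary.
  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("dataset-vs-sql")
    .getOrCreate()
  import spark.implicits._

  // Hypothetical sample data.
  val people = Seq(("Ann", 34), ("Bob", 17), ("Cid", 52)).toDF("name", "age")

  // Dataset API mode: the query is built from method calls and Column expressions.
  val adultsApi = people.where(col("age") >= 18).select("name")

  // SQL mode: register the Dataset as a temporary view, then query it with SQL text.
  people.createOrReplaceTempView("people")
  val adultsSql = spark.sql("SELECT name FROM people WHERE age >= 18")

  // Mixed: spark.sql returns a DataFrame, so Dataset operators can be chained on it.
  val adultsMixed = spark.sql("SELECT * FROM people").where(col("age") >= 18).select("name")

  def main(args: Array[String]): Unit = {
    adultsApi.show()
    adultsSql.show()
    adultsMixed.show()
    spark.stop()
  }
}
```

All three values describe the same result; which form reads best tends to depend on the query and on the team.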

At some point you could consider writing structured queries using Catalyst data structures directly, hoping to sidestep the differences and focus on what Spark SQL supports. That could quickly become hard to maintain, though: few Spark SQL developers are comfortable working at that level, and such code is fairly low-level and therefore likely to depend on a specific Spark SQL version.

This section describes the differences between the Spark SQL features available when developing Spark applications with the Dataset API and with SQL mode.

  1. RuntimeReplaceable Expressions are only available using SQL mode by means of SQL functions like nvl, nvl2, ifnull, nullif, etc.

  2. Column.isin (which takes a list of values) versus the SQL IN predicate, which also accepts a subquery (see the In Predicate Expression)
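The two items above can be sketched as follows, assuming a local `SparkSession`. The object name, the `orders` and `vips` views and their rows are hypothetical. Note that `expr` parses a SQL expression, so `nvl` is still reached through SQL syntax even though the surrounding query uses the Dataset API. Whether a given function or predicate is also exposed directly in the Dataset API may depend on the Spark version in use.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, expr}

object SqlOnlyFeatures {
  // Local session for illustration only; the app name is arbitrary.
  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("sql-only-features")
    .getOrCreate()
  import spark.implicits._

  // Hypothetical sample data: one order has no tier (NULL).
  val orders = Seq[(Int, Option[String])](
    (1, Some("gold")), (2, None), (3, Some("silver"))
  ).toDF("customer_id", "tier")
  val vips = Seq(1, 3).toDF("id")
  orders.createOrReplaceTempView("orders")
  vips.createOrReplaceTempView("vips")

  // Item 1: nvl is written as a SQL expression and embedded via expr.
  val withDefault = orders.select(
    col("customer_id"),
    expr("nvl(tier, 'none')").as("tier"))

  // Item 2, Dataset API side: Column.isin with an explicit list of values.
  val viaIsin = orders.where(col("customer_id").isin(1, 3))

  // Item 2, SQL side: the IN predicate with a subquery.
  val viaSubquery = spark.sql(
    "SELECT * FROM orders WHERE customer_id IN (SELECT id FROM vips)")

  def main(args: Array[String]): Unit = {
    withDefault.show()
    viaIsin.show()
    viaSubquery.show()
    spark.stop()
  }
}
```

With `isin`, the values have to be known when the query is built, while the subquery form lets Spark SQL resolve them from another relation as part of the same query.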
