

Subqueries (Subquery Expressions)

As of Spark 2.0, Spark SQL supports subqueries.

A subquery (aka subquery expression) is a query that is nested inside another query.

There are the following kinds of subqueries:

  1. A subquery as a source (inside a SQL FROM clause)

  2. A scalar subquery or a predicate subquery (used as an expression, for example in a SELECT or WHERE clause)

Each of these subqueries can additionally be either correlated (it references columns of the outer query) or uncorrelated.
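A minimal Scala (spark-shell style) sketch of the shapes above; the emp temp view, its columns, and the local master are assumptions for illustration only:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("subqueries-demo").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample data
Seq((1, "dev", 100.0), (2, "dev", 200.0), (3, "ops", 150.0))
  .toDF("id", "dept", "salary")
  .createOrReplaceTempView("emp")

// 1. Subquery as a source (inside a FROM clause)
spark.sql("SELECT d.dept FROM (SELECT DISTINCT dept FROM emp) d").show()

// 2a. Scalar subquery as a column (uncorrelated: no reference to the outer query)
spark.sql("SELECT id, (SELECT max(salary) FROM emp) AS max_salary FROM emp").show()

// 2b. Predicate subquery (IN; EXISTS works similarly)
spark.sql("SELECT id FROM emp WHERE dept IN (SELECT dept FROM emp WHERE salary > 150)").show()

// Correlated variant: the inner query references the outer table alias e
spark.sql("SELECT e.id, (SELECT max(salary) FROM emp WHERE dept = e.dept) AS dept_max FROM emp e").show()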

A scalar subquery is a structured query that returns exactly one row and one column. Spark SQL uses the ScalarSubquery (SubqueryExpression) expression to represent scalar subqueries while parsing a SQL statement.

A ScalarSubquery expression appears as scalar-subquery#[exprId] [conditionString] in a logical plan.
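To see this marker, one can print the plans of a query that uses a scalar subquery; a sketch assuming the emp view from the previous example (the exprIds in the comment are illustrative and differ per run):

// Assumes the emp temp view from the earlier sketch
val q = spark.sql("SELECT id, (SELECT max(salary) FROM emp) AS max_salary FROM emp")

// explain(true) prints the parsed, analyzed, optimized and physical plans; the analyzed
// logical plan contains an entry such as:
//   Project [id#0, scalar-subquery#N [] AS max_salary#M]
q.explain(true)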

Scalar subqueries should be used rarely, if at all; in most cases a join is the better choice.
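As a sketch of that advice (again assuming the emp view), the scalar subquery above can instead be expressed as an aggregate joined back to the table:

import org.apache.spark.sql.functions.max

// Compute the single aggregate once and cross-join it, instead of a scalar subquery per row
val maxSalary = spark.table("emp").agg(max("salary").as("max_salary"))
spark.table("emp").crossJoin(maxSalary).show()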

The Spark Analyzer uses the ResolveSubquery resolution rule to resolve subqueries and, once analysis is complete, checks that they are valid.
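One visible effect of this validation is that malformed subqueries are rejected at analysis time. A hedged sketch, assuming the emp view and that a scalar subquery returning more than one column is invalid:

import org.apache.spark.sql.AnalysisException

// A scalar subquery may return only one column, so this query should fail analysis
try {
  spark.sql("SELECT (SELECT id, salary FROM emp) AS bad FROM emp")
} catch {
  case e: AnalysisException => println(s"Rejected at analysis time: ${e.getMessage}")
}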

The Catalyst Optimizer applies subquery-specific optimizations, such as pulling up correlated predicates and rewriting correlated scalar subqueries and predicate subqueries (IN/EXISTS) into joins.

The Spark Physical Optimizer uses the PlanSubqueries physical optimization to plan queries with scalar subqueries.
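Both stages can be observed with explain(true) on a query that uses a scalar subquery in a predicate; a sketch assuming the emp view (the exact plan rendering varies across Spark versions):

// Assumes the emp temp view from the earlier sketch
val q2 = spark.sql("SELECT id FROM emp WHERE salary = (SELECT max(salary) FROM emp)")

// The optimized logical plan shows the subquery after Catalyst's rewrites, and the
// physical plan shows the scalar subquery planned by PlanSubqueries as a separate subquery
q2.explain(true)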

Caution: FIXME Describe how a physical ScalarSubquery is executed (cf. updateResult, eval and doGenCode).