关注 spark技术分享,
撸spark源码 玩spark最佳实践

SubqueryExec

SubqueryExec Unary Physical Operator

SubqueryExec is a unary physical operator (i.e. with one child physical operator) that…​FIXME

SubqueryExec uses relationFuture that is lazily and executed only once when SubqueryExec is first requested to prepare execution that simply triggers execution of the child operator asynchronously (i.e. on a separate thread) and to collect the result soon after (that makes SubqueryExec waiting indefinitely for the child operator to be finished).

Caution
FIXME When is doPrepare executed?

SubqueryExec is created exclusively when PlanSubqueries preparation rule is executed (and transforms ScalarSubquery expressions in a physical plan).

Table 1. SubqueryExec’s Performance Metrics
Key Name (in web UI) Description

collectTime

time to collect (ms)

dataSize

data size (bytes)

spark sql SubqueryExec webui details for query.png
Figure 1. SubqueryExec in web UI (Details for Query)
Note
SubqueryExec physical operator is almost an exact copy of BroadcastExchangeExec physical operator.

Executing Child Operator Asynchronously — doPrepare Method

Note
doPrepare is part of SparkPlan Contract to prepare a physical operator for execution.

doPrepare simply triggers initialization of the internal lazily-once-initialized relationFuture asynchronous computation.

relationFuture Internal Lazily-Once-Initialized Property

When “materialized” (aka executed), relationFuture spawns a new thread of execution that requests SQLExecution to execute an action (with the current execution id) on subquery daemon cached thread pool.

Note
relationFuture uses Scala’s scala.concurrent.Future that spawns a new thread of execution once instantiated.

The action tracks execution of the child physical operator to executeCollect and collects collectTime and dataSize SQL metrics.

In the end, relationFuture posts metric updates and returns the internal rows.

Note
relationFuture is executed on a separate thread from a custom scala.concurrent.ExecutionContext (built from a cached java.util.concurrent.ThreadPoolExecutor with the prefix subquery and up to 16 threads).
Note
relationFuture is used when SubqueryExec is requested to prepare for execution (that triggers execution of the child operator) and execute collect (that waits indefinitely until the child operator has finished).

Creating SubqueryExec Instance

SubqueryExec takes the following when created:

Collecting Internal Rows of Executing SubqueryExec Operator — executeCollect Method

Note
executeCollect is part of SparkPlan Contract to execute a physical operator and collect the results as collection of internal rows.

executeCollect waits till relationFuture gives a result (as a Array[InternalRow]).

赞(0) 打赏
未经允许不得转载:spark技术分享 » SubqueryExec
分享到: 更多 (0)

关注公众号:spark技术分享

联系我们联系我们

觉得文章有用就打赏一下文章作者

支付宝扫一扫打赏

微信扫一扫打赏