WindowExec

WindowExec Unary Physical Operator

WindowExec is a unary physical operator (i.e. with one child physical operator) for window aggregation execution that represents the Window unary logical operator at execution time.

WindowExec is created exclusively when BasicOperators execution planning strategy resolves a Window unary logical operator.

Figure 1. WindowExec in web UI (Details for Query)

The output schema of WindowExec is the attributes of the child physical operator followed by the window expressions.

Table 1. WindowExec’s Required Child Output Distribution

| Single Child |
|---|
| ClusteredDistribution (per window partition specification expressions) |

If no window partition specification is given, the child’s distribution requirement is AllTuples, and WindowExec prints out a WARN message to the logs (warning that all data is moved to a single partition, which can seriously degrade performance).

Tip

Enable WARN logging level for org.apache.spark.sql.execution.WindowExec logger to see what happens inside.

Add the following line to conf/log4j.properties:
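Assuming the log4j 1.x properties syntax that conf/log4j.properties uses, the line for the logger and level named above would look like this (a config fragment, not verified against any particular Spark version):

```properties
log4j.logger.org.apache.spark.sql.execution.WindowExec=WARN
```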

Refer to Logging.
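The distribution rule above can be sketched in plain Scala. This is a simplified model, not Spark's code: the Distribution types below are stand-ins with the same names, the partition specification is reduced to column names, and the warning text is illustrative rather than the real log message.

```scala
object WindowDistributionSketch {
  // Simplified stand-ins for Spark's Distribution hierarchy (not the real classes).
  sealed trait Distribution
  case object AllTuples extends Distribution
  final case class ClusteredDistribution(clustering: Seq[String]) extends Distribution

  // Returns the distribution required of the single child,
  // plus an optional (illustrative) warning for the unpartitioned case.
  def requiredChildDistribution(partitionSpec: Seq[String]): (Distribution, Option[String]) =
    if (partitionSpec.isEmpty)
      (AllTuples, Some("no window partitioning: all rows go to a single partition"))
    else
      (ClusteredDistribution(partitionSpec), None)
}
```

With an empty partition specification the sketch asks for AllTuples and produces a warning; with Seq("dept") it asks for ClusteredDistribution(Seq("dept")).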

Executing Physical Operator (Generating RDD[InternalRow]) — doExecute Method

Note
doExecute is part of the SparkPlan Contract to generate the runtime representation of a structured query as a distributed computation over internal binary rows on Apache Spark (i.e. RDD[InternalRow]).

doExecute executes the single child physical operator and maps over partitions using a custom Iterator[InternalRow].

Note
When executed, doExecute creates a MapPartitionsRDD with the child physical operator’s RDD[InternalRow].

Internally, doExecute first takes WindowExpressions and their WindowFunctionFrame factory functions (from window frame factories) followed by executing the single child physical operator and mapping over partitions (using RDD.mapPartitions operator).

doExecute creates an Iterator[InternalRow] (of UnsafeRow exactly).

Mapping Over UnsafeRows per Partition — Iterator[InternalRow]

When created, Iterator[InternalRow] first creates two UnsafeProjection conversion functions (to convert InternalRows to UnsafeRows) as result and grouping.

Note
The grouping conversion function is created for the window partition specification expressions and is used exclusively to create nextGroup when Iterator[InternalRow] is requested for the next row.

Tip

Enable DEBUG logging level for org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator logger to see the code generated for grouping conversion function.

Add the following line to conf/log4j.properties:
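Assuming the same log4j 1.x properties syntax, the line for the CodeGenerator logger named above would look like this (a config fragment, not verified against any particular Spark version):

```properties
log4j.logger.org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator=DEBUG
```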

Refer to Logging.

Iterator[InternalRow] then fetches the first row from the upstream RDD and initializes nextRow and nextGroup UnsafeRows.

Note
nextGroup is the result of converting nextRow using grouping conversion function.

doExecute creates an ExternalAppendOnlyUnsafeRowArray buffer using spark.sql.windowExec.buffer.spill.threshold property (default: 4096) as the threshold for the number of rows buffered.

doExecute creates a SpecificInternalRow for the window function result (as windowFunctionResult).

Note
SpecificInternalRow is also used in the generated code for the UnsafeProjection for the result.

doExecute takes the window frame factories and generates WindowFunctionFrame per factory (using the SpecificInternalRow created earlier).

Caution
FIXME

Note
ExternalAppendOnlyUnsafeRowArray is used to collect UnsafeRow objects from the child’s partitions (one buffer per window partition, holding up to spark.sql.windowExec.buffer.spill.threshold rows).

next Method

Note
next is part of Scala’s scala.collection.Iterator interface that returns the next element and discards it from the iterator.

next method of the final Iterator is…​FIXME

next first fetches a new partition, but only when…​FIXME

Note
next loads all the rows in nextGroup.

Caution
FIXME What’s nextGroup?

next takes one UnsafeRow from bufferIterator.

Caution
FIXME bufferIterator seems important for the iteration.

next then requests every WindowFunctionFrame to write the current rowIndex and UnsafeRow.

Caution
FIXME rowIndex?

next joins the current UnsafeRow and windowFunctionResult (i.e. takes two InternalRows and makes them appear as a single concatenated InternalRow).

next increments rowIndex.

In the end, next uses the UnsafeProjection function (that was created using createResultProjection) and projects the joined InternalRow to the result UnsafeRow.
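The "join" step in next can be pictured with a small plain-Scala sketch. It is only a model: Row and JoinedView are hypothetical names introduced here, and real rows are InternalRows rather than sequences. The point it illustrates is that the two rows are presented as one concatenated row without copying either of them.

```scala
object JoinedRowSketch {
  // Hypothetical stand-in for a row: an indexed sequence of values.
  type Row = IndexedSeq[Any]

  // Presents two rows as a single concatenated row without copying them.
  final class JoinedView(left: Row, right: Row) {
    def numFields: Int = left.length + right.length

    // Fields of the left row come first, then the fields of the right row.
    def get(i: Int): Any =
      if (i < left.length) left(i) else right(i - left.length)

    def toSeq: Seq[Any] = (0 until numFields).map(get)
  }
}
```

In this model the left row plays the role of the current UnsafeRow and the right row plays the role of windowFunctionResult, so the window values always end up to the right of the child's columns.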

Fetching All Rows In Partition — fetchNextPartition Internal Method

fetchNextPartition first copies the current nextGroup UnsafeRow (that was created using grouping projection function) and clears the internal buffer.

fetchNextPartition then collects all UnsafeRows for the current nextGroup in buffer.

With the buffer filled in (with the UnsafeRows of the current window partition), fetchNextPartition prepares every WindowFunctionFrame in frames one by one (passing the buffer to each).

In the end, fetchNextPartition resets rowIndex to 0 and requests buffer to generate an iterator (available as bufferIterator).

Note
fetchNextPartition is used internally when doExecute’s Iterator is requested for the next UnsafeRow (when bufferIterator is uninitialized or was drained, i.e. holds no elements, but there are still rows in the upstream operator’s partition).

fetchNextRow Internal Method

fetchNextRow checks whether there is the next row available (using the upstream Iterator.hasNext) and sets nextRowAvailable mutable internal flag.

If there is a row available, fetchNextRow sets nextRow internal variable to the next UnsafeRow from the upstream’s RDD.

fetchNextRow also sets nextGroup internal variable as an UnsafeRow for nextRow using grouping function.

Note

grouping is an UnsafeProjection function that is created for the window partition specification expressions, bound to the single child’s output schema.

grouping uses GenerateUnsafeProjection to canonicalize the bound expressions and create the UnsafeProjection function.

If no row is available, fetchNextRow nullifies nextRow and nextGroup internal variables.

Note
fetchNextRow is used internally when doExecute’s Iterator is created and when fetchNextPartition is called.
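The interplay of fetchNextRow, fetchNextPartition and next can be sketched as a plain Scala iterator. This is a toy model under several assumptions: rows are (key, value) pairs instead of UnsafeRows, the grouping projection is just the key, the buffer is an in-memory ArrayBuffer with no spilling, and the single SumFrame is a hypothetical frame that gives every row the sum over its whole group.

```scala
import scala.collection.mutable.ArrayBuffer

object WindowIteratorSketch {
  // Hypothetical row type: (partition key, value).
  type Row = (String, Int)

  // Toy "entire partition" frame: every row sees the sum over its whole group.
  final class SumFrame {
    private var total = 0
    def prepare(rows: Iterable[Row]): Unit = total = rows.map(_._2).sum
    def write(rowIndex: Int, current: Row): Int = total
  }

  // The input must already be clustered by key, as WindowExec requires of its child.
  def windowIterator(stream: Iterator[Row]): Iterator[(String, Int, Int)] =
    new Iterator[(String, Int, Int)] {
      private var nextRow: Row = null
      private var nextGroup: String = null
      private var nextRowAvailable = false
      private val buffer = ArrayBuffer.empty[Row]
      private var bufferIterator: Iterator[Row] = Iterator.empty
      private val frame = new SumFrame
      private var rowIndex = 0

      private def fetchNextRow(): Unit = {
        nextRowAvailable = stream.hasNext
        if (nextRowAvailable) { nextRow = stream.next(); nextGroup = nextRow._1 }
        else { nextRow = null; nextGroup = null }
      }
      fetchNextRow() // prime nextRow and nextGroup when the iterator is created

      private def fetchNextPartition(): Unit = {
        val currentGroup = nextGroup
        buffer.clear()
        // Collect all rows that belong to the current group.
        while (nextRowAvailable && nextGroup == currentGroup) {
          buffer += nextRow
          fetchNextRow()
        }
        frame.prepare(buffer)
        rowIndex = 0
        bufferIterator = buffer.iterator
      }

      override def hasNext: Boolean = bufferIterator.hasNext || nextRowAvailable

      override def next(): (String, Int, Int) = {
        // Load the next group only when the current buffer is drained.
        if (!bufferIterator.hasNext && nextRowAvailable) fetchNextPartition()
        if (!bufferIterator.hasNext) throw new NoSuchElementException("no more rows")
        val current = bufferIterator.next()
        val result = frame.write(rowIndex, current)
        rowIndex += 1
        (current._1, current._2, result) // "joined" row: input columns, then window result
      }
    }
}
```

For the input ("a", 1), ("a", 2), ("b", 5) the iterator buffers group "a" first, emits its two rows with the group sum 3, and only then buffers group "b".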

createResultProjection Internal Method

createResultProjection creates an UnsafeProjection function for the given window function Catalyst expressions (expressions) so that the window expressions are placed on the right side of the child’s output.

Note
UnsafeProjection is a Scala function that produces UnsafeRow for an InternalRow.

Internally, createResultProjection first creates a translation table with a BoundReference per expression (in the input expressions).

Note
BoundReference is a Catalyst expression that is a reference to a value in internal binary row at a specified position and of specified data type.

createResultProjection then creates window function bound references for the window expressions, so that unbound expressions are transformed to BoundReferences.

In the end, createResultProjection creates an UnsafeProjection with:

  • exprs expressions from child’s output and the collection of window function bound references

  • inputSchema input schema per child’s output

Note
createResultProjection is used exclusively when WindowExec is executed.
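The binding step of createResultProjection can be modelled in plain Scala. The expression classes below (Unbound, BoundRef, Alias) are hypothetical simplifications, not Catalyst's classes, and the sketch assumes that the i-th window function is bound to the position right after the child's output, which is what "on the right side of child's output" suggests.

```scala
object ResultProjectionSketch {
  // Hypothetical, simplified expression model (not Catalyst's classes).
  sealed trait Expr
  final case class Unbound(name: String) extends Expr            // window function, not yet bound
  final case class BoundRef(ordinal: Int) extends Expr           // reference to a position in a row
  final case class Alias(child: Expr, name: String) extends Expr // named window expression

  // Replaces every unbound window function with a reference past the child's output.
  def bindWindowExpressions(
      childOutputSize: Int,
      functions: Seq[Unbound],
      windowExpressions: Seq[Alias]): Seq[Alias] = {
    // Translation table: one bound reference per window function.
    val table: Map[Expr, BoundRef] =
      functions.zipWithIndex.map { case (f, i) =>
        (f: Expr) -> BoundRef(childOutputSize + i)
      }.toMap
    windowExpressions.map(a => a.copy(child = table.getOrElse(a.child, a.child)))
  }
}
```

With a child that outputs two columns, a single window function such as Unbound("rank") is bound to ordinal 2, i.e. the first slot after the child's columns in the joined row.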

Creating WindowExec Instance

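WindowExec takes the following when created:

  • Window named expressions (windowExpression)

  • Window partition specification expressions

  • Window order specification

  • Child physical operator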

Lookup Table for WindowExpressions and Factory Functions for WindowFunctionFrame — windowFrameExpressionFactoryPairs Lazy Value

windowFrameExpressionFactoryPairs is a lookup table with window expressions and factory functions for WindowFunctionFrame (per key-value pair in framedFunctions lookup table).

A factory function is a function that takes an InternalRow and produces a WindowFunctionFrame (described in the table below).

Internally, windowFrameExpressionFactoryPairs first builds framedFunctions lookup table with 4-element tuple keys and 2-element expression list values (described in the table below).

windowFrameExpressionFactoryPairs finds WindowExpression expressions in the input windowExpression and for every WindowExpression takes the window frame specification (of type SpecifiedWindowFrame that is used to find frame type and start and end frame positions).

Table 2. framedFunctions’s FrameKey: 4-element Tuple for Frame Keys (in positional order)

| Element | Description |
|---|---|
| Name of the kind of function | "AGGREGATE" or "OFFSET" (the names that appear in the frame keys of Table 4) |
| FrameType | RangeFrame or RowFrame |
| Window frame’s start position | 0 for CurrentRow and a positive number for ValueFollowing; a negative number for ValuePreceding; empty when unspecified |
| Window frame’s end position | 0 for CurrentRow and a positive number for ValueFollowing; a negative number for ValuePreceding; empty when unspecified |
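The encoding of frame positions described in Table 2 can be written down as a small plain-Scala function. The Boundary types are hypothetical stand-ins named after the table's entries (Unspecified is a name introduced here for the "empty" case); they are not Spark's classes.

```scala
object FrameBoundarySketch {
  // Hypothetical stand-ins for frame boundaries (names follow Table 2).
  sealed trait Boundary
  case object Unspecified extends Boundary
  case object CurrentRow extends Boundary
  final case class ValuePreceding(n: Int) extends Boundary
  final case class ValueFollowing(n: Int) extends Boundary

  // Encodes a boundary as the optional signed offset used in a frame key.
  def position(b: Boundary): Option[Int] = b match {
    case Unspecified       => None
    case CurrentRow        => Some(0)
    case ValuePreceding(n) => Some(-n)
    case ValueFollowing(n) => Some(n)
  }
}
```

Under this encoding a frame from 2 rows preceding to the current row would have the start position Some(-2) and the end position Some(0).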

Table 3. framedFunctions’s 2-element Tuple Values (in positional order)

| Element | Description |
|---|---|
| Collection of window expressions | WindowExpression |
| Collection of window functions | |

windowFrameExpressionFactoryPairs creates an AggregateProcessor for AGGREGATE frame keys in framedFunctions lookup table.

Table 4. windowFrameExpressionFactoryPairs’ Factory Functions (in creation order)

| Frame Name | FrameKey | WindowFunctionFrame |
|---|---|---|
| Offset Frame | ("OFFSET", RowFrame, Some(offset), Some(h)) | OffsetWindowFunctionFrame |
| Growing Frame | ("AGGREGATE", frameType, None, Some(high)) | UnboundedPrecedingWindowFunctionFrame |
| Shrinking Frame | ("AGGREGATE", frameType, Some(low), None) | UnboundedFollowingWindowFunctionFrame |
| Moving Frame | ("AGGREGATE", frameType, Some(low), Some(high)) | SlidingWindowFunctionFrame |
| Entire Partition Frame | ("AGGREGATE", frameType, None, None) | UnboundedWindowFunctionFrame |

Note
lazy val in Scala is computed when first accessed and once only (for the entire lifetime of the owning object instance).

Note
windowFrameExpressionFactoryPairs is used exclusively when WindowExec is executed.
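The mapping from frame keys to WindowFunctionFrame kinds in Table 4 reads naturally as a pattern match. The sketch below is a plain-Scala model of that table only: FrameType and FrameKey are simplified stand-ins, and the function returns the name of the frame class instead of building a real factory function.

```scala
object FrameFactorySketch {
  // Simplified stand-ins for the frame types named in Table 2.
  sealed trait FrameType
  case object RowFrame extends FrameType
  case object RangeFrame extends FrameType

  // (kind of function, frame type, start position, end position)
  type FrameKey = (String, FrameType, Option[Int], Option[Int])

  // Returns the name of the WindowFunctionFrame that Table 4 associates with a key.
  def frameClassFor(key: FrameKey): String = key match {
    case ("OFFSET", RowFrame, Some(_), Some(_)) => "OffsetWindowFunctionFrame"
    case ("AGGREGATE", _, None, Some(_))        => "UnboundedPrecedingWindowFunctionFrame"
    case ("AGGREGATE", _, Some(_), None)        => "UnboundedFollowingWindowFunctionFrame"
    case ("AGGREGATE", _, Some(_), Some(_))     => "SlidingWindowFunctionFrame"
    case ("AGGREGATE", _, None, None)           => "UnboundedWindowFunctionFrame"
    case other => throw new IllegalArgumentException(s"Unsupported frame key: $other")
  }
}
```

For example, an aggregate over a frame with no start position that ends at the current row, ("AGGREGATE", RowFrame, None, Some(0)), maps to the growing frame, UnboundedPrecedingWindowFunctionFrame.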

createBoundOrdering Internal Method

createBoundOrdering…​FIXME

Note
createBoundOrdering is used exclusively when WindowExec physical operator is requested for the window frame factories.