关注 spark技术分享,
撸spark源码 玩spark最佳实践

TungstenAggregationIterator — Iterator of UnsafeRows for HashAggregateExec Physical Operator

TungstenAggregationIterator — Iterator of UnsafeRows for HashAggregateExec Physical Operator

TungstenAggregationIterator is a AggregationIterator that the HashAggregateExec aggregate physical operator uses when executed (to process UnsafeRows per partition and calculate aggregations).

TungstenAggregationIterator prefers hash-based aggregation (before switching to sort-based aggregation).

When created, TungstenAggregationIterator gets SQL metrics from the HashAggregateExec aggregate physical operator being executed, i.e. numOutputRows, peakMemory, spillSize and avgHashProbe metrics.

The metrics are then displayed as part of HashAggregateExec aggregate physical operator (e.g. in web UI in Details for Query).

spark sql HashAggregateExec webui details for query.png
Figure 1. HashAggregateExec in web UI (Details for Query)
Table 1. TungstenAggregationIterator’s Internal Properties (e.g. Registries, Counters and Flags)
Name Description

aggregationBufferMapIterator

KVIterator[UnsafeRow, UnsafeRow]

Used when…​FIXME

hashMap

UnsafeFixedWidthAggregationMap with the following:

Used when TungstenAggregationIterator is requested for the next UnsafeRow, to outputForEmptyGroupingKeyWithoutInput, processInputs, to initialize the aggregationBufferMapIterator and every time a partition has been processed.

initialAggregationBuffer

UnsafeRow that is the aggregation buffer containing initial buffer values.

Used when…​FIXME

externalSorter

UnsafeKVExternalSorter used for sort-based aggregation

sortBased

Flag to indicate whether TungstenAggregationIterator uses sort-based aggregation (not hash-based aggregation).

sortBased flag is disabled (false) by default.

Enabled (true) when TungstenAggregationIterator is requested to switch to sort-based aggregation.

Used when…​FIXME

processInputs Internal Method

processInputs…​FIXME

Note
processInputs is used exclusively when TungstenAggregationIterator is created (and sets the internal flags to indicate whether to use a hash-based aggregation or, in the worst case, a sort-based aggregation when there is not enough memory for groups and their buffers).

Switching to Sort-Based Aggregation (From Preferred Hash-Based Aggregation) — switchToSortBasedAggregation Internal Method

switchToSortBasedAggregation…​FIXME

Note
switchToSortBasedAggregation is used exclusively when TungstenAggregationIterator is requested to processInputs (and the externalSorter is used).

Getting Next UnsafeRow — next Method

Note
next is part of Scala’s scala.collection.Iterator interface that returns the next element and discards it from the iterator.

next…​FIXME

hasNext Method

Note
hasNext is part of Scala’s scala.collection.Iterator interface that tests whether this iterator can provide another element.

hasNext…​FIXME

Creating TungstenAggregationIterator Instance

TungstenAggregationIterator takes the following when created:

Note
The SQL metrics (numOutputRows, peakMemory, spillSize and avgHashProbe) belong to the HashAggregateExec physical operator that created the TungstenAggregationIterator.

TungstenAggregationIterator initializes the internal registries and counters.

TungstenAggregationIterator starts processing input rows and pre-loads the first key-value pair from the UnsafeFixedWidthAggregationMap if did not switch to sort-based aggregation.

generateResultProjection Method

Note
generateResultProjection is part of the AggregationIterator Contract to…​FIXME.

generateResultProjection…​FIXME

Creating UnsafeRow — outputForEmptyGroupingKeyWithoutInput Method

outputForEmptyGroupingKeyWithoutInput…​FIXME

Note
outputForEmptyGroupingKeyWithoutInput is used when…​FIXME

TaskCompletionListener

TungstenAggregationIterator registers a TaskCompletionListener that is executed on task completion (for every task that processes a partition).

When executed (once per partition), the TaskCompletionListener updates the following metrics:

赞(0) 打赏
未经允许不得转载:spark技术分享 » TungstenAggregationIterator — Iterator of UnsafeRows for HashAggregateExec Physical Operator
分享到: 更多 (0)

关注公众号:spark技术分享

联系我们联系我们

觉得文章有用就打赏一下文章作者

支付宝扫一扫打赏

微信扫一扫打赏