spark-sql - Spark Tech Sharing - Page 7

UnsafeFixedWidthAggregationMap

2013-06-01 admin, 1852 reads

UnsafeFixedWidthAggregationMap is a tiny layer (extension) around Spark Core’s BytesToBytesMap to allow for UnsafeRow keys and values.

Whenever requested for performance metrics (i.e. average number of probes per key lookup and peak memory used), UnsafeFixedWidthAggregationMap simply requests the underlying BytesToBytesMap.

UnsafeFixedWidthAggregationMap is created when:

HashAggregateExec physical operator is requested to create a new UnsafeFixedWidthAggregationMap (when HashAggregateExec physical operator is requested to generate the Java source code for “produce” path in Whole-Stage Code Generation)
TungstenAggregationIterator is created (when HashAggregateExec physical operator is requested to execute in traditional / non-Whole-Stage-Code-Generation execution path)

Table 1. UnsafeFixedWidthAggregationMap’s Internal Properties (e.g. Registries, Counters and Flags)
Name	Description
`currentAggregationBuffer`	Re-used pointer (as an UnsafeRow with the number of fields to match the aggregationBufferSchema) to the current aggregation buffer. Used exclusively when `UnsafeFixedWidthAggregationMap` is requested to getAggregationBufferFromUnsafeRow.
`emptyAggregationBuffer`	Empty aggregation buffer (encoded in UnsafeRow format)
`groupingKeyProjection`	UnsafeProjection for the groupingKeySchema (to encode grouping keys as UnsafeRows)
`map`	Spark Core’s `BytesToBytesMap` with the taskMemoryManager, initialCapacity, pageSizeBytes and performance metrics enabled

`supportsAggregationBufferSchema` Static Method



boolean supportsAggregationBufferSchema(StructType schema)

supportsAggregationBufferSchema is a predicate that is enabled (true) only when every field of the input schema has a mutable data type (per UnsafeRow's isMutable). A single field with a non-mutable data type disables it (false).

Note

The mutable data types: BooleanType, ByteType, DateType, DecimalType, DoubleType, FloatType, IntegerType, LongType, NullType, ShortType and TimestampType.

Examples (possibly all) of data types that are not mutable: ArrayType, BinaryType, StringType, CalendarIntervalType, MapType, ObjectType and StructType.



import org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap

import org.apache.spark.sql.types._
val schemaWithImmutableField = StructType(StructField("string", StringType) :: Nil)
assert(UnsafeFixedWidthAggregationMap.supportsAggregationBufferSchema(schemaWithImmutableField) == false)

val schemaWithMutableFields = StructType(
  StructField("int", IntegerType) :: StructField("bool", BooleanType) :: Nil)
assert(UnsafeFixedWidthAggregationMap.supportsAggregationBufferSchema(schemaWithMutableFields))

Note	`supportsAggregationBufferSchema` is used exclusively when `HashAggregateExec` is requested to supportsAggregate.

Creating UnsafeFixedWidthAggregationMap Instance

UnsafeFixedWidthAggregationMap takes the following when created:

Empty aggregation buffer (as an InternalRow)
Aggregation buffer schema
Grouping key schema
Spark Core’s TaskMemoryManager
Initial capacity
Page size (in bytes)

UnsafeFixedWidthAggregationMap initializes the internal registries and counters.

`getAggregationBufferFromUnsafeRow` Method



UnsafeRow getAggregationBufferFromUnsafeRow(UnsafeRow key) (1)
UnsafeRow getAggregationBufferFromUnsafeRow(UnsafeRow key, int hash)

(1) Uses the hash code of the key (i.e. `key.hashCode()`)

getAggregationBufferFromUnsafeRow requests the BytesToBytesMap to look up the input key (to get a BytesToBytesMap.Location).

getAggregationBufferFromUnsafeRow…FIXME
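The sketch below is a rough mental model of this lookup. It assumes get-or-insert semantics: the first time a grouping key is seen, a copy of the empty aggregation buffer is stored for it, and the caller then updates the returned buffer in place. It uses plain Scala collections. `ToyAggregationMap`, its `Long` keys and `Array[Long]` buffers, and the `maxKeys` cap are all hypothetical. Returning `None` when the cap is hit stands in for running out of memory.

```scala
import scala.collection.mutable

// Hypothetical model only: Long keys and Array[Long] buffers stand in for
// UnsafeRow grouping keys and fixed-width aggregation buffers.
final class ToyAggregationMap(emptyBuffer: Array[Long], maxKeys: Int) {
  private val map = mutable.HashMap.empty[Long, Array[Long]]

  def size: Int = map.size

  // Get-or-insert: a new key gets its own copy of the empty buffer.
  // None stands in for "no memory left to add another key" (maxKeys is invented).
  def getAggregationBuffer(key: Long): Option[Array[Long]] =
    map.get(key) match {
      case found @ Some(_)             => found
      case None if map.size >= maxKeys => None
      case None =>
        val buffer = emptyBuffer.clone() // never share the empty buffer itself
        map.update(key, buffer)
        Some(buffer)
    }
}
```

A caller then aggregates by mutating the returned buffer, for example `buf(0) += value` for a running sum.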

Note	`getAggregationBufferFromUnsafeRow` is used when `TungstenAggregationIterator` is requested to processInputs (exclusively when `TungstenAggregationIterator` is created), and (for testing only) when `UnsafeFixedWidthAggregationMap` is requested to getAggregationBuffer.

`getAggregationBuffer` Method



UnsafeRow getAggregationBuffer(InternalRow groupingKey)

getAggregationBuffer…FIXME

Note	`getAggregationBuffer` seems to be used exclusively for testing.

Getting KVIterator — `iterator` Method



KVIterator<UnsafeRow, UnsafeRow> iterator()

iterator…FIXME

Note	`iterator` is used when the `HashAggregateExec` physical operator is requested to finishAggregate, and when `TungstenAggregationIterator` is created (and pre-loads the first key-value pair from the map).

`getPeakMemoryUsedBytes` Method



long getPeakMemoryUsedBytes()

getPeakMemoryUsedBytes…FIXME

Note	`getPeakMemoryUsedBytes` is used when the `HashAggregateExec` physical operator is requested to finishAggregate, and in the TaskCompletionListener of `TungstenAggregationIterator`.

`getAverageProbesPerLookup` Method



double getAverageProbesPerLookup()

getAverageProbesPerLookup…FIXME
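The metric is a ratio of total probes to total key lookups. A value of 1.0 means every lookup found its slot, or an empty one, on the first try. The hypothetical `ProbeCountingSet` below counts probes the same way in an open-addressing table. The triangular probe sequence and the `-1` empty-slot marker are assumptions of this sketch, not a description of `BytesToBytesMap`'s internals.

```scala
// Hypothetical open-addressing hash set that counts probes per lookup.
// Keys must be non-negative (-1 marks an empty slot); the triangular probe
// sequence (+1, +2, +3, ...) is an assumption made for this sketch.
final class ProbeCountingSet(capacity: Int) {
  require(capacity > 0 && (capacity & (capacity - 1)) == 0, "capacity must be a power of two")

  private val Empty = -1
  private val slots = Array.fill(capacity)(Empty)
  private val mask = capacity - 1
  private var numKeyLookups = 0L
  private var numProbes = 0L

  // Looks the key up, inserting it if absent; returns true if it was inserted
  def lookupOrInsert(key: Int, hash: Int): Boolean = {
    require(key >= 0, "keys must be non-negative")
    numKeyLookups += 1
    var pos = hash & mask
    var step = 1
    var inserted: Option[Boolean] = None
    while (inserted.isEmpty) {
      numProbes += 1
      if (slots(pos) == Empty) { slots(pos) = key; inserted = Some(true) }
      else if (slots(pos) == key) inserted = Some(false)
      else {
        // For a power-of-two table this sequence visits every slot once
        require(step < capacity, "table is full")
        pos = (pos + step) & mask
        step += 1
      }
    }
    inserted.get
  }

  // Total probes divided by total key lookups
  def averageProbesPerLookup: Double = numProbes.toDouble / numKeyLookups
}
```

For example, three distinct keys that all hash to slot 0 take 1, 2 and 3 probes, which averages out to 2.0.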

Note	`getAverageProbesPerLookup` is used when the `HashAggregateExec` physical operator is requested to finishAggregate, and in the TaskCompletionListener of `TungstenAggregationIterator`.

`free` Method



void free()

free…FIXME

Note	`free` is used when the `HashAggregateExec` physical operator is requested to finishAggregate, and when `TungstenAggregationIterator` is created or is requested to processInputs (when `TungstenAggregationIterator` is created), get the next UnsafeRow, or outputForEmptyGroupingKeyWithoutInput.

`destructAndCreateExternalSorter` Method



UnsafeKVExternalSorter destructAndCreateExternalSorter() throws IOException

destructAndCreateExternalSorter…FIXME

Note	`destructAndCreateExternalSorter` is used when the `HashAggregateExec` physical operator is requested to finishAggregate, and when `TungstenAggregationIterator` is requested to processInputs (when `TungstenAggregationIterator` is created).

ExternalAppendOnlyUnsafeRowArray — Append-Only Array for UnsafeRows (with Disk Spill Threshold)

2013-05-31 admin, 2269 reads

ExternalAppendOnlyUnsafeRowArray is an append-only array for UnsafeRows that spills content to disk when a predefined spill threshold of rows is reached.

Note	Choosing a proper spill threshold of rows is a performance optimization.
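The sketch below shows the append-then-spill idea under heavy simplifying assumptions. `Int` values stand in for UnsafeRows, and a temporary file written with `DataOutputStream` stands in for the `UnsafeExternalSorter`. `ToySpillableArray` is a hypothetical name. Rows stay in memory until the buffer holds `numRowsSpillThreshold` rows. The next `add` moves them to disk, and all later rows are appended there.

```scala
import java.io._
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified model of an append-only array that spills:
// Int values stand in for UnsafeRows and a temp file stands in for the
// UnsafeExternalSorter that Spark spills to.
final class ToySpillableArray(numRowsSpillThreshold: Int) {
  private val inMemoryBuffer = ArrayBuffer.empty[Int]
  private var spillFile: Option[File] = None
  private var spillOut: DataOutputStream = null
  private var numRows = 0

  def length: Int = numRows
  def isSpilled: Boolean = spillFile.isDefined

  def add(row: Int): Unit = {
    // Threshold reached: move buffered rows to disk before appending more
    if (!isSpilled && inMemoryBuffer.length == numRowsSpillThreshold) spill()
    if (isSpilled) spillOut.writeInt(row) else inMemoryBuffer += row
    numRows += 1
  }

  private def spill(): Unit = {
    val file = File.createTempFile("toy-spill-", ".bin")
    file.deleteOnExit()
    spillOut = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(file)))
    inMemoryBuffer.foreach(r => spillOut.writeInt(r))
    inMemoryBuffer.clear()
    spillFile = Some(file)
  }

  // Rows come back in insertion order, from memory or from the spill file
  def generateIterator(): Iterator[Int] = spillFile match {
    case None => inMemoryBuffer.toVector.iterator
    case Some(file) =>
      spillOut.flush()
      val in = new DataInputStream(new BufferedInputStream(new FileInputStream(file)))
      try Vector.fill(numRows)(in.readInt()).iterator
      finally in.close()
  }
}
```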

ExternalAppendOnlyUnsafeRowArray is created when:

WindowExec physical operator is executed (and creates an internal buffer for window frames)
WindowFunctionFrame is prepared
SortMergeJoinExec physical operator is executed (and creates a RowIterator for INNER and CROSS joins) and for getBufferedMatches
SortMergeJoinScanner creates an internal bufferedMatches
UnsafeCartesianRDD is computed

Table 1. ExternalAppendOnlyUnsafeRowArray’s Internal Registries and Counters
Name	Description
`initialSizeOfInMemoryBuffer`	FIXME Used when…FIXME
`inMemoryBuffer`	FIXME. Can grow up to numRowsSpillThreshold rows (i.e. as new `UnsafeRows` are added). Used when…FIXME
`spillableArray`	`UnsafeExternalSorter` Used when…FIXME
`numRows`	Used when…FIXME
`modificationsCount`	Used when…FIXME
`numFieldsPerRow`	Used when…FIXME

Tip

Enable INFO logging level for org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArray logger to see what happens inside.

Add the following line to conf/log4j.properties:



log4j.logger.org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArray=INFO

Refer to Logging.

`generateIterator` Method



generateIterator(): Iterator[UnsafeRow]
generateIterator(startIndex: Int): Iterator[UnsafeRow]

Caution

FIXME

`add` Method



add(unsafeRow: UnsafeRow): Unit

Caution

FIXME

Note	`add` is used when `WindowExec` is executed (and fetches all rows in a partition for a group), when `SortMergeJoinScanner` buffers matching rows, and when `UnsafeCartesianRDD` is computed.

`clear` Method



clear(): Unit

Caution

FIXME

Creating ExternalAppendOnlyUnsafeRowArray Instance

ExternalAppendOnlyUnsafeRowArray takes the following when created:

TaskMemoryManager
BlockManager
SerializerManager
TaskContext
Initial size
Page size (in bytes)
Number of rows to hold before spilling them to disk

ExternalAppendOnlyUnsafeRowArray initializes the internal registries and counters.

CatalystSerde

2013-05-30 admin, 2654 reads

CatalystSerde Helper Object

CatalystSerde is a Scala object that consists of three utility methods:

deserialize to create a new logical plan with the input logical plan wrapped inside DeserializeToObject logical operator.
serialize
generateObjAttr

CatalystSerde belongs to the org.apache.spark.sql.catalyst.plans.logical package.

Creating Logical Plan with DeserializeToObject Logical Operator for Logical Plan — `deserialize` Method



deserialize[T : Encoder](child: LogicalPlan): DeserializeToObject

deserialize creates a DeserializeToObject logical operator for the input child logical plan.

Internally, deserialize first creates an UnresolvedDeserializer for the deserializer of the type T, and passes it on to a DeserializeToObject together with an AttributeReference (the result of generateObjAttr).

`serialize` Method



serialize[T : Encoder](child: LogicalPlan): SerializeFromObject

`generateObjAttr` Method



generateObjAttr[T : Encoder]: Attribute

TungstenAggregationIterator — Iterator of UnsafeRows for HashAggregateExec Physical Operator

2013-05-29 admin, 1732 reads

TungstenAggregationIterator is an AggregationIterator that the HashAggregateExec aggregate physical operator uses when executed (to process UnsafeRows per partition and calculate aggregations).

TungstenAggregationIterator prefers hash-based aggregation. It switches to sort-based aggregation only when there is not enough memory for groups and their buffers.



val q = spark.range(10).
  groupBy('id % 2 as "group").
  agg(sum("id") as "sum")
val execPlan = q.queryExecution.sparkPlan
scala> println(execPlan.numberedTreeString)
00 HashAggregate(keys=[(id#0L % 2)#11L], functions=[sum(id#0L)], output=[group#3L, sum#7L])
01 +- HashAggregate(keys=[(id#0L % 2) AS (id#0L % 2)#11L], functions=[partial_sum(id#0L)], output=[(id#0L % 2)#11L, sum#13L])
02    +- Range (0, 10, step=1, splits=8)

import org.apache.spark.sql.execution.aggregate.HashAggregateExec
val hashAggExec = execPlan.asInstanceOf[HashAggregateExec]
val hashAggExecRDD = hashAggExec.execute

// MapPartitionsRDD is in private[spark] scope
// Use :paste -raw for the following helper object
package org.apache.spark
object AccessPrivateSpark {
  import org.apache.spark.rdd.RDD
  def mapPartitionsRDD[T](hashAggExecRDD: RDD[T]) = {
    import org.apache.spark.rdd.MapPartitionsRDD
    hashAggExecRDD.asInstanceOf[MapPartitionsRDD[_, _]]
  }
}
// END :paste -raw

import org.apache.spark.AccessPrivateSpark
val mpRDD = AccessPrivateSpark.mapPartitionsRDD(hashAggExecRDD)
val f = mpRDD.iterator(_, _)

import org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator
// FIXME How to show that TungstenAggregationIterator is used?

When created, TungstenAggregationIterator gets SQL metrics from the HashAggregateExec aggregate physical operator being executed, i.e. numOutputRows, peakMemory, spillSize and avgHashProbe metrics.

numOutputRows is used when TungstenAggregationIterator is requested for the next UnsafeRow (and it has one)
peakMemory, spillSize and avgHashProbe are used at the end of every task (one per partition)

The metrics are then displayed as part of HashAggregateExec aggregate physical operator (e.g. in web UI in Details for Query).

Figure 1. HashAggregateExec in web UI (Details for Query)

Table 1. TungstenAggregationIterator’s Internal Properties (e.g. Registries, Counters and Flags)
Name	Description
`aggregationBufferMapIterator`	`KVIterator[UnsafeRow, UnsafeRow]` Used when…FIXME
`hashMap`	UnsafeFixedWidthAggregationMap with the following: initialAggregationBuffer; StructType built from (the aggBufferAttributes of) the aggregate function expressions; StructType built from (the attributes of) the groupingExpressions; `1024 * 16` initial capacity; the page size of the `TaskMemoryManager` (defaults to `spark.buffer.pageSize` configuration). Used when `TungstenAggregationIterator` is requested for the next UnsafeRow, to outputForEmptyGroupingKeyWithoutInput, processInputs, to initialize the aggregationBufferMapIterator and every time a partition has been processed.
`initialAggregationBuffer`	UnsafeRow that is the aggregation buffer containing initial buffer values. Used when…FIXME
`externalSorter`	`UnsafeKVExternalSorter` used for sort-based aggregation
`sortBased`	Flag to indicate whether `TungstenAggregationIterator` uses sort-based aggregation (not hash-based aggregation). `sortBased` flag is disabled (`false`) by default. Enabled (`true`) when `TungstenAggregationIterator` is requested to switch to sort-based aggregation. Used when…FIXME

`processInputs` Internal Method



processInputs(fallbackStartsAt: (Int, Int)): Unit

processInputs…FIXME

Note	`processInputs` is used exclusively when `TungstenAggregationIterator` is created (and sets the internal flags to indicate whether to use a hash-based aggregation or, in the worst case, a sort-based aggregation when there is not enough memory for groups and their buffers).

Switching to Sort-Based Aggregation (From Preferred Hash-Based Aggregation) — `switchToSortBasedAggregation` Internal Method



switchToSortBasedAggregation(): Unit

switchToSortBasedAggregation…FIXME

Note	`switchToSortBasedAggregation` is used exclusively when `TungstenAggregationIterator` is requested to processInputs (and the externalSorter is used).
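The overall strategy can be modeled in a few lines. In this hypothetical sketch, `HashThenSortAggregation` and its `maxKeys` cap are invented for illustration. It works in three phases:

1. Keys are aggregated in a hash map while the map has room.
2. When a new key does not fit, the partial aggregates are spilled and a fresh map is started.
3. At the end, if anything was spilled, all partial aggregates are sorted by key and neighbours with equal keys are merged.

```scala
import scala.collection.mutable

// Hypothetical model of hash-based aggregation with a sort-based fallback,
// computing sum(value) grouped by key. `maxKeys` is an invented stand-in for
// "the hash map cannot grow any further"; it is not a Spark setting.
object HashThenSortAggregation {
  final case class Result(sums: Map[Int, Long], sortBased: Boolean)

  def aggregate(input: Iterator[(Int, Long)], maxKeys: Int): Result = {
    require(maxKeys > 0)
    val hashMap = mutable.HashMap.empty[Int, Long]
    val spills = mutable.ArrayBuffer.empty[(Int, Long)] // spilled partial aggregates
    var spilled = false

    def spill(): Unit = {
      spills ++= hashMap
      hashMap.clear()
      spilled = true
    }

    for ((k, v) <- input) {
      // New key but no room left: spill partial aggregates, continue with an empty map
      if (!hashMap.contains(k) && hashMap.size >= maxKeys) spill()
      hashMap(k) = hashMap.getOrElse(k, 0L) + v
    }

    if (!spilled) Result(hashMap.toMap, sortBased = false) // pure hash-based path
    else {
      spill() // the final map is spilled too
      // Sort-based path: order partial aggregates by key, merge equal neighbours
      val merged = spills.sortBy(_._1).foldLeft(List.empty[(Int, Long)]) {
        case ((pk, pv) :: rest, (k, v)) if pk == k => (k, pv + v) :: rest
        case (acc, kv)                             => kv :: acc
      }
      Result(merged.toMap, sortBased = true)
    }
  }
}
```

Both paths produce the same sums. The fallback trades speed for bounded memory.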

Getting Next UnsafeRow — `next` Method



next(): UnsafeRow

Note	`next` is part of Scala’s scala.collection.Iterator interface that returns the next element and discards it from the iterator.

next…FIXME

`hasNext` Method



hasNext: Boolean

Note	`hasNext` is part of Scala’s scala.collection.Iterator interface that tests whether this iterator can provide another element.

hasNext…FIXME

Creating TungstenAggregationIterator Instance

TungstenAggregationIterator takes the following when created:

Partition index
Grouping named expressions
Aggregate expressions
Aggregate attributes
Initial input buffer offset
Output named expressions
Function to create a new MutableProjection given Catalyst expressions and attributes (i.e. (Seq[Expression], Seq[Attribute]) ⇒ MutableProjection)
Output attributes (of the child of the HashAggregateExec physical operator)
Iterator of InternalRows (from a single partition of the child of the HashAggregateExec physical operator)
(used for testing) Optional HashAggregateExec's testFallbackStartsAt
numOutputRows SQLMetric
peakMemory SQLMetric
spillSize SQLMetric
avgHashProbe SQLMetric

Note	The SQL metrics (numOutputRows, peakMemory, spillSize and avgHashProbe) belong to the HashAggregateExec physical operator that created the `TungstenAggregationIterator`.

TungstenAggregationIterator initializes the internal registries and counters.

TungstenAggregationIterator starts processing input rows and, if it did not switch to sort-based aggregation, pre-loads the first key-value pair from the UnsafeFixedWidthAggregationMap.

`generateResultProjection` Method



generateResultProjection(): (UnsafeRow, InternalRow) => UnsafeRow

Note	`generateResultProjection` is part of the AggregationIterator Contract to…FIXME.

generateResultProjection…FIXME

Creating UnsafeRow — `outputForEmptyGroupingKeyWithoutInput` Method



outputForEmptyGroupingKeyWithoutInput(): UnsafeRow

outputForEmptyGroupingKeyWithoutInput…FIXME

Note	`outputForEmptyGroupingKeyWithoutInput` is used when…FIXME

TaskCompletionListener

TungstenAggregationIterator registers a TaskCompletionListener that is executed on task completion (for every task that processes a partition).

When executed (once per partition), the TaskCompletionListener updates the following metrics:

peakMemory
spillSize
avgHashProbe
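The pattern here is to keep per-task state locally and publish it once, when the task finishes. The sketch below shows it with hypothetical stand-ins. `ToyTaskContext` and `ToyMetric` are not Spark classes. The sketch does not model how Spark combines per-task metric values; the toy metric simply adds them up.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for Spark's TaskContext and SQLMetric, just enough
// to show the pattern: a metric is updated once, when the task completes.
final class ToyMetric {
  private var total = 0L
  def add(v: Long): Unit = total += v
  def value: Long = total
}

final class ToyTaskContext {
  private val listeners = mutable.ArrayBuffer.empty[ToyTaskContext => Unit]
  def addTaskCompletionListener(f: ToyTaskContext => Unit): Unit = listeners += f
  def markTaskCompleted(): Unit = listeners.foreach(listener => listener(this))
}

object ToyAggregationTask {
  // One task processes one partition of memory deltas (allocations and frees);
  // the peak is tracked locally and reported only by the completion listener
  def run(memoryDeltas: Seq[Long], ctx: ToyTaskContext, peakMemory: ToyMetric): Unit = {
    var used = 0L
    var peak = 0L
    ctx.addTaskCompletionListener(_ => peakMemory.add(peak))
    memoryDeltas.foreach { delta => used += delta; peak = math.max(peak, used) }
    ctx.markTaskCompleted()
  }
}
```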

SortBasedAggregationIterator

2013-05-28 admin, 1843 reads

SortBasedAggregationIterator is…FIXME

`next` Method



next(): UnsafeRow

Note	`next` is part of Scala’s scala.collection.Iterator interface that returns the next element and discards it from the iterator.

next…FIXME

`outputForEmptyGroupingKeyWithoutInput` Method



outputForEmptyGroupingKeyWithoutInput(): UnsafeRow

outputForEmptyGroupingKeyWithoutInput…FIXME

Note	`outputForEmptyGroupingKeyWithoutInput` is used when…FIXME

`newBuffer` Internal Method



newBuffer: InternalRow

newBuffer…FIXME

Note	`newBuffer` is used when…FIXME

ObjectAggregationIterator

2013-05-27 admin, 1873 reads

ObjectAggregationIterator is…FIXME

`next` Method



next(): UnsafeRow

Note	`next` is part of Scala’s scala.collection.Iterator interface that returns the next element and discards it from the iterator.

next…FIXME

`outputForEmptyGroupingKeyWithoutInput` Method



outputForEmptyGroupingKeyWithoutInput(): UnsafeRow

outputForEmptyGroupingKeyWithoutInput…FIXME

Note	`outputForEmptyGroupingKeyWithoutInput` is used when…FIXME

AggregationIterator — Generic Iterator of UnsafeRows for Aggregate Physical Operators

2013-05-26 admin, 1732 reads

AggregationIterator is the base for iterators of UnsafeRows that…FIXME

Iterators are data structures that let you iterate over a sequence of elements. They have a hasNext method that checks whether a next element is available, and a next method that returns the next element and discards it from the iterator.

Table 1. AggregationIterator Implementations
Name	Description
`ObjectAggregationIterator`	Used exclusively when ObjectHashAggregateExec physical operator is executed.
`SortBasedAggregationIterator`	Used exclusively when SortAggregateExec physical operator is executed.
`TungstenAggregationIterator`	Used exclusively when HashAggregateExec physical operator is executed.

Note	HashAggregateExec operator is the preferred aggregate physical operator for Aggregation execution planning strategy (over `ObjectHashAggregateExec` and `SortAggregateExec`).
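Here is a hypothetical iterator that makes the hasNext/next contract concrete. It is written in the spirit of SortBasedAggregationIterator but is not Spark's code. It assumes its input is already sorted by grouping key, as SortAggregateExec requires of its child's output, and emits one `(key, sum)` pair per group. It peeks at the next row through a `BufferedIterator` to detect where a group ends.

```scala
// Hypothetical iterator in the spirit of SortBasedAggregationIterator:
// it sums values per key and assumes the input is already sorted by key.
final class SortedGroupSumIterator(input: Iterator[(Int, Long)]) extends Iterator[(Int, Long)] {
  private val rows = input.buffered // lets us peek at the next row's key

  override def hasNext: Boolean = rows.hasNext

  override def next(): (Int, Long) = {
    if (!hasNext) throw new NoSuchElementException("next on empty iterator")
    val (key, firstValue) = rows.next()
    var sum = firstValue
    // Fold every following row with the same key into this group's buffer
    while (rows.hasNext && rows.head._1 == key) sum += rows.next()._2
    (key, sum)
  }
}
```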

Table 2. AggregationIterator’s Internal Registries and Counters
Name	Description
`aggregateFunctions`	Aggregate functions Used when…FIXME
`allImperativeAggregateFunctions`	ImperativeAggregate functions Used when…FIXME
`allImperativeAggregateFunctionPositions`	Positions Used when…FIXME
`expressionAggInitialProjection`	`MutableProjection` Used when…FIXME
`generateOutput`	Function used to generate an unsafe row (i.e. `(UnsafeRow, InternalRow) ⇒ UnsafeRow`). Used when `ObjectAggregationIterator`, `SortBasedAggregationIterator` and `TungstenAggregationIterator` are requested for the next unsafe row and outputForEmptyGroupingKeyWithoutInput.
`groupingAttributes`	Grouping attributes Used when…FIXME
`groupingProjection`	UnsafeProjection Used when…FIXME
`processRow`	`(InternalRow, InternalRow) ⇒ Unit` Used when…FIXME

Creating AggregationIterator Instance

AggregationIterator takes the following when created:

Grouping named expressions
Input attributes
Aggregate expressions
Aggregate attributes
Initial input buffer offset
Result named expressions
Function to create a new MutableProjection given expressions and attributes

AggregationIterator initializes the internal registries and counters.

Note	`AggregationIterator` is a Scala abstract class and cannot be created directly. It is created indirectly for the concrete AggregationIterators.

`initializeAggregateFunctions` Internal Method



initializeAggregateFunctions(
  expressions: Seq[AggregateExpression],
  startingInputBufferOffset: Int): Array[AggregateFunction]

initializeAggregateFunctions…FIXME

Note	`initializeAggregateFunctions` is used when…FIXME

`generateProcessRow` Internal Method



generateProcessRow(
  expressions: Seq[AggregateExpression],
  functions: Seq[AggregateFunction],
  inputAttributes: Seq[Attribute]): (InternalRow, InternalRow) => Unit

generateProcessRow…FIXME

Note	`generateProcessRow` is used when…FIXME

`generateResultProjection` Method



generateResultProjection(): (UnsafeRow, InternalRow) => UnsafeRow

generateResultProjection…FIXME

Note	`generateResultProjection` is used when `AggregationIterator` is created, and when `TungstenAggregationIterator` is requested for the generateResultProjection.

UnsafeRow — Mutable Raw-Memory Unsafe Binary Row Format

2013-05-25 admin, 1926 reads

UnsafeRow is a concrete InternalRow that represents a mutable internal raw-memory (and hence unsafe) binary row format.

In other words, UnsafeRow is an InternalRow that is backed by raw memory instead of Java objects.



// Use ExpressionEncoder for simplicity
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
val stringEncoder = ExpressionEncoder[String]
val row = stringEncoder.toRow("hello world")

import org.apache.spark.sql.catalyst.expressions.UnsafeRow
val unsafeRow = row match { case ur: UnsafeRow => ur }

scala> unsafeRow.getBytes
res0: Array[Byte] = Array(0, 0, 0, 0, 0, 0, 0, 0, 11, 0, 0, 0, 16, 0, 0, 0, 104, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100, 0, 0, 0, 0, 0)

scala> unsafeRow.getUTF8String(0)
res1: org.apache.spark.unsafe.types.UTF8String = hello world

UnsafeRow knows its size in bytes.



scala> println(unsafeRow.getSizeInBytes)
32

UnsafeRow supports Java’s Externalizable and Kryo’s KryoSerializable serialization/deserialization protocols.

The fields of a data row are placed using field offsets.

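UnsafeRow considers a data type mutable if it is one of NullType, BooleanType, ByteType, ShortType, IntegerType, LongType, FloatType, DoubleType, DateType or TimestampType, or if it is a DecimalType (see isMutable).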

UnsafeRow is composed of three regions:

Null Bit Set Bitmap Region (1 bit/field) for tracking null values
Fixed-Length 8-Byte Values Region
Variable-Length Data Section

As a result, rows are always 8-byte word-aligned, and their size is always a multiple of 8 bytes.

Equality comparison and hashing of rows can be performed on raw bytes. Identical rows have identical bit-wise representations, so no type-specific interpretation is required.
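The `hello world` bytes shown earlier can be rebuilt by hand from the three regions. The hypothetical `ToyUnsafeRowLayout` below is not Spark's `UnsafeRowWriter`. It assumes a little-endian byte order, which the `res0` output above reflects. The row it builds has three parts:

- an 8-byte null bitset for the single field
- one 8-byte fixed-length slot holding the string's offset and size
- the 11 string bytes, padded to 16

Encoding the same value twice yields the same bytes, which is why equality checks and hashing can work on raw bytes.

```scala
import java.nio.{ByteBuffer, ByteOrder}
import java.nio.charset.StandardCharsets

// Hypothetical hand-encoder for a one-field row holding a single string.
// It mimics the UnsafeRow layout; it is not Spark's UnsafeRowWriter.
object ToyUnsafeRowLayout {
  // Null bitset: one bit per field, rounded up to whole 8-byte words
  def bitSetWidthInBytes(numFields: Int): Int = (numFields + 63) / 64 * 8

  // Variable-length data is padded so the row stays 8-byte aligned
  def roundToWord(numBytes: Int): Int = (numBytes + 7) / 8 * 8

  def encodeSingleString(s: String): Array[Byte] = {
    val bytes = s.getBytes(StandardCharsets.UTF_8)
    val fixedStart = bitSetWidthInBytes(1) // 8: bitset for one field
    val varStart = fixedStart + 8          // after one 8-byte fixed-length slot
    val row = ByteBuffer.allocate(varStart + roundToWord(bytes.length))
      .order(ByteOrder.LITTLE_ENDIAN)
    // Bitset stays zero (field is not null). The fixed slot packs the
    // offset (upper 32 bits) and size (lower 32 bits) of the string bytes.
    row.putLong(fixedStart, (varStart.toLong << 32) | bytes.length)
    bytes.indices.foreach(i => row.put(varStart + i, bytes(i)))
    row.array()
  }
}
```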

`isMutable` Static Predicate



static boolean isMutable(DataType dt)

isMutable is enabled (true) when the input DataType is among the mutable field types or a DecimalType.

Otherwise, isMutable is disabled (false).

Note	`isMutable` is used when `UnsafeFixedWidthAggregationMap` is requested to supportsAggregationBufferSchema, and when `SortBasedAggregationIterator` is requested for newBuffer.

Kryo’s KryoSerializable SerDe Protocol

Tip	Read up on KryoSerializable.

Serializing JVM Object — KryoSerializable’s `write` Method



void write(Kryo kryo, Output out)

Deserializing Kryo-Managed Object — KryoSerializable’s `read` Method



void read(Kryo kryo, Input in)

Java’s Externalizable SerDe Protocol

Tip	Read up on java.io.Externalizable.

Serializing JVM Object — Externalizable’s `writeExternal` Method



void writeExternal(ObjectOutput out)
throws IOException

Deserializing Java-Externalized Object — Externalizable’s `readExternal` Method



void readExternal(ObjectInput in)
throws IOException, ClassNotFoundException

`pointTo` Method



void pointTo(Object baseObject, long baseOffset, int sizeInBytes)

pointTo…FIXME

Note	`pointTo` is used when…FIXME

InternalRow — Abstract Binary Row Format

2013-05-24 admin, 2177 reads

Note	`InternalRow` is also called Catalyst row or Spark SQL row.

Note	UnsafeRow is a concrete `InternalRow`.



// The type of your business objects
case class Person(id: Long, name: String)

// The encoder for Person objects
import org.apache.spark.sql.Encoders
val personEncoder = Encoders.product[Person]

// The expression encoder for Person objects
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
val personExprEncoder = personEncoder.asInstanceOf[ExpressionEncoder[Person]]

// Convert Person objects to InternalRow
scala> val row = personExprEncoder.toRow(Person(0, "Jacek"))
row: org.apache.spark.sql.catalyst.InternalRow = [0,0,1800000005,6b6563614a]

// How many fields are available in Person's InternalRow?
scala> row.numFields
res0: Int = 2

// Are there any NULLs in this InternalRow?
scala> row.anyNull
res1: Boolean = false

// You can create your own InternalRow objects
import org.apache.spark.sql.catalyst.InternalRow

scala> val ir = InternalRow(5, "hello", (0, "nice"))
ir: org.apache.spark.sql.catalyst.InternalRow = [5,hello,(0,nice)]

You can create InternalRow objects using the factory methods of the InternalRow companion object.



import org.apache.spark.sql.catalyst.InternalRow

scala> InternalRow.empty
res0: org.apache.spark.sql.catalyst.InternalRow = [empty row]

scala> InternalRow(0, "string", (0, "pair"))
res1: org.apache.spark.sql.catalyst.InternalRow = [0,string,(0,pair)]

scala> InternalRow.fromSeq(Seq(0, "string", (0, "pair")))
res2: org.apache.spark.sql.catalyst.InternalRow = [0,string,(0,pair)]

`getString` Method

Caution

FIXME

Tungsten Execution Backend (Project Tungsten)

2013-05-23 admin, 1774 reads

The goal of Project Tungsten is to improve Spark execution by optimizing Spark jobs for CPU and memory efficiency (as opposed to network and disk I/O, which are considered fast enough). Tungsten focuses on the hardware architecture of the platform Spark runs on, including the JVM, LLVM, GPU and NVRAM. It does so by offering the following optimization features:

Off-Heap Memory Management, using a binary in-memory data representation (aka the Tungsten row format) and managing memory explicitly,
Cache Locality, i.e. cache-aware computations with a cache-aware layout for high cache hit rates,
Whole-Stage Code Generation (aka CodeGen).

Important

Project Tungsten uses the sun.misc.Unsafe API for direct memory access, bypassing the JVM's managed heap in order to avoid garbage collection.



// a million integers
val intsMM = 1 to math.pow(10, 6).toInt

// cached as an RDD: about 3.8 MB in memory
scala> sc.parallelize(intsMM).cache.count
res0: Long = 1000000

// cached as a DataFrame: about 998.4 KB in memory
// (toDF needs import spark.implicits._, which spark-shell does automatically)
scala> intsMM.toDF.cache.count
res1: Long = 1000000


Figure 1. RDD vs DataFrame Size in Memory in web UI — Thank you, Tungsten!

Off-Heap Memory Management

Project Tungsten aims at substantially reducing the usage of JVM objects (and therefore JVM garbage collection) by introducing its own off-heap binary memory management. Instead of working with Java objects, Tungsten uses sun.misc.Unsafe to manipulate raw memory.
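A minimal sketch of the explicit, off-heap memory management described above, calling sun.misc.Unsafe directly: memory is allocated outside the JVM heap, written and read through raw addresses, and must be freed by hand because the garbage collector never sees it. `OffHeapInts` and `sumOffHeap` are hypothetical names; Spark itself wraps these calls (e.g. in `org.apache.spark.unsafe.Platform`) rather than using them this way. Recent JDKs deprecate Unsafe's memory-access methods and may print a warning on first use.

```scala
import sun.misc.Unsafe

// Hypothetical illustration of raw off-heap memory, not Spark code
object OffHeapInts {
  // Unsafe has no public constructor; the singleton is read reflectively
  private val unsafe: Unsafe = {
    val field = classOf[Unsafe].getDeclaredField("theUnsafe")
    field.setAccessible(true)
    field.get(null).asInstanceOf[Unsafe]
  }

  // Stores 1..n as 4-byte ints in raw memory, then sums them back
  def sumOffHeap(n: Int): Long = {
    val address = unsafe.allocateMemory(n.toLong * 4) // invisible to the GC
    try {
      var i = 0
      while (i < n) {
        unsafe.putInt(address + i.toLong * 4, i + 1) // no Integer objects created
        i += 1
      }
      var sum = 0L
      i = 0
      while (i < n) {
        sum += unsafe.getInt(address + i.toLong * 4)
        i += 1
      }
      sum
    } finally {
      unsafe.freeMemory(address) // must be released explicitly
    }
  }

  def main(args: Array[String]): Unit =
    println(sumOffHeap(1000000))
}
```

A million ints take exactly 4,000,000 bytes here, with no per-object headers or boxing, which is the kind of saving the RDD vs DataFrame comparison above hints at.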

Tungsten represents data in a compact storage format called UnsafeRow, which further reduces the memory footprint.

Since Datasets have a known schema, Tungsten can lay out the objects itself, more compactly and efficiently. That brings benefits similar to using extensions written in low-level, hardware-aware languages like C or assembler.

This works directly on data that is already serialized, which further reduces or completely avoids serialization between the JVM object representation and Spark's internal one.
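Because the schema is known up front, the size and position of every fixed-width slot can be computed before any row is written. The following is a simplified, hypothetical encoder (not Spark's UnsafeRowWriter) for an assumed schema of (id: Long, name: String), following the UnsafeRow layout: a null bitset, one 8-byte slot per field, then the string's UTF-8 bytes padded to a multiple of 8, with the string's slot holding (offset << 32) | size. It assumes a little-endian machine when turning the bytes back into words.

```scala
import java.nio.{ByteBuffer, ByteOrder}
import java.nio.charset.StandardCharsets

// Hypothetical encoder for the assumed schema (id: Long, name: String)
object FixedSchemaRowEncoder {
  val numFields = 2
  // One 64-bit word of null bits per 64 fields
  val bitSetWidthInBytes: Int = ((numFields + 63) / 64) * 8
  // Known from the schema alone: bitset plus one 8-byte slot per field
  val fixedWidthInBytes: Int = bitSetWidthInBytes + numFields * 8

  def encode(id: Long, name: String): Array[Long] = {
    val nameBytes = name.getBytes(StandardCharsets.UTF_8)
    val paddedSize = (nameBytes.length + 7) / 8 * 8
    val buf = ByteBuffer.allocate(fixedWidthInBytes + paddedSize).order(ByteOrder.LITTLE_ENDIAN)
    buf.putLong(0L) // null bitset: no nulls
    buf.putLong(id) // field 0: fixed-width value stored in place
    buf.putLong((fixedWidthInBytes.toLong << 32) | nameBytes.length) // field 1: offset and size
    buf.put(nameBytes) // variable-length region; the padding stays zero
    // Read the row back as 8-byte words, the way UnsafeRow.toString prints it
    buf.rewind()
    Array.fill((fixedWidthInBytes + paddedSize) / 8)(buf.getLong())
  }

  def main(args: Array[String]): Unit =
    println(encode(0L, "Jacek").map(java.lang.Long.toHexString).mkString("[", ",", "]"))
}
```

For (0L, "Jacek") it produces [0,0,1800000005,6b6563614a], the same words as the `row` shown earlier in this section.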

Cache Locality

Tungsten uses algorithms and cache-aware data structures that exploit the physical machine caches at different levels – L1, L2, L3.
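One cache-aware technique associated with Tungsten's sorting is keeping each record pointer next to a short key prefix in one contiguous array, so most comparisons read only that array instead of dereferencing pointers to records scattered across memory. The sketch below is a hypothetical, simplified illustration of that idea: the prefix is the first 8 UTF-8 bytes of a string as an unsigned big-endian long, ties fall back to a full String.compareTo (which agrees with byte order for ASCII strings), and the insertion sort is only for brevity.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical prefix-sort sketch, not Spark's sorter
object PrefixSortSketch {
  // First 8 UTF-8 bytes as an unsigned big-endian long, zero-padded
  def prefixOf(s: String): Long = {
    val bytes = s.getBytes(StandardCharsets.UTF_8)
    var p = 0L
    var i = 0
    while (i < 8) {
      val b = if (i < bytes.length) bytes(i) & 0xff else 0
      p = (p << 8) | b
      i += 1
    }
    p
  }

  def sort(records: Array[String]): Array[String] = {
    val n = records.length
    // Entry i occupies slots 2*i (prefix) and 2*i + 1 (record index as "pointer")
    val packed = new Array[Long](2 * n)
    for (i <- 0 until n) {
      packed(2 * i) = prefixOf(records(i))
      packed(2 * i + 1) = i
    }
    // Compares entries a and b; only ties touch the records themselves
    def compare(a: Int, b: Int): Int = {
      val c = java.lang.Long.compareUnsigned(packed(2 * a), packed(2 * b))
      if (c != 0) c
      else records(packed(2 * a + 1).toInt).compareTo(records(packed(2 * b + 1).toInt))
    }
    // Insertion sort that moves whole (prefix, pointer) pairs
    for (i <- 1 until n) {
      var j = i
      while (j > 0 && compare(j - 1, j) > 0) {
        val p = packed(2 * j)
        val r = packed(2 * j + 1)
        packed(2 * j) = packed(2 * j - 2)
        packed(2 * j + 1) = packed(2 * j - 1)
        packed(2 * j - 2) = p
        packed(2 * j - 1) = r
        j -= 1
      }
    }
    Array.tabulate(n)(i => records(packed(2 * i + 1).toInt))
  }
}
```

Records such as "tungsten1" and "tungsten2" share their 8-byte prefix and need the full comparison; all other orderings are decided from the packed array alone.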

Whole-Stage Java Code Generation

Tungsten generates Java source code at query execution time and compiles it into JVM bytecode that accesses Tungsten-managed memory structures directly, which gives very fast access. The compilation uses the Janino compiler, a super-small, super-fast Java compiler.
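Whole-stage code generation fuses the operators of a stage into a single function. The contrast can be sketched in plain Scala for a query along the lines of SELECT sum(id * 2) FROM range(n) WHERE id % 3 = 0: `volcano` pushes every value through a chain of iterators (one call per operator per row, with boxed values), while `fused` does the same filter, projection and sum in one tight loop, roughly the shape of what generated code aims for. Both functions are hand-written and hypothetical, not actual Spark output.

```scala
// Hypothetical comparison of iterator-per-operator vs fused execution
object FusedLoopSketch {
  // Iterator chain: range -> filter -> map -> sum, each a separate operator
  def volcano(n: Long): Long =
    (0L until n).iterator.filter(_ % 3 == 0).map(_ * 2).sum

  // Same work in one loop with primitive locals and no per-row calls
  def fused(n: Long): Long = {
    var sum = 0L
    var id = 0L
    while (id < n) {
      if (id % 3 == 0) sum += id * 2
      id += 1
    }
    sum
  }

  def main(args: Array[String]): Unit =
    println(s"${volcano(1000000)} ${fused(1000000)}")
}
```

Both return the same sum; the difference lies only in the per-row overhead the fused loop avoids.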

Note	The code generation was tracked under SPARK-8159 Improve expression function coverage (Spark 1.5).

Tip	Read Whole-Stage Code Generation.


