
ProjectExec


ProjectExec Unary Physical Operator

ProjectExec is a unary physical operator (i.e. with one child physical operator) that…​FIXME

ProjectExec supports Java code generation (aka codegen).

ProjectExec is created when:

Note

The following is the order of applying the above execution planning strategies to logical query plans when the SparkPlanner or the Hive-specific SparkPlanner is requested to plan a logical query plan into one or more physical query plans:

  1. HiveTableScans

  2. FileSourceStrategy

  3. DataSourceStrategy

  4. InMemoryScans

  5. BasicOperators

Executing Physical Operator (Generating RDD[InternalRow]) — doExecute Method

Note
doExecute is part of SparkPlan Contract to generate the runtime representation of a structured query as a distributed computation over internal binary rows on Apache Spark (i.e. RDD[InternalRow]).

doExecute requests the input child physical plan to produce an RDD of internal rows and applies a calculation over indexed partitions (using RDD.mapPartitionsWithIndexInternal).

Inside doExecute (RDD.mapPartitionsWithIndexInternal)

Inside the function (that is part of RDD.mapPartitionsWithIndexInternal), doExecute creates an UnsafeProjection with the following:

  1. Named expressions

  2. Output of the child physical operator as the input schema

  3. subexpressionEliminationEnabled flag

doExecute requests the UnsafeProjection to initialize and maps over the internal rows (of a partition) using the projection.
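A minimal sketch of that flow (assuming the member names child, projectList and subexpressionEliminationEnabled from the description above; simplified, not the exact source):

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.UnsafeProjection

// Sketch of ProjectExec.doExecute (simplified)
protected def doExecute(): RDD[InternalRow] =
  child.execute().mapPartitionsWithIndexInternal { (index, iter) =>
    // UnsafeProjection over the named expressions, bound to the child's output schema
    val project = UnsafeProjection.create(
      projectList, child.output, subexpressionEliminationEnabled)
    project.initialize(index)        // per-partition initialization
    iter.map(project)                // project every InternalRow of the partition
  }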

Creating ProjectExec Instance

ProjectExec takes the following when created:

Generating Java Source Code for Consume Path in Whole-Stage Code Generation — doConsume Method

Note
doConsume is part of CodegenSupport Contract to generate the Java source code for consume path in Whole-Stage Code Generation.

doConsume…​FIXME

ObjectHashAggregateExec


ObjectHashAggregateExec Aggregate Physical Operator

ObjectHashAggregateExec is a unary physical operator (i.e. with one child physical operator) that is created (indirectly through AggUtils.createAggregate) when:

  • …​FIXME

Table 1. ObjectHashAggregateExec’s Performance Metrics

  • numOutputRows: shown as "number of output rows" in the web UI

Figure 1. ObjectHashAggregateExec in web UI (Details for Query)

Executing Physical Operator (Generating RDD[InternalRow]) — doExecute Method

Note
doExecute is part of SparkPlan Contract to generate the runtime representation of a structured query as a distributed computation over internal binary rows on Apache Spark (i.e. RDD[InternalRow]).

doExecute…​FIXME

supportsAggregate Method

supportsAggregate is enabled (i.e. returns true) if there is at least one TypedImperativeAggregate aggregate function in the input aggregateExpressions aggregate expressions.

Note
supportsAggregate is used exclusively when AggUtils is requested to create an aggregate physical operator given aggregate expressions.
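In code, the check boils down to the following (a sketch consistent with the description above):

import org.apache.spark.sql.catalyst.expressions.aggregate.{AggregateExpression, TypedImperativeAggregate}

// true if at least one aggregate function is a TypedImperativeAggregate
def supportsAggregate(aggregateExpressions: Seq[AggregateExpression]): Boolean =
  aggregateExpressions.exists(_.aggregateFunction.isInstanceOf[TypedImperativeAggregate[_]])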

Creating ObjectHashAggregateExec Instance

ObjectHashAggregateExec takes the following when created:

MapElementsExec


MapElementsExec

MapElementsExec is…​FIXME

LocalTableScanExec


LocalTableScanExec Physical Operator

LocalTableScanExec is a leaf physical operator (i.e. no children) with producedAttributes being the outputSet.

LocalTableScanExec is created when BasicOperators execution planning strategy resolves LocalRelation and Spark Structured Streaming’s MemoryPlan logical operators.

Tip
Read on MemoryPlan logical operator in the Spark Structured Streaming gitbook.
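For example, a query over a local collection (with no files to scan and no shuffles) is typically planned with a LocalTableScan. An illustrative spark-shell session (attribute ids will differ):

val df = Seq((1, "a"), (2, "b")).toDF("id", "name")
df.explain
// == Physical Plan ==
// LocalTableScan [id#0, name#1]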

Table 1. LocalTableScanExec’s Performance Metrics

  • numOutputRows: shown as "number of output rows" in the web UI

Note

It appears that when no Spark job is used to execute a LocalTableScanExec the numOutputRows metric is not displayed in the web UI.

When executed, LocalTableScanExec…​FIXME

Figure 1. LocalTableScanExec in web UI (Details for Query)
Table 2. LocalTableScanExec’s Internal Properties
Name Description

unsafeRows

Internal binary rows for…​FIXME

numParallelism

rdd

Executing Physical Operator (Generating RDD[InternalRow]) — doExecute Method

Note
doExecute is part of SparkPlan Contract to generate the runtime representation of a structured query as a distributed computation over internal binary rows on Apache Spark (i.e. RDD[InternalRow]).

doExecute…​FIXME

Creating LocalTableScanExec Instance

LocalTableScanExec takes the following when created:

InMemoryTableScanExec


InMemoryTableScanExec Leaf Physical Operator

InMemoryTableScanExec is a leaf physical operator to represent an InMemoryRelation logical operator at execution time.

InMemoryTableScanExec is a ColumnarBatchScan that supports batch decoding (only when the supportsBatch flag is enabled; see below).

InMemoryTableScanExec supports partition batch pruning (only when the spark.sql.inMemoryColumnarStorage.partitionPruning internal configuration property is enabled, which it is by default).

InMemoryTableScanExec is created exclusively when InMemoryScans execution planning strategy is executed and finds an InMemoryRelation logical operator in a logical query plan.
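For example, caching a Dataset makes later queries over it plan an InMemoryTableScan over the InMemoryRelation. An illustrative spark-shell session (output abbreviated; exact plan text varies per Spark version):

val q = spark.range(5).cache
q.count     // materializes the cache
q.explain
// == Physical Plan ==
// *(1) InMemoryTableScan [id#0L]
//       +- InMemoryRelation [id#0L], StorageLevel(disk, memory, deserialized, 1 replicas)
//             +- *(1) Range (0, 5, step=1, splits=8)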

Table 1. InMemoryTableScanExec’s Performance Metrics

  • numOutputRows: shown as "number of output rows" in the web UI

Figure 1. InMemoryTableScanExec in web UI (Details for Query)

InMemoryTableScanExec supports Java code generation only if batch decoding is enabled.

InMemoryTableScanExec gives the single inputRDD as the only RDD of internal rows (when WholeStageCodegenExec physical operator is executed).

InMemoryTableScanExec uses spark.sql.inMemoryTableScanStatistics.enable flag (default: false) to enable accumulators (that seems to be exclusively for testing purposes).

Table 2. InMemoryTableScanExec’s Internal Properties (e.g. Registries, Counters and Flags)
Name Description

columnarBatchSchema

Schema of a columnar batch

Used exclusively when InMemoryTableScanExec is requested to createAndDecompressColumn.

stats

PartitionStatistics of the InMemoryRelation

Used when InMemoryTableScanExec is requested for partitionFilters, partition batch pruning and statsFor.

Creating InMemoryTableScanExec Instance

InMemoryTableScanExec takes the following when created:

InMemoryTableScanExec initializes the internal registries and counters.

vectorTypes Method

Note
vectorTypes is part of ColumnarBatchScan Contract to…​FIXME.

vectorTypes uses spark.sql.columnVector.offheap.enabled internal configuration property to select the name of the concrete column vector, i.e. OnHeapColumnVector or OffHeapColumnVector when the property is off or on, respectively.

vectorTypes gives as many column vectors as the attribute expressions.

supportsBatch Property

Note
supportsBatch is part of ColumnarBatchScan Contract to control whether the physical operator supports vectorized decoding or not.

supportsBatch is enabled when all of the following hold (see the sketch after this list):

  1. spark.sql.inMemoryColumnarStorage.enableVectorizedReader configuration property is enabled

  2. The output schema of the InMemoryRelation uses primitive data types only, i.e. BooleanType, ByteType, ShortType, IntegerType, LongType, FloatType, DoubleType

  3. The number of nested fields in the output schema of the InMemoryRelation is at most spark.sql.codegen.maxFields internal configuration property
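Put together, the check is roughly the following (a sketch; relation is the InMemoryRelation, conf the SQLConf, and the accessor names are assumptions rather than the exact source):

// Sketch of InMemoryTableScanExec.supportsBatch (simplified; accessor names are assumptions)
val supportsBatch: Boolean =
  conf.cacheVectorizedReaderEnabled &&                      // spark.sql.inMemoryColumnarStorage.enableVectorizedReader
    relation.schema.fields.forall(f => f.dataType match {   // primitive data types only
      case BooleanType | ByteType | ShortType | IntegerType |
           LongType | FloatType | DoubleType => true
      case _ => false
    }) &&
    relation.schema.fields.length <= conf.wholeStageMaxNumFields  // spark.sql.codegen.maxFields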

partitionFilters Property

Note
partitionFilters is a Scala lazy value which is computed once when accessed and cached afterwards.

partitionFilters…​FIXME

Note
partitionFilters is used when…​FIXME

Applying Partition Batch Pruning to Cached Column Buffers (Creating MapPartitionsRDD of Filtered CachedBatches) — filteredCachedBatches Internal Method

filteredCachedBatches requests the PartitionStatistics for the output schema and the InMemoryRelation for the cached column buffers (as an RDD[CachedBatch]).

filteredCachedBatches takes the cached column buffers (as an RDD[CachedBatch]) and transforms the RDD per partition with index (i.e. RDD.mapPartitionsWithIndexInternal) as follows (see the sketch after this list):

  1. Creates a partition filter as a new GenPredicate for the partitionFilters expressions (concatenated together using And binary operator and the schema)

  2. Requests the generated partition filter Predicate to initialize

  3. Uses spark.sql.inMemoryColumnarStorage.partitionPruning internal configuration property to enable partition batch pruning and filtering out (skipping) CachedBatches in a partition based on column stats and the generated partition filter Predicate
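A condensed sketch of those steps (simplified pseudocode; cachedColumnBuffers, partitionFilters and stats follow the description above):

// Condensed sketch of filteredCachedBatches (simplified, not the exact source)
cachedColumnBuffers.mapPartitionsWithIndexInternal { (index, cachedBatches) =>
  // 1. one generated predicate per partition, from the concatenated partition filters
  val partitionFilter = newPredicate(
    partitionFilters.reduceOption(And).getOrElse(Literal(true)), stats.schema)
  // 2. initialize the generated predicate for the partition
  partitionFilter.initialize(index)
  // 3. prune batches using the per-batch column statistics (if pruning is enabled)
  if (inMemoryPartitionPruning)           // spark.sql.inMemoryColumnarStorage.partitionPruning
    cachedBatches.filter(batch => partitionFilter.eval(batch.stats))
  else
    cachedBatches                         // pruning disabled: pass every CachedBatch along
}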

Note
If spark.sql.inMemoryColumnarStorage.partitionPruning internal configuration property is disabled (i.e. false), filteredCachedBatches does nothing and simply passes all CachedBatch elements along.
Note
spark.sql.inMemoryColumnarStorage.partitionPruning internal configuration property is enabled by default.
Note
filteredCachedBatches is used exclusively when InMemoryTableScanExec is requested for the inputRDD internal property.

statsFor Internal Method

statsFor…​FIXME

Note
statsFor is used when…​FIXME

createAndDecompressColumn Internal Method

createAndDecompressColumn takes the number of rows in the input CachedBatch.

createAndDecompressColumn requests OffHeapColumnVector or OnHeapColumnVector to allocate column vectors (with the number of rows and columnarBatchSchema) per the spark.sql.columnVector.offheap.enabled internal configuration flag, i.e. true or false, respectively.

Note
spark.sql.columnVector.offheap.enabled internal configuration flag is disabled by default which means that OnHeapColumnVector is used.

createAndDecompressColumn creates a ColumnarBatch for the allocated column vectors (as an array of ColumnVector).

For every Attribute createAndDecompressColumn requests ColumnAccessor to decompress the column.

createAndDecompressColumn registers a callback to be executed on a task completion that will close the ColumnarBatch.

In the end, createAndDecompressColumn returns the ColumnarBatch.
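A sketch of that sequence (simplified; the helper signatures for allocateColumns and decompress are assumptions close to, but not guaranteed to match, the actual API):

// Sketch of createAndDecompressColumn (simplified)
def createAndDecompressColumn(cachedBatch: CachedBatch): ColumnarBatch = {
  val rowCount = cachedBatch.numRows
  val vectors =
    if (offHeapColumnVectorEnabled)   // spark.sql.columnVector.offheap.enabled
      OffHeapColumnVector.allocateColumns(rowCount, columnarBatchSchema)
    else
      OnHeapColumnVector.allocateColumns(rowCount, columnarBatchSchema)
  val columnarBatch = new ColumnarBatch(vectors.asInstanceOf[Array[ColumnVector]])
  columnarBatch.setNumRows(rowCount)

  // decompress every cached column buffer into its column vector
  attributes.zipWithIndex.foreach { case (attr, i) =>
    ColumnAccessor.decompress(cachedBatch.buffers(i), vectors(i), attr.dataType, rowCount)
  }
  // close the batch when the task completes
  TaskContext.get().addTaskCompletionListener[Unit](_ => columnarBatch.close())
  columnarBatch
}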

Note
createAndDecompressColumn is used exclusively when InMemoryTableScanExec is requested for the input RDD of internal rows.

Creating Input RDD of Internal Rows — inputRDD Internal Property

Note
inputRDD is a Scala lazy value which is computed once when accessed and cached afterwards.

inputRDD first applies partition batch pruning to the cached column buffers (creating filtered cached batches as an RDD[CachedBatch]).

With the supportsBatch flag on, inputRDD finishes with a new MapPartitionsRDD (using RDD.map) by applying createAndDecompressColumn to all cached columnar batches.

Caution
Show examples of supportsBatch enabled and disabled

With the supportsBatch flag off, inputRDD again applies partition batch pruning to the cached column buffers (creating filtered cached batches as an RDD[CachedBatch]).

Note
Indeed, inputRDD applies partition batch pruning to the cached column buffers (creating filtered cached batches as an RDD[CachedBatch]) twice, which seems unnecessary.

In the end, inputRDD creates a new MapPartitionsRDD (using RDD.map) with a ColumnarIterator applied to all cached columnar batches. The ColumnarIterator is created as follows:

  1. For every CachedBatch in the partition iterator, adds the total number of rows in the batch to the numOutputRows SQL metric

  2. Requests GenerateColumnAccessor to generate the Java code for a ColumnarIterator to perform expression evaluation for the given column types

  3. Requests the ColumnarIterator to initialize

Note
inputRDD is used when InMemoryTableScanExec is requested for the input RDDs and to execute.

Executing Physical Operator (Generating RDD[InternalRow]) — doExecute Method

Note
doExecute is part of SparkPlan Contract to generate the runtime representation of a structured query as a distributed computation over internal binary rows on Apache Spark (i.e. RDD[InternalRow]).

doExecute branches off per supportsBatch flag.

With supportsBatch flag on, doExecute creates a WholeStageCodegenExec (with the InMemoryTableScanExec physical operator as the child and codegenStageId as 0) and requests it to execute.

Otherwise, when supportsBatch flag is off, doExecute simply gives the input RDD of internal rows.

buildFilter Property

Note
buildFilter is a Scala lazy value which is computed once when accessed and cached afterwards.

buildFilter is a Scala PartialFunction that accepts an Expression and produces an Expression, i.e. PartialFunction[Expression, Expression].

Table 3. buildFilter’s Expressions

buildFilter is defined for the following input expressions:

  • And

  • Or

  • EqualTo

  • EqualNullSafe

  • LessThan

  • LessThanOrEqual

  • GreaterThan

  • GreaterThanOrEqual

  • IsNull

  • IsNotNull

  • In with a non-empty list of Literal expressions

For every Literal expression in the expression list, buildFilter creates an And expression with the lower and upper bounds of the partition statistics for the attribute and the Literal.

In the end, buildFilter joins the And expressions with Or expressions.

Note
buildFilter is used exclusively when InMemoryTableScanExec is requested for partitionFilters.
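For illustration, two representative cases of that partial function (a sketch; it relies on statsFor and the Catalyst expression DSL, and is not the complete function):

// Two representative cases of buildFilter (sketch, not the complete PartialFunction)
val buildFilter: PartialFunction[Expression, Expression] = {
  // recurse into And, keeping whichever side buildFilter is defined for
  case And(lhs, rhs) if buildFilter.isDefinedAt(lhs) || buildFilter.isDefinedAt(rhs) =>
    (buildFilter.lift(lhs) ++ buildFilter.lift(rhs)).reduce(_ && _)

  // a batch may contain the value only if it is within the batch's column statistics
  case EqualTo(a: AttributeReference, l: Literal) =>
    statsFor(a).lowerBound <= l && l <= statsFor(a).upperBound
}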

innerChildren Method

Note
innerChildren is part of QueryPlan Contract to…​FIXME.

innerChildren…​FIXME

HiveTableScanExec


HiveTableScanExec Leaf Physical Operator

HiveTableScanExec is a leaf physical operator that represents a HiveTableRelation logical operator at execution time.

HiveTableScanExec is created exclusively when HiveTableScans execution planning strategy plans a HiveTableRelation logical operator (i.e. is executed on a logical query plan with a HiveTableRelation logical operator).

Table 1. HiveTableScanExec’s Performance Metrics

  • numOutputRows: shown as "number of output rows" in the web UI

Table 2. HiveTableScanExec’s Internal Properties (e.g. Registries, Counters and Flags)
Name Description

hiveQlTable

Hive’s Table metadata (converted from the CatalogTable of the HiveTableRelation)

Used when HiveTableScanExec is requested for the tableDesc or rawPartitions and when it is executed

rawPartitions

tableDesc

Hive’s TableDesc

Creating HiveTableScanExec Instance

HiveTableScanExec takes the following when created:

HiveTableScanExec initializes the internal registries and counters.

Executing Physical Operator (Generating RDD[InternalRow]) — doExecute Method

Note
doExecute is part of SparkPlan Contract to generate the runtime representation of a structured query as a distributed computation over internal binary rows on Apache Spark (i.e. RDD[InternalRow]).

doExecute…​FIXME

HashAggregateExec


HashAggregateExec Aggregate Physical Operator for Hash-Based Aggregation

HashAggregateExec is a unary physical operator (i.e. with one child physical operator) for hash-based aggregation that is created (indirectly through AggUtils.createAggregate) when:

  • Aggregation execution planning strategy selects the aggregate physical operator for an Aggregate logical operator

  • Structured Streaming’s StatefulAggregationStrategy strategy creates plan for streaming EventTimeWatermark or Aggregate logical operators

Note
HashAggregateExec is the preferred aggregate physical operator for Aggregation execution planning strategy (over ObjectHashAggregateExec and SortAggregateExec).

HashAggregateExec supports Java code generation (aka codegen).

HashAggregateExec uses TungstenAggregationIterator (to iterate over UnsafeRows in partitions) when executed.

Table 1. HashAggregateExec’s Performance Metrics

  • aggTime: shown as "aggregate time" in the web UI

  • avgHashProbe: shown as "avg hash probe" in the web UI; the average hash map probe per lookup (i.e. numProbes / numKeyLookups)

Note
numProbes and numKeyLookups are used in the BytesToBytesMap append-only hash map for the number of iterations to look up a single key and the total number of lookups, respectively.

  • numOutputRows: shown as "number of output rows" in the web UI; the number of groups per partition, which depends on the number of partitions and on which side of the ShuffleExchangeExec operator the HashAggregateExec sits, e.g.

  • 0 for no input with a grouping expression, e.g. spark.range(0).groupBy($"id").count.show

  • 1 for no grouping expression and no input, e.g. spark.range(0).groupBy().count.show

Tip
Use a different number of elements and partitions in the range operator to observe the difference in the numOutputRows metric, e.g. as in the example after this table.

  • peakMemory: shown as "peak memory" in the web UI

  • spillSize: shown as "spill size" in the web UI
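An illustrative spark-shell session for the tip above (the exact counts you see in the web UI depend on the data and the number of partitions):

// Same aggregation over the same data, but with a different number of partitions
spark.range(0, 10, 1, numPartitions = 2).groupBy($"id" % 3 as "key").count.show
spark.range(0, 10, 1, numPartitions = 5).groupBy($"id" % 3 as "key").count.show
// Compare "number of output rows" of the partial-aggregation HashAggregateExec operators
// in the Details for Query page of the web UI for the two queries.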

Figure 1. HashAggregateExec in web UI (Details for Query)
Table 2. HashAggregateExec’s Properties
Name Description

output

Output schema for the input NamedExpressions

requiredChildDistribution varies per the input required child distribution expressions.

Table 3. HashAggregateExec’s Required Child Output Distributions (requiredChildDistributionExpressions and the resulting Distribution)

  • Defined, but empty: AllTuples

  • Non-empty: ClusteredDistribution for exprs

  • Undefined (None): UnspecifiedDistribution

Note

requiredChildDistributionExpressions is exactly requiredChildDistributionExpressions from AggUtils.createAggregate and is undefined by default.


(No distinct in aggregation) requiredChildDistributionExpressions is undefined when HashAggregateExec is created for partial aggregations (i.e. mode is Partial for aggregate expressions).

requiredChildDistributionExpressions is defined, but could possibly be empty, when HashAggregateExec is created for final aggregations (i.e. mode is Final for aggregate expressions).


(one distinct in aggregation) requiredChildDistributionExpressions is undefined when HashAggregateExec is created for partial aggregations (i.e. mode is Partial for aggregate expressions) with one distinct in aggregation.

requiredChildDistributionExpressions is defined, but could possibly be empty, when HashAggregateExec is created for partial merge aggregations (i.e. mode is PartialMerge for aggregate expressions).

FIXME for the following two cases in aggregation with one distinct.

Note
The prefix for variable names for HashAggregateExec operators in CodegenSupport-generated code is agg.
Table 4. HashAggregateExec’s Internal Properties (e.g. Registries, Counters and Flags)
Name Description

aggregateBufferAttributes

All the AttributeReferences of the AggregateFunctions of the AggregateExpressions

testFallbackStartsAt

Optional pair of numbers for controlled fall-back to a sort-based aggregation when the hash-based approach is unable to acquire enough memory.

declFunctions

DeclarativeAggregate expressions (from the AggregateFunctions of the AggregateExpressions)

bufferSchema

StructType built from the aggregateBufferAttributes

groupingKeySchema

StructType built from the groupingAttributes

groupingAttributes

Attributes of the groupingExpressions

Note

HashAggregateExec uses TungstenAggregationIterator that can (theoretically) switch to a sort-based aggregation when the hash-based approach is unable to acquire enough memory.

Search the logs for an INFO message about falling back to sort-based aggregation to know whether the switch has happened.

finishAggregate Method

finishAggregate…​FIXME

Note
finishAggregate is used exclusively when HashAggregateExec is requested to generate the Java code for doProduceWithKeys.

Generating Java Source Code for Whole-Stage Consume Path with Grouping Keys — doConsumeWithKeys Internal Method

doConsumeWithKeys…​FIXME

Note
doConsumeWithKeys is used exclusively when HashAggregateExec is requested to generate the Java code for whole-stage consume path (with named expressions for the grouping keys).

Generating Java Source Code for Whole-Stage Consume Path without Grouping Keys — doConsumeWithoutKeys Internal Method

doConsumeWithoutKeys…​FIXME

Note
doConsumeWithoutKeys is used exclusively when HashAggregateExec is requested to generate the Java code for whole-stage consume path (with no named expressions for the grouping keys).

Generating Java Source Code for Consume Path in Whole-Stage Code Generation — doConsume Method

Note
doConsume is part of CodegenSupport Contract to generate the Java source code for consume path in Whole-Stage Code Generation.

doConsume executes doConsumeWithoutKeys when no named expressions for the grouping keys were specified for the HashAggregateExec or doConsumeWithKeys otherwise.

Generating Java Source Code For “produce” Path (In Whole-Stage Code Generation) — doProduceWithKeys Internal Method

doProduceWithKeys…​FIXME

Note
doProduceWithKeys is used exclusively when HashAggregateExec physical operator is requested to generate the Java source code for “produce” path in whole-stage code generation (when there are groupingExpressions).

doProduceWithoutKeys Internal Method

doProduceWithoutKeys…​FIXME

Note
doProduceWithoutKeys is used exclusively when HashAggregateExec physical operator is requested to generate the Java source code for “produce” path in whole-stage code generation.

generateResultFunction Internal Method

generateResultFunction…​FIXME

Note
generateResultFunction is used exclusively when HashAggregateExec physical operator is requested to doProduceWithKeys (when HashAggregateExec physical operator is requested to generate the Java source code for “produce” path in whole-stage code generation)

supportsAggregate Object Method

supportsAggregate firstly creates the schema (from the input aggregation buffer attributes) and requests UnsafeFixedWidthAggregationMap to supportsAggregationBufferSchema (i.e. the schema uses mutable field data types only that have fixed length and can be mutated in place in an UnsafeRow).
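In code, the check is roughly the following (consistent with the description above):

import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap
import org.apache.spark.sql.types.StructType

// hash-based aggregation requires fixed-width, in-place-mutable aggregation buffer fields
def supportsAggregate(aggregateBufferAttributes: Seq[Attribute]): Boolean = {
  val aggregationBufferSchema = StructType.fromAttributes(aggregateBufferAttributes)
  UnsafeFixedWidthAggregationMap.supportsAggregationBufferSchema(aggregationBufferSchema)
}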

Note

supportsAggregate is used when:

Executing Physical Operator (Generating RDD[InternalRow]) — doExecute Method

Note
doExecute is part of SparkPlan Contract to generate the runtime representation of a structured query as a distributed computation over internal binary rows on Apache Spark (i.e. RDD[InternalRow]).

doExecute requests the child physical operator to execute (that triggers physical query planning and generates an RDD[InternalRow]) and transforms it by executing the following function on internal rows per partition with index (using RDD.mapPartitionsWithIndex that creates another RDD):

  1. Records the start execution time (beforeAgg)

  2. Requests the Iterator[InternalRow] (from executing the child physical operator) for the next element

    1. If there is no input (an empty partition), but there are grouping keys used, doExecute simply returns an empty iterator

    2. Otherwise, doExecute creates a TungstenAggregationIterator and branches off per whether there are rows to process and the grouping keys.

For empty partitions and no grouping keys, doExecute increments the numOutputRows metric and requests the TungstenAggregationIterator to create a single UnsafeRow as the only element of the result iterator.

For non-empty partitions or when grouping keys are used, doExecute returns the TungstenAggregationIterator.

In the end, doExecute calculates the aggTime metric and returns the resulting Iterator[UnsafeRow].

Note
The numOutputRows, peakMemory, spillSize and avgHashProbe metrics are used exclusively to create the TungstenAggregationIterator.
Note

doExecute (by RDD.mapPartitionsWithIndex transformation) adds a new MapPartitionsRDD to the RDD lineage. Use RDD.toDebugString to see the additional MapPartitionsRDD.
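A condensed sketch of that per-partition function (simplified pseudocode; the TungstenAggregationIterator constructor arguments are elided):

// Condensed sketch of HashAggregateExec.doExecute (simplified, not the exact source)
child.execute().mapPartitionsWithIndex { (partIndex, iter) =>
  val beforeAgg = System.nanoTime()                 // 1. record the start time (aggTime)
  val hasInput = iter.hasNext                       // 2. peek at the partition
  val result =
    if (!hasInput && groupingExpressions.nonEmpty) {
      Iterator.empty                                // empty partition + grouping keys: no groups
    } else {
      val aggIter = new TungstenAggregationIterator(/* partIndex, expressions, metrics, iter, ... */)
      if (!hasInput && groupingExpressions.isEmpty) {
        numOutputRows += 1
        // single UnsafeRow, e.g. the 0 of an empty count()
        Iterator.single(aggIter.outputForEmptyGroupingKeyWithoutInput())
      } else {
        aggIter                                     // the iterator produces the aggregated UnsafeRows
      }
    }
  aggTime += (System.nanoTime() - beforeAgg) / 1000000   // milliseconds
  result
}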

Generating Java Source Code for Produce Path in Whole-Stage Code Generation — doProduce Method

Note
doProduce is part of CodegenSupport Contract to generate the Java source code for produce path in Whole-Stage Code Generation.

doProduce executes doProduceWithoutKeys when no named expressions for the grouping keys were specified for the HashAggregateExec or doProduceWithKeys otherwise.

Creating HashAggregateExec Instance

HashAggregateExec takes the following when created:

HashAggregateExec initializes the internal registries and counters.

Creating UnsafeFixedWidthAggregationMap Instance — createHashMap Method

createHashMap creates an UnsafeFixedWidthAggregationMap (with the empty aggregation buffer, the bufferSchema, the groupingKeySchema, the current TaskMemoryManager, 1024 * 16 initial capacity and the page size of the TaskMemoryManager).

Note
createHashMap is used exclusively when HashAggregateExec physical operator is requested to generate the Java source code for “produce” path (in Whole-Stage Code Generation).
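The call roughly looks as follows (a sketch that follows the argument list above; the initial-buffer construction is an assumption):

// Sketch of createHashMap (illustrative)
def createHashMap(): UnsafeFixedWidthAggregationMap = {
  // empty aggregation buffer built from the declarative functions' initial values
  val initialBuffer = UnsafeProjection.create(declFunctions.flatMap(_.initialValues))(EmptyRow)
  new UnsafeFixedWidthAggregationMap(
    initialBuffer,
    bufferSchema,
    groupingKeySchema,
    TaskContext.get().taskMemoryManager(),
    1024 * 16,                                          // initial capacity
    TaskContext.get().taskMemoryManager().pageSizeBytes)
}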

GenerateExec


GenerateExec Unary Physical Operator

GenerateExec is a unary physical operator (i.e. with one child physical operator) that is created exclusively when BasicOperators execution planning strategy is requested to resolve a Generate logical operator.

When executed, GenerateExec executes (aka evaluates) the Generator expression on every row in a RDD partition.

Figure 1. GenerateExec’s Execution — doExecute Method
Note
child physical operator has to support CodegenSupport.

GenerateExec supports Java code generation (aka codegen).

GenerateExec does not support Java code generation (aka whole-stage codegen), i.e. supportCodegen flag is turned off.

The output schema of a GenerateExec is…​FIXME

Table 1. GenerateExec’s Performance Metrics

  • numOutputRows: shown as "number of output rows" in the web UI

Figure 2. GenerateExec in web UI (Details for Query)

producedAttributes…​FIXME

outputPartitioning…​FIXME

boundGenerator…​FIXME

GenerateExec gives the child's input RDDs (when WholeStageCodegenExec is executed).

GenerateExec requires that…​FIXME

Generating Java Source Code for Produce Path in Whole-Stage Code Generation — doProduce Method

Note
doProduce is part of CodegenSupport Contract to generate the Java source code for produce path in Whole-Stage Code Generation.

doProduce…​FIXME

Generating Java Source Code for Consume Path in Whole-Stage Code Generation — doConsume Method

Note
doConsume is part of CodegenSupport Contract to generate the Java source code for consume path in Whole-Stage Code Generation.

doConsume…​FIXME

codeGenCollection Internal Method

codeGenCollection…​FIXME

Note
codeGenCollection is used exclusively when GenerateExec is requested to generate the Java code for the “consume” path in whole-stage code generation (when Generator is a CollectionGenerator).

codeGenTraversableOnce Internal Method

codeGenTraversableOnce…​FIXME

Note
codeGenTraversableOnce is used exclusively when GenerateExec is requested to generate the Java code for the consume path in whole-stage code generation (when Generator is not a CollectionGenerator).

codeGenAccessor Internal Method

codeGenAccessor…​FIXME

Note
codeGenAccessor is used…​FIXME

Creating GenerateExec Instance

GenerateExec takes the following when created:

Executing Physical Operator (Generating RDD[InternalRow]) — doExecute Method

Note
doExecute is part of SparkPlan Contract to generate the runtime representation of a structured query as a distributed computation over internal binary rows on Apache Spark (i.e. RDD[InternalRow]).

doExecute…​FIXME

FilterExec


FilterExec Unary Physical Operator

FilterExec is a unary physical operator (i.e. with one child physical operator) that represents Filter and TypedFilter unary logical operators at execution.

FilterExec supports Java code generation (aka codegen) as follows:

FilterExec is created when:

Table 1. FilterExec’s Performance Metrics

  • numOutputRows: shown as "number of output rows" in the web UI

Figure 1. FilterExec in web UI (Details for Query)

FilterExec uses whatever the child physical operator uses for the input RDDs, the outputOrdering and the outputPartitioning.

FilterExec uses the PredicateHelper for…​FIXME

Table 2. FilterExec’s Internal Properties (e.g. Registries, Counters and Flags)
Name Description

notNullAttributes

FIXME

Used when…​FIXME

notNullPreds

FIXME

Used when…​FIXME

otherPreds

FIXME

Used when…​FIXME

Creating FilterExec Instance

FilterExec takes the following when created:

FilterExec initializes the internal registries and counters.

isNullIntolerant Internal Method

isNullIntolerant…​FIXME

Note
isNullIntolerant is used when…​FIXME

usedInputs Method

Note
usedInputs is part of CodegenSupport Contract to…​FIXME.

usedInputs…​FIXME

output Method

Note
output is part of QueryPlan Contract to…​FIXME.

output…​FIXME

Generating Java Source Code for Produce Path in Whole-Stage Code Generation — doProduce Method

Note
doProduce is part of CodegenSupport Contract to generate the Java source code for produce path in Whole-Stage Code Generation.

doProduce…​FIXME

Generating Java Source Code for Consume Path in Whole-Stage Code Generation — doConsume Method

Note
doConsume is part of CodegenSupport Contract to generate the Java source code for consume path in Whole-Stage Code Generation.

doConsume creates a new metric term for the numOutputRows metric.

doConsume…​FIXME

In the end, doConsume uses consume and FIXME to generate the Java source code (as plain text) inside a do {…​} while(false); code block.

genPredicate Internal Method

Note
genPredicate is an internal method of doConsume.

genPredicate…​FIXME

Executing Physical Operator (Generating RDD[InternalRow]) — doExecute Method

Note
doExecute is part of SparkPlan Contract to generate the runtime representation of a structured query as a distributed computation over internal binary rows on Apache Spark (i.e. RDD[InternalRow]).

doExecute executes the child physical operator and creates a new MapPartitionsRDD that does the filtering.

Internally, doExecute takes the numOutputRows metric.

In the end, doExecute requests the child physical operator to execute (that triggers physical query planning and generates an RDD[InternalRow]) and transforms it by executing the following function on internal rows per partition with index (using RDD.mapPartitionsWithIndexInternal that creates another RDD):

  1. Creates a partition filter as a new GenPredicate (for the filter condition expression and the output schema of the child physical operator)

  2. Requests the generated partition filter Predicate to initialize (with 0 partition index)

  3. Filters out elements from the partition iterator (Iterator[InternalRow]) by requesting the generated partition filter Predicate to evaluate for every InternalRow

    1. Increments the numOutputRows metric for positive evaluations (i.e. that returned true)

Note
doExecute (by RDD.mapPartitionsWithIndexInternal) adds a new MapPartitionsRDD to the RDD lineage. Use RDD.toDebugString to see the additional MapPartitionsRDD.
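A condensed sketch of that per-partition function (simplified; condition and child are the operator's properties):

// Condensed sketch of FilterExec.doExecute (simplified)
protected def doExecute(): RDD[InternalRow] = {
  val numOutputRows = longMetric("numOutputRows")
  child.execute().mapPartitionsWithIndexInternal { (index, iter) =>
    // 1. generated predicate for the filter condition and the child's output schema
    val predicate = newPredicate(condition, child.output)
    predicate.initialize(0)                  // 2. initialized with the 0 partition index
    iter.filter { row =>                     // 3. keep only rows the predicate evaluates to true
      val keep = predicate.eval(row)
      if (keep) numOutputRows += 1           // 3.1. count positive evaluations
      keep
    }
  }
}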

FileSourceScanExec


FileSourceScanExec Leaf Physical Operator

FileSourceScanExec is a leaf physical operator (as a DataSourceScanExec) that represents a scan over collections of files (incl. Hive tables).

FileSourceScanExec is created exclusively for a LogicalRelation logical operator with a HadoopFsRelation when FileSourceStrategy execution planning strategy is executed.

FileSourceScanExec supports bucket pruning so it only scans the bucket files required for a query.

FileSourceScanExec uses a HashPartitioning or the default UnknownPartitioning as the output partitioning scheme.

FileSourceScanExec is a ColumnarBatchScan and supports batch decoding only when the FileFormat (of the HadoopFsRelation) supports it.

FileSourceScanExec always gives the single inputRDD as the only RDD of internal rows (in Whole-Stage Java Code Generation).

FileSourceScanExec supports data source filters that are printed out to the console (at INFO logging level) and available as metadata (e.g. in web UI or explain).

Table 1. FileSourceScanExec’s Performance Metrics

  • metadataTime: shown as "metadata time (ms)" in the web UI

  • numFiles: shown as "number of files" in the web UI

  • numOutputRows: shown as "number of output rows" in the web UI

  • scanTime: shown as "scan time" in the web UI

As a DataSourceScanExec, FileSourceScanExec uses Scan for the prefix of the node name.

Figure 1. FileSourceScanExec in web UI (Details for Query)

FileSourceScanExec uses File for nodeNamePrefix (that is used for the simple node description in query plans).

Table 2. FileSourceScanExec’s Internal Properties (e.g. Registries, Counters and Flags)
Name Description

metadata

Metadata

Note
metadata is part of DataSourceScanExec Contract to…​FIXME.

pushedDownFilters

Tip

Enable INFO logging level to see pushedDownFilters printed out to the console.

Used when FileSourceScanExec is requested for the metadata and input RDD

Tip

Enable INFO logging level for org.apache.spark.sql.execution.FileSourceScanExec logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.sql.execution.FileSourceScanExec=INFO

Refer to Logging.

Creating RDD for Non-Bucketed Reads — createNonBucketedReadRDD Internal Method

createNonBucketedReadRDD…​FIXME

Note
createNonBucketedReadRDD is used exclusively when FileSourceScanExec physical operator is requested for the inputRDD (and the optional bucketing specification of the HadoopFsRelation is undefined or bucketing is disabled).

selectedPartitions Internal Lazy-Initialized Property

selectedPartitions…​FIXME

Note

selectedPartitions is used when FileSourceScanExec is requested for the following:

Creating FileSourceScanExec Instance

FileSourceScanExec takes the following when created:

FileSourceScanExec initializes the internal registries and counters.

Output Partitioning Scheme — outputPartitioning Attribute

Note
outputPartitioning is part of the SparkPlan Contract to specify output data partitioning.

outputPartitioning can be one of the following:

Creating FileScanRDD with Bucketing Support — createBucketedReadRDD Internal Method

createBucketedReadRDD prints the following INFO message to the logs:

createBucketedReadRDD maps the available files of the input selectedPartitions into PartitionedFiles. For every file, createBucketedReadRDD uses getBlockLocations and getBlockHosts.

createBucketedReadRDD then groups the PartitionedFiles by bucket ID.

Note
Bucket ID is of the format _0000n, i.e. the bucket ID prefixed with up to four 0s.

createBucketedReadRDD prunes (filters out) the bucket files for the bucket IDs that are not listed in the bucket IDs for bucket pruning.

createBucketedReadRDD creates a FilePartition for every bucket ID and the (pruned) bucket PartitionedFiles.

In the end, createBucketedReadRDD creates a FileScanRDD (with the input readFile for the read function and the FilePartitions for every bucket ID as partitions).
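A condensed sketch of the grouping, pruning and partitioning steps (simplified; bucketSpec, selectedPartitions, optionalBucketSet and readFile follow the description above, and helper names are assumptions):

// Condensed sketch of createBucketedReadRDD (simplified, not the exact source)
val filesGroupedToBuckets = selectedPartitions
  .flatMap { p =>
    p.files.map { f =>
      PartitionedFile(p.values, f.getPath.toUri.toString, 0, f.getLen,
        getBlockHosts(getBlockLocations(f), 0, f.getLen))
    }
  }
  .groupBy { f =>   // bucket ID parsed from the file name (the _0000n part)
    BucketingUtils.getBucketId(new Path(f.filePath).getName)
      .getOrElse(sys.error(s"Invalid bucket file ${f.filePath}"))
  }

// prune the buckets not selected by bucket pruning
val prunedFilesGroupedToBuckets = optionalBucketSet match {
  case Some(bucketSet) => filesGroupedToBuckets.filter { case (bucketId, _) => bucketSet.get(bucketId) }
  case None            => filesGroupedToBuckets
}

// one FilePartition per bucket ID, then the FileScanRDD over all of them
val filePartitions = Seq.tabulate(bucketSpec.numBuckets) { bucketId =>
  FilePartition(bucketId, prunedFilesGroupedToBuckets.getOrElse(bucketId, Nil))
}
new FileScanRDD(fsRelation.sparkSession, readFile, filePartitions)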

Tip

Use RDD.toDebugString to see FileScanRDD in the RDD execution plan (aka RDD lineage).

Note
createBucketedReadRDD is used exclusively when FileSourceScanExec physical operator is requested for the inputRDD (and the optional bucketing specification of the HadoopFsRelation is defined and bucketing is enabled).

supportsBatch Attribute

Note
supportsBatch is part of the ColumnarBatchScan Contract to enable vectorized decoding.

supportsBatch is enabled (i.e. true) only when the FileFormat (of the HadoopFsRelation) supports vectorized decoding.

Otherwise, supportsBatch is disabled (i.e. false).

FileSourceScanExec As ColumnarBatchScan

FileSourceScanExec is a ColumnarBatchScan and supports batch decoding only when the FileFormat (of the HadoopFsRelation) supports it.

FileSourceScanExec has needsUnsafeRowConversion flag enabled for ParquetFileFormat data sources exclusively.

FileSourceScanExec has vectorTypes…​FIXME

needsUnsafeRowConversion Flag

Note
needsUnsafeRowConversion is part of ColumnarBatchScan Contract to control the name of the variable for an input row while generating the Java source code to consume generated columns or row from a physical operator.

needsUnsafeRowConversion is enabled (i.e. true) when the following conditions all hold:

  1. FileFormat of the HadoopFsRelation is ParquetFileFormat

  2. spark.sql.parquet.enableVectorizedReader configuration property is enabled (default: true)

Otherwise, needsUnsafeRowConversion is disabled (i.e. false).
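In other words (a sketch; the conf accessor name is an assumption):

// Sketch of needsUnsafeRowConversion for FileSourceScanExec
val needsUnsafeRowConversion: Boolean =
  relation.fileFormat.isInstanceOf[ParquetFileFormat] &&
    conf.parquetVectorizedReaderEnabled   // spark.sql.parquet.enableVectorizedReader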

Note
needsUnsafeRowConversion is used when FileSourceScanExec is executed (and supportsBatch flag is off).

Requesting Concrete ColumnVector Class Names — vectorTypes Method

Note
vectorTypes is part of ColumnarBatchScan Contract to..FIXME.

vectorTypes simply requests the FileFormat of the HadoopFsRelation for vectorTypes.

Executing Physical Operator (Generating RDD[InternalRow]) — doExecute Method

Note
doExecute is part of the SparkPlan Contract to generate the runtime representation of a structured query as a distributed computation over internal binary rows on Apache Spark (i.e. RDD[InternalRow]).

doExecute branches off per supportsBatch flag.

If supportsBatch is on, doExecute creates a WholeStageCodegenExec (with codegenStageId as 0) and executes it right after.

If supportsBatch is off, doExecute creates an unsafeRows RDD to scan over, which differs per the needsUnsafeRowConversion flag.

If needsUnsafeRowConversion flag is on, doExecute takes the inputRDD and creates a new RDD by applying a function to each partition (using RDD.mapPartitionsWithIndexInternal):

  1. Creates an UnsafeProjection for the schema

  2. Initializes the UnsafeProjection

  3. Maps over the rows in a partition iterator using the UnsafeProjection projection

Otherwise, doExecute simply takes the inputRDD as the unsafeRows RDD (with no changes).

doExecute takes the numOutputRows metric and creates a new RDD by mapping every element in the unsafeRows and incrementing the numOutputRows metric.

Tip

Use RDD.toDebugString to review the RDD lineage and “reverse-engineer” the values of the supportsBatch and needsUnsafeRowConversion flags given the number of RDDs.

With supportsBatch off and needsUnsafeRowConversion on you should see two more RDDs in the RDD lineage.
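A condensed sketch of the supportsBatch-off branch (simplified; schema, inputRDD and needsUnsafeRowConversion are the properties described on this page):

// Condensed sketch of FileSourceScanExec.doExecute with supportsBatch off (simplified)
val numOutputRows = longMetric("numOutputRows")
val unsafeRows =
  if (needsUnsafeRowConversion) {
    inputRDD.mapPartitionsWithIndexInternal { (index, iter) =>
      val proj = UnsafeProjection.create(schema)    // 1. UnsafeProjection for the schema
      proj.initialize(index)                        // 2. per-partition initialization
      iter.map(proj)                                // 3. convert every row
    }
  } else {
    inputRDD                                        // rows are already UnsafeRows
  }
unsafeRows.map { row =>
  numOutputRows += 1                                // count every output row
  row
}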

Creating Input RDD of Internal Rows — inputRDD Internal Property

Note
inputRDD is a Scala lazy value which is computed once when accessed and cached afterwards.

inputRDD is an input RDD of internal binary rows (i.e. InternalRow) that is used when FileSourceScanExec physical operator is requested for inputRDDs and execution.

When created, inputRDD requests HadoopFsRelation to get the underlying FileFormat that is in turn requested to build a data reader with partition column values appended (with the input parameters from the properties of HadoopFsRelation and pushedDownFilters).

In case HadoopFsRelation has bucketing specification defined and bucketing support is enabled, inputRDD creates a FileScanRDD with bucketing (with the bucketing specification, the reader, selectedPartitions and the HadoopFsRelation itself). Otherwise, inputRDD uses createNonBucketedReadRDD.

Note
createBucketedReadRDD accepts a bucketing specification while createNonBucketedReadRDD does not.

Output Data Ordering — outputOrdering Attribute

Note
outputOrdering is part of the SparkPlan Contract to specify output data ordering.

outputOrdering is a SortOrder expression for every sort column in Ascending order only when all the following hold:

Otherwise, outputOrdering is simply empty (Nil).
