DetermineTableStats

DetermineTableStats Logical PostHoc Resolution Rule — Computing Total Size Table Statistic for HiveTableRelations

Technically, DetermineTableStats is a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan].

apply Method

Note
apply is part of Rule Contract to apply a rule to a logical plan (aka execute a rule).

apply…​FIXME

DataSourceAnalysis

DataSourceAnalysis PostHoc Logical Resolution Rule

DataSourceAnalysis is a posthoc logical resolution rule that the default and Hive-specific logical query plan analyzers use to FIXME.

Table 1. DataSourceAnalysis’s Logical Resolutions (Conversions)
Source Operator | Target Operator

CreateTable (isDatasourceTable + no query) | CreateDataSourceTableCommand

CreateTable (isDatasourceTable + a resolved query) | CreateDataSourceTableAsSelectCommand

InsertIntoTable with InsertableRelation | InsertIntoDataSourceCommand

InsertIntoDir (non-hive provider) | InsertIntoDataSourceDirCommand

InsertIntoTable with HadoopFsRelation | InsertIntoHadoopFsRelationCommand

Technically, DataSourceAnalysis is a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan].

Executing Rule — apply Method

Note
apply is part of the Rule Contract to execute (apply) a rule on a TreeNode (e.g. LogicalPlan).

apply…​FIXME

CleanupAliases

CleanupAliases Logical Analysis Rule

CleanupAliases is a logical analysis rule that transforms a logical query plan with…​FIXME

CleanupAliases is part of the Cleanup fixed-point batch in the standard batches of the Analyzer.

CleanupAliases is simply a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan].

Executing Rule — apply Method

Note
apply is part of the Rule Contract to execute (apply) a rule on a TreeNode (e.g. LogicalPlan).

apply…​FIXME

AliasViewChild

AliasViewChild Logical Analysis Rule

AliasViewChild is a logical analysis rule that transforms a logical query plan with View unary logical operators and adds a Project logical operator (possibly with Alias expressions) when the outputs of a view and the underlying table do not match (and therefore require aliasing and projection).

AliasViewChild is part of the View once-executed batch in the standard batches of the Analyzer.

AliasViewChild is simply a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan].

AliasViewChild takes a SQLConf when created.

Executing Rule — apply Method

Note
apply is part of the Rule Contract to execute (apply) a rule on a TreeNode (e.g. LogicalPlan).

apply…​FIXME

WholeStageCodegenExec

WholeStageCodegenExec Unary Physical Operator for Java Code Generation

WholeStageCodegenExec is a unary physical operator that is one of the two physical operators that lay the foundation for the Whole-Stage Java Code Generation for a Codegened Execution Pipeline of a structured query.

Note
InputAdapter is the other physical operator for Codegened Execution Pipeline of a structured query.

WholeStageCodegenExec itself supports the Java code generation and so when executed triggers code generation for the entire child physical plan subtree of a structured query.

Tip

Consider using Debugging Query Execution facility to deep dive into the whole-stage code generation.
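For example, a minimal sketch (assuming a running SparkSession named spark; debugCodegen comes with the org.apache.spark.sql.execution.debug package):

import org.apache.spark.sql.execution.debug._

val q = spark.range(10).selectExpr("id * 2 AS n")
q.debugCodegen  // prints the Java code generated for every whole-stage codegen subtree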

Tip

Use the following to enable comments in generated code.
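A minimal sketch (assuming spark.sql.codegen.comments is read from the SparkConf of the active SparkEnv at code-generation time):

import org.apache.spark.SparkEnv

// must be set before the query is compiled for comments to appear in generated code
SparkEnv.get.conf.set("spark.sql.codegen.comments", "true")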

WholeStageCodegenExec is created when the CollapseCodegenStages physical query optimization rule is executed (with spark.sql.codegen.wholeStage property enabled and physical operators that support Java code generation).

Note
spark.sql.codegen.wholeStage property is enabled by default.

WholeStageCodegenExec takes a single child physical operator (a physical subquery tree) and codegen stage ID when created.

Note
WholeStageCodegenExec requires that the single child physical operator supports Java code generation.

WholeStageCodegenExec marks the child physical operator with * (star) prefix and per-query codegen stage ID (in round brackets) in the text representation of a physical plan tree.

Note
As WholeStageCodegenExec is created as a result of the CollapseCodegenStages physical query optimization rule, it is only executed in the executedPlan phase of a query execution (which you can notice by the * star prefix in a plan output).

When executed, WholeStageCodegenExec gives pipelineTime performance metric.

Table 1. WholeStageCodegenExec’s Performance Metrics
Key | Name (in web UI) | Description

pipelineTime | (empty) | Time the whole-stage codegen pipeline has been running (i.e. the elapsed time since the underlying BufferedRowIterator was created and the internal rows were all consumed)

Figure 1. WholeStageCodegenExec in web UI (Details for Query)
Tip
Use the explain operator to see the physical plan of a query and find out whether or not WholeStageCodegenExec is in use.
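
For example (assuming a running SparkSession named spark), operators prefixed with * (and a codegen stage ID) run inside WholeStageCodegenExec:

val q = spark.range(4).selectExpr("id % 2 AS g").groupBy("g").count
q.explain
// look for operators prefixed with *(n) in the printed physical plan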

Note
Physical plans that support code generation extend CodegenSupport.
Tip

Enable DEBUG logging level for org.apache.spark.sql.execution.WholeStageCodegenExec logger to see what happens inside.

Add the following line to conf/log4j.properties:
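
log4j.logger.org.apache.spark.sql.execution.WholeStageCodegenExec=DEBUG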

Refer to Logging.

Executing Physical Operator (Generating RDD[InternalRow]) — doExecute Method

Note
doExecute is part of SparkPlan Contract to generate the runtime representation of a structured query as a distributed computation over internal binary rows on Apache Spark (i.e. RDD[InternalRow]).

doExecute generates the Java source code for the child physical plan subtree first and uses CodeGenerator to compile it right afterwards.

If compilation goes well, doExecute branches off per the number of input RDDs.

Note
doExecute only supports up to two input RDDs.
Caution
FIXME Finish the “success” path

If the size of the generated code is greater than spark.sql.codegen.hugeMethodLimit (which defaults to 65535, the maximum bytecode size of a single JVM method), doExecute prints out an INFO message to the logs and gives up whole-stage codegen for the plan.

In the end, doExecute requests the child physical operator to execute (that triggers physical query planning and generates an RDD[InternalRow]) and returns it.

Note
doExecute skips requesting the child physical operator to execute for a FileSourceScanExec leaf physical operator with the supportsBatch flag enabled (as FileSourceScanExec uses a WholeStageCodegenExec operator itself when executed with supportsBatch enabled).

If compilation fails and the spark.sql.codegen.fallback configuration property is enabled, doExecute prints out a WARN message to the logs, requests the child physical operator to execute, and returns the result.

Generating Java Source Code for Child Physical Plan Subtree — doCodeGen Method

doCodeGen creates a new CodegenContext and requests the single child physical operator to generate a Java source code for produce code path (with the new CodegenContext and the WholeStageCodegenExec physical operator itself).

doCodeGen adds the new function under the name of processNext.

doCodeGen generates the final Java source code for a class (named per generatedClassName) that extends BufferedRowIterator and implements the processNext method.

Note
doCodeGen requires that the single child physical operator supports Java code generation.

doCodeGen cleans up the generated code (using CodeFormatter to stripExtraNewLines, stripOverlappingComments).

doCodeGen prints out the formatted generated code as a DEBUG message to the logs.

In the end, doCodeGen returns the CodegenContext and the Java source code (as a CodeAndComment).

Note

doCodeGen is used when WholeStageCodegenExec is executed (and when a structured query is debugged with debugCodegen).

Generating Java Source Code for Consume Path in Whole-Stage Code Generation — doConsume Method

Note
doConsume is part of CodegenSupport Contract to generate the Java source code for consume path in Whole-Stage Code Generation.

doConsume generates a Java source code that:

  1. Takes (from the input row) the code to evaluate a Catalyst expression on an input InternalRow

  2. Takes (from the input row) the term for a value of the result of the evaluation

    1. Adds .copy() to the term if needCopyResult is turned on

  3. Wraps the term inside append() code block

Generating Class Name — generatedClassName Method

generatedClassName gives a class name per spark.sql.codegen.useIdInClassName configuration property:

  • GeneratedIteratorForCodegenStage with the codegen stage ID when enabled (true)

  • GeneratedIterator when disabled (false)
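
A minimal sketch (assuming conf is the operator’s SQLConf and codegenStageId the per-query codegen stage ID given at creation):

private def generatedClassName(): String =
  if (conf.wholeStageUseIdInClassName) s"GeneratedIteratorForCodegenStage$codegenStageId"
  else "GeneratedIterator"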

Note
generatedClassName is used exclusively when WholeStageCodegenExec unary physical operator is requested to generate the Java source code for the child physical plan subtree.

isTooManyFields Object Method

isTooManyFields…​FIXME

Note
isTooManyFields is used when…​FIXME

WindowFunctionFrame

WindowFunctionFrame is a contract for…​FIXME

Table 1. WindowFunctionFrame’s Implementations
Name Description

OffsetWindowFunctionFrame

SlidingWindowFunctionFrame

UnboundedFollowingWindowFunctionFrame

UnboundedPrecedingWindowFunctionFrame

UnboundedWindowFunctionFrame

UnboundedWindowFunctionFrame

UnboundedWindowFunctionFrame is a WindowFunctionFrame that gives the same value for every row in a partition.

UnboundedWindowFunctionFrame is created for AggregateFunctions (in AggregateExpressions) or AggregateWindowFunctions with no frame defined (i.e. no rowsBetween or rangeBetween) that boils down to using the entire partition frame.

UnboundedWindowFunctionFrame takes the following when created:

  • Target InternalRow (to write window function results to)

  • AggregateProcessor

prepare Method

prepare requests AggregateProcessor to initialize passing in the number of UnsafeRows in the input ExternalAppendOnlyUnsafeRowArray.

prepare then requests ExternalAppendOnlyUnsafeRowArray to generate an iterator.

In the end, prepare requests AggregateProcessor to update passing in every UnsafeRow in the iterator one at a time.
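
A minimal sketch of prepare per the description above (processor is the AggregateProcessor given at creation):

override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
  // initialize the aggregation buffer for the partition size
  processor.initialize(rows.length)
  // feed every buffered UnsafeRow into the aggregate functions
  val iterator = rows.generateIterator()
  while (iterator.hasNext) {
    processor.update(iterator.next())
  }
}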

write Method

write simply requests AggregateProcessor to evaluate the target InternalRow.
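
A minimal sketch of write (target is the InternalRow given at creation; since the frame gives the same value for every row, the row index is not used):

override def write(index: Int, current: InternalRow): Unit = {
  // every row in the partition gets the same aggregate result
  processor.evaluate(target)
}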

WindowFunctionFrame Contract

Note
WindowFunctionFrame is a private[window] contract.
Table 2. WindowFunctionFrame Contract
Method Description

prepare

Used exclusively when WindowExec operator fetches all UnsafeRows for a partition (passing in ExternalAppendOnlyUnsafeRowArray with all UnsafeRows).

write

Used exclusively when the Iterator[InternalRow] (from executing WindowExec) is requested a next row.
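
A minimal sketch of the contract, with signatures inferred from the descriptions above:

private[window] abstract class WindowFunctionFrame {
  // called once per partition with all of the partition's UnsafeRows
  def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit
  // called once per output row with the row index and the current input row
  def write(index: Int, current: InternalRow): Unit
}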

AggregateProcessor

AggregateProcessor is created and used exclusively when WindowExec physical operator is executed.

Table 1. AggregateProcessor’s Properties
Name Description

buffer

SpecificInternalRow with data types given bufferSchema

Note
AggregateProcessor is created using AggregateProcessor factory object (using apply method).

initialize Method

Caution
FIXME
Note

initialize is used when:

  • SlidingWindowFunctionFrame writes out to the target row

  • UnboundedWindowFunctionFrame is prepared

  • UnboundedPrecedingWindowFunctionFrame is prepared

  • UnboundedFollowingWindowFunctionFrame writes out to the target row

evaluate Method

Caution
FIXME
Note
evaluate is used when…​FIXME

apply Factory Method

Note
apply is used exclusively when WindowExec is executed (and creates WindowFunctionFrame per AGGREGATE window aggregate functions, i.e. AggregateExpression or AggregateWindowFunction)

Executing update on ImperativeAggregates — update Method

update executes the update method on every input ImperativeAggregate sequentially (one by one).

Internally, update joins buffer with input internal binary row and converts the joined InternalRow using the MutableProjection function.

update then requests every ImperativeAggregate to update, passing in the buffer and the input row.
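
A minimal sketch of update per the description above (join as a JoinedRow, updateProjection and imperatives are assumed names for the MutableProjection and the ImperativeAggregate functions given at creation):

def update(input: InternalRow): Unit = {
  // mutate the buffer with the declarative aggregate expressions
  updateProjection(join(buffer, input))
  // then let every ImperativeAggregate update the shared buffer
  var i = 0
  while (i < imperatives.length) {
    imperatives(i).update(buffer, input)
    i += 1
  }
}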

Note
MutableProjection mutates the same underlying binary row object each time it is executed.
Note
update is used when WindowFunctionFrame prepares or writes.

Creating AggregateProcessor Instance

AggregateProcessor takes the following when created:

  • Schema of the buffer (as a collection of AttributeReferences)

  • Initial MutableProjection

  • Update MutableProjection

  • Evaluate MutableProjection

  • ImperativeAggregate expressions for aggregate functions

  • Flag whether to track partition size

WindowExec

WindowExec Unary Physical Operator

WindowExec is a unary physical operator (i.e. with one child physical operator) for window aggregation execution that represents Window unary logical operator at execution.

WindowExec is created exclusively when BasicOperators execution planning strategy resolves a Window unary logical operator.

Figure 1. WindowExec in web UI (Details for Query)

The output schema of WindowExec is the attributes of the child physical operator together with the window expressions.

Table 1. WindowExec’s Required Child Output Distribution
Single Child

ClusteredDistribution (per the window partition specification expressions)

If no window partition specification is specified, WindowExec prints out the following WARN message to the logs (and the child’s distribution requirement is AllTuples):
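
No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.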

Tip

Enable WARN logging level for org.apache.spark.sql.execution.WindowExec logger to see what happens inside.

Add the following line to conf/log4j.properties:
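
log4j.logger.org.apache.spark.sql.execution.WindowExec=WARN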

Refer to Logging.

Executing Physical Operator (Generating RDD[InternalRow]) — doExecute Method

Note
doExecute is part of SparkPlan Contract to generate the runtime representation of a structured query as a distributed computation over internal binary rows on Apache Spark (i.e. RDD[InternalRow]).

doExecute executes the single child physical operator and maps over partitions using a custom Iterator[InternalRow].

Note
When executed, doExecute creates a MapPartitionsRDD with the child physical operator’s RDD[InternalRow].

Internally, doExecute first takes WindowExpressions and their WindowFunctionFrame factory functions (from window frame factories) followed by executing the single child physical operator and mapping over partitions (using RDD.mapPartitions operator).

doExecute creates an Iterator[InternalRow] (of UnsafeRow exactly).

Mapping Over UnsafeRows per Partition — Iterator[InternalRow]

When created, Iterator[InternalRow] first creates two UnsafeProjection conversion functions (to convert InternalRows to UnsafeRows) as result and grouping.

Note
grouping conversion function is created for the window partition specification expressions and used exclusively to create nextGroup when Iterator[InternalRow] is requested the next row.
Tip

Enable DEBUG logging level for org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator logger to see the code generated for grouping conversion function.

Add the following line to conf/log4j.properties:
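
log4j.logger.org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator=DEBUG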

Refer to Logging.

Iterator[InternalRow] then fetches the first row from the upstream RDD and initializes nextRow and nextGroup UnsafeRows.

Note
nextGroup is the result of converting nextRow using grouping conversion function.

doExecute creates an ExternalAppendOnlyUnsafeRowArray buffer using the spark.sql.windowExec.buffer.spill.threshold property (default: 4096) as the threshold for the number of rows buffered.

doExecute creates a SpecificInternalRow for the window function result (as windowFunctionResult).

Note
SpecificInternalRow is also used in the generated code for the UnsafeProjection for the result.

doExecute takes the window frame factories and generates WindowFunctionFrame per factory (using the SpecificInternalRow created earlier).

Caution
FIXME
Note
ExternalAppendOnlyUnsafeRowArray is used to collect UnsafeRow objects from the child’s partitions (one partition per buffer and up to spark.sql.windowExec.buffer.spill.threshold).

next Method

Note
next is part of Scala’s scala.collection.Iterator interface that returns the next element and discards it from the iterator.

next method of the final Iterator is…​FIXME

next first fetches a new partition, but only when…​FIXME

Note
next loads all the rows in nextGroup.
Caution
FIXME What’s nextGroup?

next takes one UnsafeRow from bufferIterator.

Caution
FIXME bufferIterator seems important for the iteration.

next then requests every WindowFunctionFrame to write the current rowIndex and UnsafeRow.

Caution
FIXME rowIndex?

next joins the current UnsafeRow and windowFunctionResult (i.e. takes two InternalRows and makes them appear as a single concatenated InternalRow).

next increments rowIndex.

In the end, next uses the UnsafeProjection function (that was created using createResultProjection) and projects the joined InternalRow to the result UnsafeRow.
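
A minimal sketch of next following the steps above (names like frames, join and resultProjection are assumptions based on the description):

override final def next(): InternalRow = {
  // load the next partition if the current buffer is drained
  if (!bufferIterator.hasNext) fetchNextPartition()
  val current = bufferIterator.next()
  // let every frame write its window function results for this row
  var i = 0
  while (i < frames.length) {
    frames(i).write(rowIndex, current)
    i += 1
  }
  rowIndex += 1
  // concatenate the input row with the window function results and project
  resultProjection(join(current, windowFunctionResult))
}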

Fetching All Rows In Partition — fetchNextPartition Internal Method

fetchNextPartition first copies the current nextGroup UnsafeRow (that was created using grouping projection function) and clears the internal buffer.

fetchNextPartition then collects all UnsafeRows for the current nextGroup in buffer.

With the buffer filled in (with UnsafeRows per partition), fetchNextPartition prepares every WindowFunctionFrame function in frames one by one (and passing buffer).

In the end, fetchNextPartition resets rowIndex to 0 and requests buffer to generate an iterator (available as bufferIterator).
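
A minimal sketch of fetchNextPartition following the steps above:

private def fetchNextPartition(): Unit = {
  // copy nextGroup since the grouping projection mutates it in place
  val currentGroup = nextGroup.copy()
  buffer.clear()
  // collect all rows of the current group into the buffer
  while (nextRowAvailable && nextGroup == currentGroup) {
    buffer.add(nextRow)
    fetchNextRow()
  }
  // prepare every window function frame with the buffered partition
  var i = 0
  while (i < frames.length) {
    frames(i).prepare(buffer)
    i += 1
  }
  rowIndex = 0
  bufferIterator = buffer.generateIterator()
}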

Note
fetchNextPartition is used internally when doExecute‘s Iterator is requested for the next UnsafeRow (when bufferIterator is uninitialized or was drained, i.e. holds no elements, but there are still rows in the upstream operator’s partition).

fetchNextRow Internal Method

fetchNextRow checks whether there is the next row available (using the upstream Iterator.hasNext) and sets nextRowAvailable mutable internal flag.

If there is a row available, fetchNextRow sets nextRow internal variable to the next UnsafeRow from the upstream’s RDD.

fetchNextRow also sets nextGroup internal variable as an UnsafeRow for nextRow using grouping function.

Note

grouping is an UnsafeProjection function that is created for the window partition specification expressions bound to the single child’s output schema.

grouping uses GenerateUnsafeProjection to canonicalize the bound expressions and create the UnsafeProjection function.

If no row is available, fetchNextRow nullifies nextRow and nextGroup internal variables.
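
A minimal sketch of fetchNextRow (stream is an assumed name for the upstream row iterator):

private def fetchNextRow(): Unit = {
  nextRowAvailable = stream.hasNext
  if (nextRowAvailable) {
    nextRow = stream.next().asInstanceOf[UnsafeRow]
    nextGroup = grouping(nextRow)
  } else {
    nextRow = null
    nextGroup = null
  }
}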

Note
fetchNextRow is used internally when doExecute‘s Iterator is created and fetchNextPartition is called.

createResultProjection Internal Method

createResultProjection creates an UnsafeProjection function for the window function Catalyst expressions so that the window expressions are on the right side of the child’s output.

Note
UnsafeProjection is a Scala function that produces UnsafeRow for an InternalRow.

Internally, createResultProjection first creates a translation table with a BoundReference per expression (in the input expressions).

Note
BoundReference is a Catalyst expression that is a reference to a value in internal binary row at a specified position and of specified data type.

createResultProjection then creates window function bound references for the window expressions so that unbound expressions are transformed into BoundReferences.

In the end, createResultProjection creates a UnsafeProjection with:

  • exprs expressions from child‘s output and the collection of window function bound references

  • inputSchema input schema per child‘s output

Note
createResultProjection is used exclusively when WindowExec is executed.

Creating WindowExec Instance

WindowExec takes the following when created:

  • Window named expressions (windowExpression)

  • Window partition specification expressions (partitionSpec)

  • Window order specification (orderSpec as a collection of SortOrder expressions)

  • Child physical operator

Lookup Table for WindowExpressions and Factory Functions for WindowFunctionFrame — windowFrameExpressionFactoryPairs Lazy Value

windowFrameExpressionFactoryPairs is a lookup table with window expressions and factory functions for WindowFunctionFrame (per key-value pair in framedFunctions lookup table).

A factory function is a function that takes an InternalRow and produces a WindowFunctionFrame (described in the table below).

Internally, windowFrameExpressionFactoryPairs first builds framedFunctions lookup table with 4-element tuple keys and 2-element expression list values (described in the table below).

windowFrameExpressionFactoryPairs finds WindowExpression expressions in the input windowExpression and, for every WindowExpression, takes the window frame specification (of type SpecifiedWindowFrame) that is used to find the frame type and the start and end frame positions.

Table 2. framedFunctions’s FrameKey — 4-element Tuple for Frame Keys (in positional order)

  1. Name of the kind of function

  2. FrameType (RangeFrame or RowFrame)

  3. Window frame’s start position: a positive number for CurrentRow (0) and ValueFollowing, a negative number for ValuePreceding, empty when unspecified

  4. Window frame’s end position: a positive number for CurrentRow (0) and ValueFollowing, a negative number for ValuePreceding, empty when unspecified

Table 3. framedFunctions’s 2-element Tuple Values (in positional order)

  1. Collection of window expressions (WindowExpression)

  2. Collection of window functions

windowFrameExpressionFactoryPairs creates an AggregateProcessor for AGGREGATE frame keys in the framedFunctions lookup table.

Table 4. windowFrameExpressionFactoryPairs’ Factory Functions (in creation order)
Frame Name | FrameKey | WindowFunctionFrame

Offset Frame | ("OFFSET", RowFrame, Some(offset), Some(h)) | OffsetWindowFunctionFrame

Growing Frame | ("AGGREGATE", frameType, None, Some(high)) | UnboundedPrecedingWindowFunctionFrame

Shrinking Frame | ("AGGREGATE", frameType, Some(low), None) | UnboundedFollowingWindowFunctionFrame

Moving Frame | ("AGGREGATE", frameType, Some(low), Some(high)) | SlidingWindowFunctionFrame

Entire Partition Frame | ("AGGREGATE", frameType, None, None) | UnboundedWindowFunctionFrame

Note
lazy val in Scala is computed when first accessed and once only (for the entire lifetime of the owning object instance).
Note
windowFrameExpressionFactoryPairs is used exclusively when WindowExec is executed.

createBoundOrdering Internal Method

createBoundOrdering…​FIXME

Note
createBoundOrdering is used exclusively when WindowExec physical operator is requested for the window frame factories.

SubqueryExec

SubqueryExec Unary Physical Operator

SubqueryExec is a unary physical operator (i.e. with one child physical operator) that…​FIXME

SubqueryExec uses relationFuture, which is lazily initialized and executed only once, when SubqueryExec is first requested to prepare for execution. relationFuture simply triggers execution of the child operator asynchronously (i.e. on a separate thread) and collects the result soon after (which makes SubqueryExec wait indefinitely for the child operator to finish).

Caution
FIXME When is doPrepare executed?

SubqueryExec is created exclusively when PlanSubqueries preparation rule is executed (and transforms ScalarSubquery expressions in a physical plan).

Table 1. SubqueryExec’s Performance Metrics
Key | Name (in web UI)

collectTime | time to collect (ms)

dataSize | data size (bytes)

Figure 1. SubqueryExec in web UI (Details for Query)
Note
SubqueryExec physical operator is almost an exact copy of BroadcastExchangeExec physical operator.

Executing Child Operator Asynchronously — doPrepare Method

Note
doPrepare is part of SparkPlan Contract to prepare a physical operator for execution.

doPrepare simply triggers initialization of the internal lazily-once-initialized relationFuture asynchronous computation.
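A minimal sketch: merely referencing the lazy value forces its one-time initialization:

protected override def doPrepare(): Unit = {
  relationFuture  // forces the lazy Future (and the async child execution) to start
}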

relationFuture Internal Lazily-Once-Initialized Property

When “materialized” (aka executed), relationFuture spawns a new thread of execution that requests SQLExecution to execute an action (with the current execution id) on subquery daemon cached thread pool.

Note
relationFuture uses Scala’s scala.concurrent.Future that spawns a new thread of execution once instantiated.

The action requests the child physical operator to executeCollect, tracking its execution and collecting the collectTime and dataSize SQL metrics.

In the end, relationFuture posts metric updates and returns the internal rows.

Note
relationFuture is executed on a separate thread from a custom scala.concurrent.ExecutionContext (built from a cached java.util.concurrent.ThreadPoolExecutor with the prefix subquery and up to 16 threads).
Note
relationFuture is used when SubqueryExec is requested to prepare for execution (that triggers execution of the child operator) and execute collect (that waits indefinitely until the child operator has finished).

Creating SubqueryExec Instance

SubqueryExec takes the following when created:

  • Name of the subquery

  • Child physical plan

Collecting Internal Rows of Executing SubqueryExec Operator — executeCollect Method

Note
executeCollect is part of SparkPlan Contract to execute a physical operator and collect the results as collection of internal rows.

executeCollect waits till relationFuture gives a result (as a Array[InternalRow]).
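
A minimal sketch (ThreadUtils.awaitResult is Spark’s helper to block on a Future):

import scala.concurrent.duration.Duration
import org.apache.spark.util.ThreadUtils

override def executeCollect(): Array[InternalRow] = {
  // block indefinitely until the asynchronous child execution finishes
  ThreadUtils.awaitResult(relationFuture, Duration.Inf)
}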

SortExec

SortExec Unary Physical Operator

SortExec is a unary physical operator that is created when:

  • BasicOperators execution planning strategy plans a Sort logical operator

  • EnsureRequirements physical query preparation rule adds sorting to satisfy the ordering requirements of a physical operator

SortExec supports Java code generation (aka codegen).

When requested for the output attributes, SortExec simply gives whatever the child operator uses.

When requested for the output data partitioning requirements, SortExec simply gives whatever the child operator uses.

When requested for the required child distribution, SortExec gives the OrderedDistribution (with the sorting order expressions for the ordering) when the global flag is enabled (true) or the UnspecifiedDistribution otherwise.
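
A minimal sketch of the required child distribution per the description above (Distribution types come from org.apache.spark.sql.catalyst.plans.physical):

override def requiredChildDistribution: Seq[Distribution] =
  if (global) OrderedDistribution(sortOrder) :: Nil
  else UnspecifiedDistribution :: Nil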

SortExec operator uses the spark.sql.sort.enableRadixSort internal configuration property (enabled by default) to control…​FIXME

Table 1. SortExec’s Performance Metrics
Key | Name (in web UI)

peakMemory | peak memory

sortTime | sort time

spillSize | spill size

Generating Java Source Code for Produce Path in Whole-Stage Code Generation — doProduce Method

Note
doProduce is part of CodegenSupport Contract to generate the Java source code for produce path in Whole-Stage Code Generation.

doProduce…​FIXME

Creating SortExec Instance

SortExec takes the following when created:

  • SortOrder expressions (the sorting order)

  • global flag (whether to sort across all partitions or within partitions only)

  • Child physical operator

  • testSpillFrequency (for testing only; defaults to 0)

createSorter Method

createSorter…​FIXME

Note
createSorter is used when…​FIXME
