
HiveTableRelation Leaf Logical Operator — Representing Hive Tables in Logical Plan

HiveTableRelation is a leaf logical operator that represents a Hive table in a logical query plan.

HiveTableRelation is created exclusively when the FindDataSourceTable logical evaluation rule is requested to resolve UnresolvedCatalogRelations in a logical plan (for Hive tables).

HiveTableRelation is partitioned when it has at least one partition column.

The table metadata of a HiveTableRelation (in a catalog) has to meet the following requirements:

  • The database of the table identifier is defined

  • The partition schema of the table is of the same type as the partition columns

  • The data schema of the table is of the same type as the data columns

HiveTableRelation has the output attributes made up of data followed by partition columns.

Note

HiveTableRelation is removed from a logical plan when the HiveAnalysis logical rule is executed (and transforms an InsertIntoTable with HiveTableRelation into an InsertIntoHiveTable).

HiveTableRelation is also removed from a logical plan when the RelationConversions rule is executed (and converts HiveTableRelations to LogicalRelations).

HiveTableRelation is resolved to HiveTableScanExec physical operator when HiveTableScans strategy is executed.
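The above can be observed in the analyzed plan of a query over a Hive table. The following is a minimal sketch, assuming spark-hive is on the classpath and a Hive-enabled SparkSession can be created locally; the table name demo_table is hypothetical.

```scala
// Sketch: assumes spark-hive on the classpath and a local Hive metastore
// (Derby) can be created; demo_table is a hypothetical table name.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("HiveTableRelation demo")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("CREATE TABLE IF NOT EXISTS demo_table (id INT, name STRING) USING hive")

// Before RelationConversions or HiveTableScans kick in, the analyzed plan
// contains a HiveTableRelation leaf for demo_table.
val plan = spark.table("demo_table").queryExecution.analyzed
println(plan.numberedTreeString)
```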

Computing Statistics — computeStats Method

Note
computeStats is part of LeafNode Contract to compute statistics for cost-based optimizer.

computeStats takes the table statistics from the table metadata if defined and converts them to Spark statistics (with output columns).

If the table statistics are not available, computeStats reports an IllegalStateException.

Creating HiveTableRelation Instance

HiveTableRelation takes the following when created:

  • Table metadata

  • Data columns (as a collection of AttributeReferences)

  • Partition columns (as a collection of AttributeReferences)


Hint Logical Operator

Caution
FIXME


GroupingSets Unary Logical Operator

GroupingSets is a unary logical operator that represents SQL’s GROUPING SETS variant of the GROUP BY clause.

GroupingSets operator is resolved to an Aggregate logical operator at analysis phase.

Note
GroupingSets can only be created using SQL.
Note
GroupingSets is not supported on Structured Streaming’s streaming Datasets.

GroupingSets is never resolved (as it can only be converted to an Aggregate logical operator).

The output schema of a GroupingSets is exactly the attributes of the aggregate named expressions.

Analysis Phase

GroupingSets operator is resolved at analysis phase by the ResolveGroupingAnalytics logical evaluation rule.

GroupingSets operator is resolved to an Aggregate with Expand logical operators.
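A minimal sketch of the rewrite, assuming a local SparkSession; the view name t and column names are hypothetical.

```scala
// Sketch: assumes a local SparkSession; names are hypothetical.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("GroupingSets demo")
  .getOrCreate()

spark.range(4).selectExpr("id % 2 as a", "id % 3 as b", "id as v")
  .createOrReplaceTempView("t")

val q = spark.sql("""
  SELECT a, b, sum(v)
  FROM t
  GROUP BY a, b GROUPING SETS ((a), (b))
""")

// GroupingSets never survives analysis: the analyzed plan shows
// an Aggregate over an Expand operator instead.
println(q.queryExecution.analyzed.numberedTreeString)
```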

Creating GroupingSets Instance

GroupingSets takes the following when created:

  • Grouping sets (as a collection of collections of grouping expressions)

  • Group-by expressions

  • Child logical plan

  • Aggregation named expressions


Generate Unary Logical Operator for Lateral Views

Generate is a unary logical operator that is created to represent the following (after a logical plan is analyzed):

  • Generator or GeneratorOuter expressions (e.g. explode) in a SELECT clause

  • SQL’s LATERAL VIEW clause

resolved flag is…​FIXME

Note
resolved is part of LogicalPlan Contract to…​FIXME.

producedAttributes…​FIXME

The output schema of a Generate is…​FIXME

Note
Generate logical operator is resolved to GenerateExec unary physical operator in BasicOperators execution planning strategy.
Tip

Use generate operator from Catalyst DSL to create a Generate logical operator, e.g. for testing or Spark SQL internals exploration.
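Short of the Catalyst DSL, the easiest way to see a Generate operator is a query with a generator expression. A minimal sketch, assuming a local SparkSession; column names are hypothetical.

```scala
// Sketch: assumes a local SparkSession; names are hypothetical.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.explode

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("Generate demo")
  .getOrCreate()
import spark.implicits._

val q = Seq((1, Seq("a", "b"))).toDF("id", "xs")
  .select($"id", explode($"xs"))

// The analyzed plan contains a Generate unary operator for explode;
// the same happens for SQL's LATERAL VIEW explode(...).
println(q.queryExecution.analyzed.numberedTreeString)
```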

Creating Generate Instance

Generate takes the following when created:

Generate initializes the internal registries and counters.


ExternalRDD

ExternalRDD is a leaf logical operator that is a logical representation of (the data from) an RDD in a logical query plan.

ExternalRDD is created (via the apply factory method) when SparkSession is requested to create a DataFrame from an RDD of product types (e.g. Scala case classes or tuples) or a Dataset from an RDD of a given type.

ExternalRDD is a MultiInstanceRelation and an ObjectProducer.

Note
ExternalRDD is resolved to ExternalRDDScanExec when BasicOperators execution planning strategy is executed.

newInstance Method

Note
newInstance is part of MultiInstanceRelation Contract to…​FIXME.

newInstance…​FIXME

Computing Statistics — computeStats Method

Note
computeStats is part of LeafNode Contract to compute statistics for cost-based optimizer.

computeStats…​FIXME

Creating ExternalRDD Instance

ExternalRDD takes the following when created:

  • Output object Attribute

  • RDD (of the type of the output object attribute)

  • SparkSession

Creating ExternalRDD — apply Factory Method

apply…​FIXME

Note
apply is used when SparkSession is requested to create a DataFrame from RDD of product types (e.g. Scala case classes, tuples) or Dataset from RDD of a given type.
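A minimal sketch of that path, assuming a local SparkSession; the data is hypothetical.

```scala
// Sketch: assumes a local SparkSession; data is hypothetical.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("ExternalRDD demo")
  .getOrCreate()
import spark.implicits._

val rdd = spark.sparkContext.parallelize(Seq(("Ann", 30), ("Bob", 25)))

// createDataset on an RDD goes through ExternalRDD.apply,
// so the logical plan starts with an ExternalRDD leaf.
val ds = spark.createDataset(rdd)
println(ds.queryExecution.logical.numberedTreeString)
```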


ExplainCommand Logical Command

ExplainCommand is a logical command with a side effect that lets users see how a structured query will eventually be executed, i.e. it shows the logical and physical plans with or without details about codegen and cost statistics.

When executed, ExplainCommand computes a QueryExecution that is then used to output a single-column DataFrame with the following:

  • codegen explain, i.e. WholeStageCodegen subtrees if codegen flag is enabled.

  • extended explain, i.e. the parsed, analyzed, optimized logical plans with the physical plan if extended flag is enabled.

  • cost explain, i.e. optimized logical plan with stats if cost flag is enabled.

  • simple explain, i.e. the physical plan only, when neither the codegen nor the extended flag is enabled.

ExplainCommand is created by Dataset’s explain operator and EXPLAIN SQL statement (accepting EXTENDED and CODEGEN options).

The following EXPLAIN variants in SQL queries are not supported:

  • EXPLAIN FORMATTED

  • EXPLAIN LOGICAL

The output schema of an ExplainCommand is a single plan column of StringType.
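Both entry points can be tried out directly. A minimal sketch, assuming a local SparkSession:

```scala
// Sketch: assumes a local SparkSession.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("ExplainCommand demo")
  .getOrCreate()

// Dataset.explain(true) enables the extended flag of ExplainCommand.
spark.range(10).explain(true)

// EXPLAIN in SQL is parsed into an ExplainCommand as well;
// the result is a single-column Dataset of plan text.
val explained = spark.sql("EXPLAIN EXTENDED SELECT * FROM range(5)")
explained.show(truncate = false)
```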

Creating ExplainCommand Instance

ExplainCommand takes the following when created:

  • LogicalPlan

  • extended flag whether to include extended details in the output when ExplainCommand is executed (disabled by default)

  • codegen flag whether to include codegen details in the output when ExplainCommand is executed (disabled by default)

  • cost flag whether to include cost statistics (plan stats) in the output when ExplainCommand is executed (disabled by default)

ExplainCommand initializes output attribute.

Note
ExplainCommand is created when…​FIXME

Executing Logical Command (Computing Text Representation of QueryExecution) — run Method

Note
run is part of RunnableCommand Contract to execute (run) a logical command.

run computes QueryExecution and returns its text representation in a single Row.

Internally, run creates an IncrementalExecution for a streaming dataset directly or requests SessionState to execute the LogicalPlan.

Note
Streaming Dataset is part of Spark Structured Streaming.

run then requests QueryExecution to build the output text representation, i.e. codegened, extended (with logical and physical plans), with stats, or simple.

In the end, run creates a Row with the text representation.


Expand Unary Logical Operator

Expand is a unary logical operator that represents Cube, Rollup, GroupingSets and TimeWindow logical operators after they have been resolved at analysis phase.

Note
Expand logical operator is resolved to ExpandExec physical operator in BasicOperators execution planning strategy.
Table 1. Expand’s Properties

  • references: AttributeSet from projections

  • validConstraints: empty set of expressions

Analysis Phase

Expand logical operator is created at analysis phase by the following logical evaluation rules: ResolveGroupingAnalytics (for Cube, Rollup and GroupingSets) and TimeWindowing (for TimeWindow).

Note
Aggregate → (Cube|Rollup|GroupingSets) → constructAggregate → constructExpand
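The rewrite can be seen in the analyzed plan of a cube query. A minimal sketch, assuming a local SparkSession; column names are hypothetical.

```scala
// Sketch: assumes a local SparkSession; names are hypothetical.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("Expand demo")
  .getOrCreate()
import spark.implicits._

val q = spark.range(4)
  .selectExpr("id % 2 as a", "id % 2 as b", "id as v")
  .cube($"a", $"b")
  .agg(sum($"v"))

// ResolveGroupingAnalytics rewrites the cube into an Aggregate
// over an Expand logical operator.
println(q.queryExecution.analyzed.numberedTreeString)
```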

Rule-Based Logical Query Optimization Phase

Creating Expand Instance

Expand takes the following when created:

  • Projection expressions (as a collection of collections of expressions)

  • Output schema attributes

  • Child logical plan


Except

Except is…​FIXME


DropTableCommand Logical Command

DropTableCommand is a logical command for DROP TABLE and DROP VIEW SQL statements.

Executing Logical Command — run Method

Note
run is part of RunnableCommand Contract to execute (run) a logical command.

run…​FIXME
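A minimal sketch of the command in action, assuming a local SparkSession; the view name to_drop is hypothetical. Commands are executed eagerly, so run is invoked as soon as the SQL statement is issued.

```scala
// Sketch: assumes a local SparkSession; the view name is hypothetical.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("DropTableCommand demo")
  .getOrCreate()

spark.range(1).createOrReplaceTempView("to_drop")

// DROP VIEW (and DROP TABLE) statements are parsed into a
// DropTableCommand, which is run eagerly.
spark.sql("DROP VIEW IF EXISTS to_drop")
```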
