QueryPlan — Structured Query Plan-spark技术分享

QueryPlan — Structured Query Plan

QueryPlan is part of Catalyst and is used to build a tree of relational operators for a structured query.

In Scala terms, QueryPlan is an abstract class that is the base class of LogicalPlan and SparkPlan (for logical and physical plans, respectively).

A QueryPlan has output attributes (which serve as the basis for the schema), a collection of expressions, and a schema.

QueryPlan has a statePrefix that is used when displaying a plan: ! indicates an invalid plan and ' indicates an unresolved plan.

A QueryPlan is invalid if it has missing input attributes and its children are non-empty.

A QueryPlan is unresolved if the column names have not been verified and column types have not been looked up in the Catalog.

A QueryPlan has zero, one or more Catalyst expressions.

Note	`QueryPlan` is a tree of operators, each of which can have a tree of expressions.

QueryPlan has a references property: the attributes that appear in the expressions of this operator.

QueryPlan Contract



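import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression}
import org.apache.spark.sql.catalyst.trees.TreeNode

abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanType] {
  self: PlanType =>
  def output: Seq[Attribute]              // output schema attributes
  def validConstraints: Set[Expression]   // constraints on the output rows
  // FIXME
}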

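Table 1. QueryPlan Contract
Method	Description
`validConstraints`	Set of constraints (as Catalyst expressions) that hold for the operator's output
`output`	Attribute expressions (the output schema attributes)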

Transforming Expressions — `transformExpressions` Method



transformExpressions(rule: PartialFunction[Expression, Expression]): this.type

transformExpressions simply executes transformExpressionsDown with the input rule.

Note	`transformExpressions` is used when…FIXME

Transforming Expressions — `transformExpressionsDown` Method



transformExpressionsDown(rule: PartialFunction[Expression, Expression]): this.type

transformExpressionsDown applies the rule to each expression in the query operator, transforming every expression tree top-down (the rule is tried on a node before its children).
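To make the relationship between transformExpressions, transformExpressionsDown and mapExpressions concrete, here is a minimal, self-contained sketch built on a hypothetical toy expression tree (Expr, Lit, Ref, Add and Op are illustrative names, not Catalyst classes). It assumes that transformExpressionsDown maps each of the operator's expressions through a pre-order (top-down) transformDown, and that transformExpressions simply delegates to it.

```scala
// Hypothetical toy expression tree; NOT Catalyst's Expression hierarchy.
trait Expr {
  def children: Seq[Expr]
  def withChildren(newChildren: Seq[Expr]): Expr

  // Pre-order (top-down): try the rule on this node first,
  // then recurse into the children of the (possibly rewritten) node.
  def transformDown(rule: PartialFunction[Expr, Expr]): Expr = {
    val afterRule = if (rule.isDefinedAt(this)) rule(this) else this
    afterRule.withChildren(afterRule.children.map(_.transformDown(rule)))
  }
}

final case class Lit(value: Int) extends Expr {
  def children: Seq[Expr] = Nil
  def withChildren(newChildren: Seq[Expr]): Expr = this
}

final case class Ref(name: String) extends Expr {
  def children: Seq[Expr] = Nil
  def withChildren(newChildren: Seq[Expr]): Expr = this
}

final case class Add(left: Expr, right: Expr) extends Expr {
  def children: Seq[Expr] = Seq(left, right)
  def withChildren(newChildren: Seq[Expr]): Expr = Add(newChildren(0), newChildren(1))
}

// Hypothetical toy operator that owns a flat list of expression trees.
final case class Op(expressions: Seq[Expr]) {
  // Apply f to every top-level expression of this operator.
  def mapExpressions(f: Expr => Expr): Op = Op(expressions.map(f))

  // Rewrite every expression tree top-down with the rule.
  def transformExpressionsDown(rule: PartialFunction[Expr, Expr]): Op =
    mapExpressions(_.transformDown(rule))

  // Simply delegates to transformExpressionsDown.
  def transformExpressions(rule: PartialFunction[Expr, Expr]): Op =
    transformExpressionsDown(rule)
}

// A constant-folding rule for the addition of two literals.
val foldAdd: PartialFunction[Expr, Expr] = { case Add(Lit(x), Lit(y)) => Lit(x + y) }

val folded = Op(Seq(Add(Ref("a"), Lit(1)), Add(Lit(2), Lit(3)))).transformExpressions(foldAdd)
// folded == Op(Seq(Add(Ref("a"), Lit(1)), Lit(5)))
```

Because this sketch tries the rule on a node before its children, a single pass over Add(Add(Lit(1), Lit(2)), Lit(3)) yields Add(Lit(3), Lit(3)) rather than Lit(6); folding it completely would take another pass.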

Note	`transformExpressionsDown` is used when…FIXME

Applying Transformation Function to Each Expression in Query Operator — `mapExpressions` Method



mapExpressions(f: Expression => Expression): this.type

mapExpressions…FIXME

Note	`mapExpressions` is used when…FIXME

Output Schema Attribute Set — `outputSet` Property



outputSet: AttributeSet

outputSet simply returns an AttributeSet for the output schema attributes.
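An AttributeSet compares attributes by expression ID rather than by name. The following minimal sketch models that idea with hypothetical toy types (Attr, ToyAttributeSet and ToyPlan are illustrative, not Spark classes) to show what turning output into a set means: two same-named attributes with different IDs stay distinct, and membership ignores the name.

```scala
// Hypothetical toy attribute: exprId plays the role of Catalyst's ExprId.
final case class Attr(name: String, exprId: Long)

// Hypothetical toy set: membership is decided by exprId only, not by name.
final class ToyAttributeSet(attrs: Seq[Attr]) {
  private val byId: Map[Long, Attr] = attrs.map(a => a.exprId -> a).toMap
  def contains(a: Attr): Boolean = byId.contains(a.exprId)
  def size: Int = byId.size
}

// Hypothetical toy plan exposing output and a derived outputSet.
final case class ToyPlan(output: Seq[Attr]) {
  lazy val outputSet: ToyAttributeSet = new ToyAttributeSet(output)
}

// Two "id" columns with different IDs (e.g. from a self-join) stay distinct.
val plan = ToyPlan(Seq(Attr("id", 0L), Attr("id", 1L)))
// plan.outputSet.size == 2
// plan.outputSet.contains(Attr("renamed", 0L)) is true, since only exprId is compared
```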

Note	`outputSet` is used when…FIXME

`producedAttributes` Property

Caution

FIXME

Missing Input Attributes — `missingInput` Property



def missingInput: AttributeSet

missingInput is the set of attributes that are referenced in this node's expressions but are neither provided by its children (inputSet) nor produced by the node itself (producedAttributes).
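Put as set arithmetic, missingInput is references minus inputSet minus producedAttributes. The sketch below shows that arithmetic with a hypothetical toy node (ToyNode is illustrative; attributes are plain strings here, whereas Catalyst works with AttributeSet values compared by expression ID).

```scala
// Hypothetical toy node; NOT Spark's QueryPlan.
final case class ToyNode(
    references: Set[String],         // attributes used in this node's expressions
    childOutputs: Seq[Set[String]],  // output attributes of each child
    producedAttributes: Set[String]  // attributes this node creates itself
) {
  // inputSet: everything the children make available to this node
  def inputSet: Set[String] = childOutputs.foldLeft(Set.empty[String])(_ ++ _)

  // missingInput = references -- inputSet -- producedAttributes
  def missingInput: Set[String] = references -- inputSet -- producedAttributes
}

// References a, b and c; the only child outputs a; the node itself produces b.
val node = ToyNode(Set("a", "b", "c"), Seq(Set("a")), Set("b"))
// node.missingInput == Set("c")
```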

Output Schema — `schema` Property

You can request the schema of a QueryPlan using the schema property, which builds a StructType from the output attributes.



// the query
val dataset = spark.range(3)

scala> dataset.queryExecution.analyzed.schema
res6: org.apache.spark.sql.types.StructType = StructType(StructField(id,LongType,false))

Output Schema Attributes — `output` Property



output: Seq[Attribute]

output: Seq[Attribute]

output is the collection of Catalyst attribute expressions that represent the result of a projection in a query; it is later used to build the output schema.

Note	The `output` property is also called the output schema or result schema.



val q = spark.range(3)

scala> q.queryExecution.analyzed.output
res0: Seq[org.apache.spark.sql.catalyst.expressions.Attribute] = List(id#0L)

scala> q.queryExecution.withCachedData.output
res1: Seq[org.apache.spark.sql.catalyst.expressions.Attribute] = List(id#0L)

scala> q.queryExecution.optimizedPlan.output
res2: Seq[org.apache.spark.sql.catalyst.expressions.Attribute] = List(id#0L)

scala> q.queryExecution.sparkPlan.output
res3: Seq[org.apache.spark.sql.catalyst.expressions.Attribute] = List(id#0L)

scala> q.queryExecution.executedPlan.output
res4: Seq[org.apache.spark.sql.catalyst.expressions.Attribute] = List(id#0L)

Tip

You can build a StructType from the output collection of attributes using the toStructType method (available through the implicit class AttributeSeq).



scala> q.queryExecution.analyzed.output.toStructType
res5: org.apache.spark.sql.types.StructType = StructType(StructField(id,LongType,false))

Simple (Basic) Description with State Prefix — `simpleString` Method



simpleString: String

Note	`simpleString` is part of the TreeNode Contract for the simple text description of a tree node.

simpleString adds a state prefix to the node’s simple text description.

State Prefix — `statePrefix` Method



statePrefix: String

Internally, statePrefix gives ! (exclamation mark) when the node is invalid, i.e. missingInput is not empty and the node is a parent node (it has children). Otherwise, statePrefix gives an empty string.
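The following minimal sketch reproduces this state-prefix logic, together with simpleString, on a hypothetical toy node (ToyPlanNode is illustrative; missingInput and resolved are passed in directly instead of being computed). The ' branch is an assumption modelled on LogicalPlan, which marks unresolved plans with ' as mentioned at the top of this page. In Spark the text after the prefix is the node's full simple description; here it is simplified to the node name.

```scala
// Hypothetical toy plan node; NOT Spark's QueryPlan.
final case class ToyPlanNode(
    nodeName: String,
    children: Seq[ToyPlanNode],
    missingInput: Set[String],
    resolved: Boolean = true) {

  // QueryPlan-style rule: "!" only for an invalid parent node.
  private def invalidPrefix: String =
    if (missingInput.nonEmpty && children.nonEmpty) "!" else ""

  // Assumed LogicalPlan-style override: "'" for an unresolved node.
  def statePrefix: String = if (!resolved) "'" else invalidPrefix

  // simpleString: state prefix followed by the (simplified) node description.
  def simpleString: String = statePrefix + nodeName
}

val rangeNode = ToyPlanNode("Range", Nil, Set.empty)
val badProject = ToyPlanNode("Project", Seq(rangeNode), Set("x"))
val unresolvedRel = ToyPlanNode("UnresolvedRelation", Nil, Set.empty, resolved = false)
// rangeNode.simpleString == "Range"
// badProject.simpleString == "!Project"
// unresolvedRel.simpleString == "'UnresolvedRelation"
```

A leaf node with missing input, such as ToyPlanNode("LocalRelation", Nil, Set("y")), still gets an empty prefix, because the ! rule only applies to parent nodes.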

Note	`statePrefix` is used exclusively when `QueryPlan` is requested for the simple text node description.

Transforming All Expressions — `transformAllExpressions` Method



transformAllExpressions(rule: PartialFunction[Expression, Expression]): this.type

transformAllExpressions…FIXME

Note	`transformAllExpressions` is used when…FIXME

Simple (Basic) Description with State Prefix — `verboseString` Method



verboseString: String

Note	`verboseString` is part of TreeNode Contract to…FIXME.

verboseString simply returns the simple (basic) description with state prefix.

`innerChildren` Method



innerChildren: Seq[QueryPlan[_]]

Note	`innerChildren` is part of TreeNode Contract to…FIXME.

innerChildren simply returns the subqueries.

`subqueries` Method



subqueries: Seq[PlanType]

subqueries…FIXME

Note	`subqueries` is used when…FIXME

Canonicalizing Query Plan — `doCanonicalize` Method



doCanonicalize(): PlanType

doCanonicalize…FIXME

Note	`doCanonicalize` is used when…FIXME
