LogicalPlan Contract — Logical Relational Operator with Children and Expressions / Logical Query Plan
LogicalPlan
is an extension of the QueryPlan contract for logical operators to build a logical query plan (i.e. a tree of logical operators).
Note
|
A logical query plan is a tree of nodes of logical operators that in turn can have (trees of) Catalyst expressions. In other words, there are at least two trees at every level (operator). |
LogicalPlan
can be resolved.
In order to get the logical plan of a structured query you should use the QueryExecution.
1 2 3 4 5 6 7 8 9 10 |
scala> :type q org.apache.spark.sql.Dataset[Long] val plan = q.queryExecution.logical scala> :type plan org.apache.spark.sql.catalyst.plans.logical.LogicalPlan |
LogicalPlan
goes through execution stages (as a QueryExecution). In order to convert a LogicalPlan
to a QueryExecution
you should use SessionState
and request it to “execute” the plan.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
scala> :type spark org.apache.spark.sql.SparkSession // You could use Catalyst DSL to create a logical query plan scala> :type plan org.apache.spark.sql.catalyst.plans.logical.LogicalPlan val qe = spark.sessionState.executePlan(plan) scala> :type qe org.apache.spark.sql.execution.QueryExecution |
Note
|
A common idiom in Spark SQL to make sure that a logical plan can be analyzed is to request a
|
Note
|
Another common idiom in Spark SQL to convert a |
A logical operator is considered partially resolved when its child operators are resolved (aka children resolved).
A logical operator is (fully) resolved to a specific schema when all expressions and the children are resolved.
1 2 3 4 5 6 |
scala> plan.resolved res2: Boolean = true |
A logical plan knows the size of objects that are results of query operators, like join
, through Statistics
object.
1 2 3 4 5 6 |
scala> val stats = plan.statistics stats: org.apache.spark.sql.catalyst.plans.logical.Statistics = Statistics(8,false) |
A logical plan knows the maximum number of records it can compute.
1 2 3 4 5 6 |
scala> val maxRows = plan.maxRows maxRows: Option[Long] = None |
LogicalPlan
can be streaming if it contains one or more structured streaming sources.
Note
|
LogicalPlan is in the end transformed to a physical query plan.
|
LogicalPlan | Description |
---|---|
Logical operator with no child operators |
|
|
Logical plan with a single child logical operator |
|
Logical operator with two child logical operators |
Name | Description |
---|---|
Cached plan statistics (as Computed and cached in stats. Used in stats and verboseStringWithSuffix. Reset in invalidateStatsCache |
Getting Cached or Calculating Estimated Statistics — stats
Method
1 2 3 4 5 |
stats(conf: CatalystConf): Statistics |
stats
returns the cached plan statistics or computes a new one (and caches it as statsCache).
Note
|
|
Is Logical Plan Streaming? — isStreaming
method
1 2 3 4 5 |
isStreaming: Boolean |
isStreaming
is part of the public API of LogicalPlan
and is enabled (i.e. true
) when a logical plan is a streaming source.
By default, it walks over subtrees and calls itself, i.e. isStreaming
, on every child node to find a streaming source.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
val spark: SparkSession = ... // Regular dataset scala> val ints = spark.createDataset(0 to 9) ints: org.apache.spark.sql.Dataset[Int] = [value: int] scala> ints.queryExecution.logical.isStreaming res1: Boolean = false // Streaming dataset scala> val logs = spark.readStream.format("text").load("logs/*.out") logs: org.apache.spark.sql.DataFrame = [value: string] scala> logs.queryExecution.logical.isStreaming res2: Boolean = true |
Note
|
Streaming Datasets are part of Structured Streaming. |
Refreshing Child Logical Plans — refresh
Method
1 2 3 4 5 |
refresh(): Unit |
refresh
calls itself recursively for every child logical operator.
Note
|
refresh is overriden by LogicalRelation only (that refreshes the location of HadoopFsRelation relations only).
|
Note
|
|
resolveQuoted
Method
1 2 3 4 5 6 7 |
resolveQuoted( name: String, resolver: Resolver): Option[NamedExpression] |
resolveQuoted
…FIXME
Note
|
resolveQuoted is used when…FIXME
|
Resolving Attribute By Name Parts — resolve
Method
1 2 3 4 5 6 7 8 9 10 11 12 |
resolve(schema: StructType, resolver: Resolver): Seq[Attribute] resolve( nameParts: Seq[String], resolver: Resolver): Option[NamedExpression] resolve( nameParts: Seq[String], input: Seq[Attribute], resolver: Resolver): Option[NamedExpression] (1) |
-
A protected method
resolve
…FIXME
Note
|
resolve is used when…FIXME
|