LogicalPlan Contract — Logical Operator with Children and Expressions / Logical Query Plan

LogicalPlan is an extension of the QueryPlan contract for logical operators to build a logical query plan (i.e. a tree of logical operators).

Note
A logical query plan is a tree of logical operator nodes, each of which can in turn hold (trees of) Catalyst expressions. In other words, every operator involves at least two trees: the operator tree it belongs to and the expression tree(s) it holds.

LogicalPlan can be resolved.

To get the logical plan of a structured query, use its QueryExecution.

A LogicalPlan goes through execution stages as a QueryExecution. To convert a LogicalPlan to a QueryExecution, request the SessionState to “execute” the plan.

Note

A common idiom in Spark SQL to make sure that a logical plan can be analyzed is to request the SessionState from a SparkSession and have it execute the logical plan (which simply creates a QueryExecution).

Note

Another common idiom in Spark SQL, used to convert a LogicalPlan into a Dataset, is the internal Dataset.ofRows method, which executes the logical plan and then creates a Dataset from the resulting QueryExecution and a RowEncoder.

A logical operator is considered partially resolved when its child operators are resolved (aka children resolved).

A logical operator is (fully) resolved to a specific schema when all expressions and the children are resolved.
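The two resolution rules above can be illustrated with a toy model. This is a minimal sketch in plain Python, not Spark's implementation: the `Expression` and `Operator` classes and the `children_resolved` / `resolved` properties are hypothetical stand-ins that only mirror the wording of the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Expression:
    """Hypothetical stand-in for a Catalyst expression."""
    name: str
    resolved: bool = True


@dataclass
class Operator:
    """Hypothetical stand-in for a logical operator (not Spark's class)."""
    name: str
    expressions: List[Expression] = field(default_factory=list)
    children: List["Operator"] = field(default_factory=list)

    @property
    def children_resolved(self) -> bool:
        # Partially resolved: every child operator is resolved.
        return all(child.resolved for child in self.children)

    @property
    def resolved(self) -> bool:
        # Fully resolved: all own expressions and all children are resolved.
        return all(e.resolved for e in self.expressions) and self.children_resolved


# A leaf with nothing to resolve, under a parent with one unresolved expression.
relation = Operator("Relation")
project = Operator("Project", [Expression("id", resolved=False)], [relation])
```

In this sketch `project` has resolved children but is not itself resolved, because one of its own expressions is still unresolved.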

A logical plan knows the size of the objects that query operators (such as a join) produce, through a Statistics object.

A logical plan knows the maximum number of records it can compute.

LogicalPlan can be streaming if it contains one or more structured streaming sources.

Note
LogicalPlan is eventually transformed into a physical query plan.

Table 1. Logical Operators / Specialized Logical Plans

LogicalPlan        Description
LeafNode           Logical operator with no child operators
UnaryNode          Logical plan with a single child logical operator
BinaryNode         Logical operator with two child logical operators
Command
RunnableCommand

Table 2. LogicalPlan’s Internal Registries and Counters

Name               Description
statsCache         Cached plan statistics (as Statistics) of the LogicalPlan.
                   Computed and cached in stats.
                   Used in stats and verboseStringWithSuffix.
                   Reset in invalidateStatsCache.

Getting Cached or Calculating Estimated Statistics — stats Method

stats returns the cached plan statistics or, if there are none yet, computes them (and caches the result as statsCache).
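The compute-once-then-cache behaviour described here can be sketched as follows. This is a toy model in plain Python under stated assumptions, not Spark's code: the `Plan` class, the integer "statistics", the `estimate` callback and the `computations` counter are all hypothetical, and only the names `stats`, `statsCache` and `invalidateStatsCache` come from the text (spelled here in Python style).

```python
from typing import Callable, Optional


class Plan:
    """Toy plan node; names mirror the text but this is not Spark's code."""

    def __init__(self, estimate: Callable[[], int]) -> None:
        self._estimate = estimate               # hypothetical size estimator
        self.stats_cache: Optional[int] = None  # plays the role of statsCache
        self.computations = 0                   # how many times we estimated

    def stats(self) -> int:
        # Compute on first use, then serve the cached value.
        if self.stats_cache is None:
            self.computations += 1
            self.stats_cache = self._estimate()
        return self.stats_cache

    def invalidate_stats_cache(self) -> None:
        # Drop the cached value so the next stats() call recomputes it.
        self.stats_cache = None


plan = Plan(lambda: 1024)
first = plan.stats()           # computes and caches
second = plan.stats()          # served from the cache
plan.invalidate_stats_cache()
third = plan.stats()           # computed again after the reset
```

The counter makes the caching visible: three calls to `stats()` trigger only two estimations, one before and one after the reset.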

Note

stats is used when:

invalidateStatsCache method

Caution
FIXME

verboseStringWithSuffix method

Caution
FIXME

setAnalyzed method

Caution
FIXME

Is Logical Plan Streaming? — isStreaming method

isStreaming is part of the public API of LogicalPlan and is enabled (i.e. true) when a logical plan is a streaming source.

By default, it walks the child subtrees, calling isStreaming on every child node to find a streaming source.
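The recursive walk can be pictured with a small tree. The following is a hedged sketch in plain Python rather than Spark's implementation: the `Node` class and its `streaming_source` flag are hypothetical, and the operator names are used only as labels.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Toy logical operator; not Spark's LogicalPlan."""
    name: str
    children: List["Node"] = field(default_factory=list)
    streaming_source: bool = False  # hypothetical marker for a streaming source

    def is_streaming(self) -> bool:
        # True for a streaming source, or if any subtree contains one.
        return self.streaming_source or any(c.is_streaming() for c in self.children)


batch = Node("Join", [Node("Relation"), Node("Relation")])
stream = Node("Join", [
    Node("Relation"),
    Node("Filter", [Node("StreamingRelation", streaming_source=True)]),
])
```

Here a single streaming source two levels down is enough to make the whole `stream` plan report itself as streaming, while `batch` does not.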

Note
Streaming Datasets are part of Structured Streaming.

Refreshing Child Logical Plans — refresh Method

refresh calls itself recursively for every child logical operator.

Note
refresh is overridden only by LogicalRelation (which refreshes the location of HadoopFsRelation relations only).
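A default that only recurses, plus a single operator that overrides it, can be sketched like this. It is a toy model in plain Python under stated assumptions: `Node`, `FileRelation` and the `log` parameter are hypothetical names introduced for illustration and do not correspond to Spark's signatures.

```python
from typing import List, Optional


class Node:
    """Toy logical operator whose refresh only forwards to its children."""

    def __init__(self, name: str, children: Optional[List["Node"]] = None) -> None:
        self.name = name
        self.children = list(children or [])

    def refresh(self, log: List[str]) -> None:
        # Default behaviour: nothing to do locally, recurse into every child.
        for child in self.children:
            child.refresh(log)


class FileRelation(Node):
    """Hypothetical leaf that actually has something to refresh."""

    def refresh(self, log: List[str]) -> None:
        # Stand-in for re-reading a file location; here we only record the call.
        log.append(self.name)


plan = Node("Project", [
    Node("Join", [FileRelation("files_a"), Node("LocalRelation")]),
    FileRelation("files_b"),
])
refreshed: List[str] = []
plan.refresh(refreshed)
```

Calling `refresh` on the root reaches both file-backed leaves, while the other operators do nothing beyond passing the call down.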
Note

refresh is used when:

resolveQuoted Method

resolveQuoted…​FIXME

Note
resolveQuoted is used when…​FIXME

Resolving Attribute By Name Parts — resolve Method

  1. A protected method

resolve…​FIXME

Note
resolve is used when…​FIXME