TreeNode — Node in Catalyst Tree

TreeNode is the contract of nodes in Catalyst tree with name and zero or more children.

TreeNode is a recursive data structure: each node has zero or more children that are themselves TreeNodes.

Tip
Read up on Scala's <: type operator in Upper Type Bounds.

In Scala terms, TreeNode is an abstract class that is the base class of the Catalyst Expression and QueryPlan abstract classes.

TreeNode therefore allows for building entire trees of TreeNodes, e.g. generic query plans with concrete logical and physical operators that both use Catalyst expressions (which are TreeNodes again).
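To make the upper type bound concrete, here is a minimal sketch of a self-typed recursive node. It is a simplified model and not Spark's actual TreeNode: the names Node, Expr, Lit, Plus and size are hypothetical and exist only for illustration.

```scala
// Simplified model of a recursive, self-typed tree node (NOT Spark's TreeNode).
object TreeNodeShapeDemo {
  // BaseType <: Node[BaseType] is an upper type bound: the children of a node
  // share the base type of the tree they belong to.
  abstract class Node[BaseType <: Node[BaseType]] { self: BaseType =>
    def children: Seq[BaseType]

    // Counts this node plus every node below it.
    def size: Int = 1 + children.map(_.size).sum
  }

  // Hypothetical expression hierarchy built on top of Node.
  abstract class Expr extends Node[Expr]

  case class Lit(value: Int) extends Expr {
    def children: Seq[Expr] = Nil
  }

  case class Plus(left: Expr, right: Expr) extends Expr {
    def children: Seq[Expr] = Seq(left, right)
  }

  // Plus(1, Plus(2, 3)): two Plus nodes and three Lit leaves.
  val tree: Expr = Plus(Lit(1), Plus(Lit(2), Lit(3)))

  def main(args: Array[String]): Unit = println(tree.size)
}
```

Because the bound ties every child to the same base type, a tree of Expr nodes can only contain Expr nodes, which is what lets tree-wide methods return the base type rather than a generic node.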

Note
Spark SQL uses TreeNode for query plans and Catalyst expressions that can further be used together to build more advanced trees, e.g. Catalyst expressions can have query plans as subquery expressions.

TreeNode can itself be a node in a tree or a collection of nodes, i.e. itself and the children nodes. Not only does TreeNode come with the methods that you may have used in Scala Collection API (e.g. map, flatMap, collect, collectFirst, foreach), but also specialized ones for more advanced tree manipulation, e.g. mapChildren, transform, transformDown, transformUp, foreachUp, numberedTreeString, p, asCode, prettyJson.
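The collection-like feel of a tree can be sketched on a toy node type. This is an illustrative model under stated assumptions, not Spark's code: Node, Leaf and Branch are hypothetical, and map and collect are expressed through a pre-order foreach so that all three visit nodes in the same order.

```scala
import scala.collection.mutable.ArrayBuffer

// Toy tree with collection-like methods (hypothetical types, not Spark's).
object TreeCollectionDemo {
  sealed trait Node {
    def children: Seq[Node]

    // Pre-order traversal: this node first, then each child recursively.
    def foreach(f: Node => Unit): Unit = {
      f(this)
      children.foreach(_.foreach(f))
    }

    // Applies f to every node and gathers the results in traversal order.
    def map[A](f: Node => A): Seq[A] = {
      val out = ArrayBuffer.empty[A]
      foreach(n => out += f(n))
      out.toList
    }

    // Keeps only the nodes the partial function is defined for.
    def collect[A](pf: PartialFunction[Node, A]): Seq[A] = {
      val out = ArrayBuffer.empty[A]
      foreach(n => pf.lift(n).foreach(out += _))
      out.toList
    }
  }

  case class Leaf(name: String) extends Node {
    def children: Seq[Node] = Nil
  }

  case class Branch(name: String, children: Seq[Node]) extends Node

  val tree: Node = Branch("root", Seq(Leaf("a"), Branch("b", Seq(Leaf("c")))))

  // Names of all nodes in pre-order.
  val names: Seq[String] = tree.map {
    case Leaf(n)      => n
    case Branch(n, _) => n
  }

  // Names of the leaves only.
  val leaves: Seq[String] = tree.collect { case Leaf(n) => n }
}
```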

Table 1. TreeNode API (Public Methods)
Method Description

apply

argString

asCode

collect

collectFirst

collectLeaves

fastEquals

find

flatMap

foreach

foreachUp

generateTreeString

map

mapChildren

nodeName

numberedTreeString

p

prettyJson

simpleString

toJSON

transform

transformDown

transformUp

treeString

verboseString

verboseStringWithSuffix

withNewChildren

Table 2. (Subset of) TreeNode Contract
Method | Description
children | Child nodes
verboseString | One-line verbose description. Used when TreeNode is requested for generateTreeString (with verbose flag enabled) and verboseStringWithSuffix.

Table 3. TreeNodes
TreeNode Description

Expression

QueryPlan

Tip

The TreeNode abstract type is a fairly advanced Scala type definition (at least compared to the other Scala types in Spark), so understanding its behaviour even outside Spark might be worthwhile by itself.

withNewChildren Method

withNewChildren…​FIXME

Note
withNewChildren is used when…​FIXME

Simple Node Description — simpleString Method

simpleString gives a simple one-line description of a TreeNode.

Internally, simpleString is the nodeName followed by argString, separated by a single space.

Note
simpleString is used when TreeNode is requested for argString (of child nodes) and tree text representation (with verbose flag off).
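The composition described above can be sketched as follows. The Node case class is a hypothetical stand-in, and rendering the arguments as a comma-separated list is an assumption made only for this example.

```scala
// Hypothetical stand-in for a tree node; only what simpleString needs.
object SimpleStringDemo {
  case class Node(nodeName: String, args: Seq[String]) {
    // Assumption: arguments are rendered as a comma-separated list.
    def argString: String = args.mkString(", ")

    // nodeName, a single space, then argString.
    def simpleString: String = s"$nodeName $argString"
  }

  val filter: Node = Node("Filter", Seq("(id > 5)"))
  val project: Node = Node("Project", Seq("id", "name"))
}
```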

Numbered Text Representation — numberedTreeString Method

numberedTreeString adds numbers to the text representation of all the nodes.

Note
numberedTreeString is used primarily for interactive debugging using apply and p methods.

Getting n-th TreeNode in Tree (for Interactive Debugging) — apply Method

apply gives the number-th node in a tree.

Note
apply can be used for interactive debugging.

Internally, apply gets the node at number position or null.
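A small model of how numbering and lookup by position fit together is sketched below. The Node type, the two-space indentation and the preOrder helper are assumptions for illustration. The numbering is plain pre-order, so the number shown on a line matches the position that apply accepts.

```scala
// Toy model of numbered tree text and lookup by position (not Spark's code).
object NumberedTreeDemo {
  case class Node(name: String, children: Seq[Node] = Nil) {
    // All nodes in pre-order, which is also the order of lines in the tree text.
    private def preOrder: Seq[Node] = this +: children.flatMap(_.preOrder)

    // One line per node, indented by depth (assumed layout).
    private def lines(depth: Int): Seq[String] =
      (("  " * depth) + name) +: children.flatMap(_.lines(depth + 1))

    def treeString: String = lines(0).mkString("\n")

    // Prefixes every line of the tree text with its zero-padded position.
    def numberedTreeString: String =
      treeString.split("\n").zipWithIndex
        .map { case (line, i) => f"$i%02d $line" }
        .mkString("\n")

    // Node at the given position, or null when there is no such position.
    def apply(number: Int): Node = preOrder.lift(number).orNull
  }

  val tree: Node = Node("Project", Seq(Node("Filter", Seq(Node("Scan")))))
}
```

In an interactive session this pairing is the point: read the number off the numbered text, then pass it to apply to get hold of that node.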

Getting n-th BaseType in Tree (for Interactive Debugging) — p Method

p gives the number-th node in a tree as BaseType for interactive debugging.

Note
p can be used for interactive debugging.
Note

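BaseType is the base type of a tree and in Spark SQL can be LogicalPlan (logical plan trees), SparkPlan (physical plan trees), or Expression (expression trees).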

Text Representation — toString Method

Note
toString is part of Java’s Object Contract for the string representation of an object, e.g. TreeNode.

toString simply returns the text representation of all nodes in the tree.

Text Representation of All Nodes in Tree — treeString Method

The no-argument treeString turns the verbose flag on.

treeString gives the string representation of all the nodes in the TreeNode.

Note

treeString is used when:

Verbose Description with Suffix — verboseStringWithSuffix Method

verboseStringWithSuffix simply returns verbose description.

Note
verboseStringWithSuffix is used exclusively when TreeNode is requested to generateTreeString (with verbose and addSuffix flags enabled).

Generating Text Representation of Inner and Regular Child Nodes — generateTreeString Method

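Internally, generateTreeString appends one of the following node descriptions per the verbose and addSuffix flags: verboseStringWithSuffix when both verbose and addSuffix are enabled, verboseString when only verbose is enabled, and simpleString when verbose is off.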

In the end, generateTreeString calls itself recursively for the innerChildren and the child nodes.

Note
generateTreeString is used exclusively when TreeNode is requested for text representation of all nodes in the tree.
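How the two flags could select a description, followed by the recursive descent into children, is sketched below. The selection follows where simpleString, verboseString and verboseStringWithSuffix are said to be used on this page. Everything else is an assumption: the Node type, the describe helper, the depth parameter and the two-space indentation are hypothetical, and inner children are left out.

```scala
// Toy model of flag-driven tree text generation (not Spark's implementation).
object TreeTextDemo {
  case class Node(name: String, children: Seq[Node] = Nil) {
    def simpleString: String = name
    def verboseString: String = s"$name (verbose)"
    // Simply the verbose description, as in the default described on this page.
    def verboseStringWithSuffix: String = verboseString

    // Hypothetical helper: picks the description for the given flags.
    def describe(verbose: Boolean, addSuffix: Boolean): String =
      if (verbose) {
        if (addSuffix) verboseStringWithSuffix else verboseString
      } else simpleString

    // Appends this node's line, then recurses into the children.
    def generateTreeString(
        depth: Int,
        verbose: Boolean,
        addSuffix: Boolean,
        out: StringBuilder): StringBuilder = {
      out.append("  " * depth).append(describe(verbose, addSuffix)).append("\n")
      children.foreach(_.generateTreeString(depth + 1, verbose, addSuffix, out))
      out
    }
  }

  val tree: Node = Node("Sort", Seq(Node("Scan")))

  val plain: String = tree.generateTreeString(
    depth = 0, verbose = false, addSuffix = false, out = new StringBuilder).toString

  val verbose: String = tree.generateTreeString(
    depth = 0, verbose = true, addSuffix = true, out = new StringBuilder).toString
}
```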

Inner Child Nodes — innerChildren Method

innerChildren returns the inner nodes that should be shown as an inner nested tree of this node.

By default, innerChildren simply returns an empty collection of TreeNodes.

Note
innerChildren is used when TreeNode is requested to generate the text representation of inner and regular child nodes, allChildren and getNodeNumbered.

allChildren Property

Note
allChildren is a Scala lazy value which is computed once when accessed and cached afterwards.
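The "computed once, cached afterwards" behaviour of a Scala lazy value can be seen in a minimal sketch. The Node class, the cached member and the evaluations counter are hypothetical. What the real allChildren holds is not modelled here.

```scala
// Demonstrates lazy val semantics only; names and contents are hypothetical.
object LazyValDemo {
  // Counts how many times the lazy body actually runs.
  var evaluations: Int = 0

  class Node(children: Seq[String]) {
    // The body runs on first access; later accesses reuse the stored result.
    lazy val cached: Set[String] = {
      evaluations += 1
      children.toSet
    }
  }

  val node = new Node(Seq("a", "b", "a"))

  val before: Int = evaluations          // lazy val not accessed yet
  val first: Set[String] = node.cached   // triggers the computation
  val second: Set[String] = node.cached  // served from the cache
  val after: Int = evaluations
}
```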

allChildren…​FIXME

Note
allChildren is used when…​FIXME

getNodeNumbered Internal Method

getNodeNumbered…​FIXME

Note
getNodeNumbered is used when…​FIXME

foreach Method

foreach applies the input function f to itself (this) first and then (recursively) to the children.

collect Method

collect…​FIXME

collectFirst Method

collectFirst…​FIXME

collectLeaves Method

collectLeaves…​FIXME

find Method

find…​FIXME

flatMap Method

flatMap…​FIXME

foreachUp Method

foreachUp…​FIXME

map Method

map…​FIXME

mapChildren Method

mapChildren…​FIXME

transform Method

transform…​FIXME

Transforming Nodes Downwards — transformDown Method

transformDown…​FIXME

transformUp Method

transformUp…​FIXME

asCode Method

asCode…​FIXME

prettyJson Method

prettyJson…​FIXME

Note
prettyJson is used when…​FIXME

toJSON Method

toJSON…​FIXME

Note
toJSON is used when…​FIXME

argString Method

argString…​FIXME

Note
argString is used when…​FIXME

nodeName Method

nodeName returns the simple name of the class with the Exec suffix removed (Exec being the naming convention for the class names of physical operators).

Note
nodeName is used when TreeNode is requested for simpleString and asCode.
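The naming rule can be sketched with a regular expression anchored at the end of the simple class name. The trait NamedNode and the three classes below are hypothetical and only mimic the naming convention.

```scala
// Hypothetical node types that only mimic the naming convention.
trait NamedNode {
  // Simple class name with a trailing "Exec" removed; "Exec" elsewhere is kept.
  def nodeName: String = getClass.getSimpleName.replaceAll("Exec$", "")
}

class FilterExec extends NamedNode
class Project extends NamedNode
class ExecutedCommandExec extends NamedNode

object NodeNameDemo {
  val physical: String = new FilterExec().nodeName
  val logical: String = new Project().nodeName
  val onlyTrailing: String = new ExecutedCommandExec().nodeName
}
```

Anchoring the pattern with $ matters: only the suffix is stripped, so a class name that merely starts with Exec keeps its prefix.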

fastEquals Method

fastEquals…​FIXME

Note
fastEquals is used when…​FIXME