BoundReference

BoundReference Leaf Expression — Reference to Value in Internal Binary Row

BoundReference is a leaf expression that evaluates to a value in an internal binary row at a specified position and of a given data type.

BoundReference takes the following when created:

  • Ordinal, i.e. the position

  • Data type of the value

  • nullable flag that controls whether the value can be null or not

You can also create a BoundReference using Catalyst DSL’s at method.
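As a minimal sketch (assuming a spark-shell session with Spark SQL's Catalyst classes on the classpath), the two ways of creating a BoundReference could look as follows. The attribute name id and the ordinal 1 are made up for illustration.

```scala
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.expressions.BoundReference
import org.apache.spark.sql.types.LongType

// Direct creation: ordinal (position), data type and nullable flag
val direct = BoundReference(ordinal = 1, dataType = LongType, nullable = true)

// Catalyst DSL: a hypothetical nullable long attribute named "id",
// bound to position 1 with the at method
val viaDsl = Symbol("id").long.at(1)

// Both describe the same reference (the attribute name is not part of it)
assert(direct == viaDsl)
```

The DSL variant takes the data type and the nullable flag from the attribute, so only the ordinal has to be given.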

Evaluating Expression — eval Method

Note
eval is part of Expression Contract for the interpreted (non-code-generated) expression evaluation, i.e. evaluating a Catalyst expression to a JVM object for a given internal binary row.

eval returns the value at the ordinal position of the input internal binary row, read as the defined data type.

Internally, eval returns null if the value at the position is null.

Otherwise, eval uses the methods of InternalRow per the defined data type to access the value.

Table 1. eval’s DataType to InternalRow’s Methods Mapping (in execution order)
DataType                      InternalRow’s Method
BooleanType                   getBoolean
ByteType                      getByte
ShortType                     getShort
IntegerType or DateType       getInt
LongType or TimestampType     getLong
FloatType                     getFloat
DoubleType                    getDouble
StringType                    getUTF8String
BinaryType                    getBinary
CalendarIntervalType          getInterval
DecimalType                   getDecimal
StructType                    getStruct
ArrayType                     getArray
MapType                       getMap
others                        get(ordinal, dataType)
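The following sketch (again assuming spark-shell) evaluates a few BoundReferences against a made-up internal row to illustrate the null check and the per-type access described above. The row values are arbitrary.

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.BoundReference
import org.apache.spark.sql.types.{BooleanType, IntegerType, LongType}

// Made-up row: a long at position 0, a null at position 1, a boolean at position 2
val row = InternalRow(42L, null, true)

// LongType values are read as longs
val longRef = BoundReference(0, LongType, nullable = false)
assert(longRef.eval(row) == 42L)

// A null slot gives null, regardless of the declared data type
val nullRef = BoundReference(1, IntegerType, nullable = true)
assert(nullRef.eval(row) == null)

// BooleanType values are read as booleans
val boolRef = BoundReference(2, BooleanType, nullable = false)
assert(boolRef.eval(row) == true)
```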

Generating Java Source Code (ExprCode) For Code-Generated Expression Evaluation — doGenCode Method

Note
doGenCode is part of Expression Contract to generate a Java source code (ExprCode) for code-generated expression evaluation.

doGenCode…​FIXME

BindReferences.bindReference Method

bindReference…​FIXME

Note
bindReference is used when…​FIXME

Attribute

Attribute — Base of Leaf Named Expressions

Attribute is the base of leaf named expressions.

Note
QueryPlan uses Attributes to build the schema of the query (it represents).

Table 1. Attribute Contract
Property           Description
withMetadata
withName
withNullability
withQualifier
newInstance

When requested for references, Attribute gives the reference to itself only.

As a NamedExpression, Attribute returns itself when requested for toAttribute.
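A small sketch (assuming spark-shell) that checks both properties on an AttributeReference, one of the Attribute implementations. The attribute name id is hypothetical.

```scala
import org.apache.spark.sql.catalyst.expressions.AttributeReference
import org.apache.spark.sql.types.LongType

// A hypothetical attribute named "id"; the empty second parameter list
// accepts the defaults (e.g. a freshly generated ExprId)
val id = AttributeReference("id", LongType)()

// references contains the attribute itself and nothing else
assert(id.references.size == 1)
assert(id.references.contains(id))

// toAttribute gives back the very same instance
assert(id.toAttribute eq id)
```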

Table 2. Attributes (Direct Implementations)
Attribute              Description
AttributeReference
PrettyAttribute
UnresolvedAttribute

As an optimization, Attribute is marked as null-intolerant (NullIntolerant), i.e. given a null input it produces a null output.

AttributeReference

AttributeReference is…​FIXME

AggregateWindowFunction Contract — Declarative Window Aggregate Function Expressions

AggregateWindowFunction is the extension of the DeclarativeAggregate Contract for declarative aggregate function expressions that are also WindowFunction expressions.

AggregateWindowFunction uses IntegerType as the data type of the result of evaluating itself.

AggregateWindowFunction is nullable by default.

As a WindowFunction expression, AggregateWindowFunction uses a SpecifiedWindowFrame (with the RowFrame frame type, the UnboundedPreceding lower and the CurrentRow upper frame boundaries) as the frame.

AggregateWindowFunction is a DeclarativeAggregate expression that does not support merging (two aggregation buffers together) and throws an UnsupportedOperationException whenever requested for it.
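The properties above can be checked on a concrete AggregateWindowFunction. The sketch below uses RowNumber (a RowNumberLike expression) and assumes a spark-shell session; the choice of RowNumber is for illustration only.

```scala
import scala.util.Try
import org.apache.spark.sql.catalyst.expressions.{CurrentRow, RowFrame, RowNumber, SpecifiedWindowFrame, UnboundedPreceding}
import org.apache.spark.sql.types.IntegerType

val rowNumber = RowNumber()

// The result type is IntegerType
assert(rowNumber.dataType == IntegerType)

// The frame is a row frame from UnboundedPreceding to CurrentRow
assert(rowNumber.frame == SpecifiedWindowFrame(RowFrame, UnboundedPreceding, CurrentRow))

// Merging is not supported: requesting the merge expressions fails
val merge = Try(rowNumber.mergeExpressions)
assert(merge.isFailure)
assert(merge.failed.get.isInstanceOf[UnsupportedOperationException])
```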

Table 1. AggregateWindowFunctions (Direct Implementations)
AggregateWindowFunction     Description
RankLike
RowNumberLike
SizeBasedWindowFunction     Window functions that require the size of the current window for calculation

AggregateFunction Contract — Aggregate Function Expressions

AggregateFunction is the contract for Catalyst expressions that represent aggregate functions.

AggregateFunction is used wrapped inside an AggregateExpression (created using the toAggregateExpression method) when:

Note
Aggregate functions are not foldable, i.e. FIXME

Table 1. AggregateFunction Top-Level Catalyst Expressions
Name                         Behaviour     Examples
DeclarativeAggregate
ImperativeAggregate
TypedAggregateExpression

AggregateFunction Contract

Table 2. AggregateFunction Contract
Method                       Description
aggBufferSchema              Schema of an aggregation buffer to hold partial aggregate results. Used mostly in ScalaUDAF and AggregationIterator
aggBufferAttributes          AttributeReferences of an aggregation buffer to hold partial aggregate results. Used in:
inputAggBufferAttributes
defaultResult                Defaults to None.

Creating AggregateExpression for AggregateFunction — toAggregateExpression Method

toAggregateExpression comes in two variants. The no-argument variant calls the other toAggregateExpression with isDistinct disabled (i.e. false).

toAggregateExpression creates an AggregateExpression for the current AggregateFunction with Complete aggregate mode.
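As an illustration (assuming spark-shell), the sketch below wraps a Count aggregate function using both variants of toAggregateExpression. Count over the literal 1 is just an example function.

```scala
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.catalyst.expressions.aggregate.{Complete, Count}

// count(1) as an example AggregateFunction
val count = Count(Literal(1))

// No-argument variant: isDistinct is disabled
val agg = count.toAggregateExpression()
assert(agg.aggregateFunction == count)
assert(agg.mode == Complete)
assert(!agg.isDistinct)

// Variant with an explicit isDistinct flag; the mode is still Complete
val distinctAgg = count.toAggregateExpression(isDistinct = true)
assert(distinctAgg.isDistinct)
assert(distinctAgg.mode == Complete)
```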

Note

toAggregateExpression is used in:

AggregateExpression

AggregateExpression — Unevaluable Expression Container for AggregateFunction

AggregateExpression is an unevaluable expression (i.e. with no support for eval and doGenCode methods) that acts as a container for an AggregateFunction.

AggregateExpression contains the following:

  • AggregateFunction

  • AggregateMode

  • isDistinct flag indicating whether this aggregation is distinct or not (e.g. whether SQL’s DISTINCT keyword was used for the aggregate function)

  • ExprId

AggregateExpression is created when:

Table 1. toString’s Prefixes per AggregateMode
Prefix       AggregateMode
partial_     Partial
merge_       PartialMerge
(empty)      Final or Complete
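The prefixes can be observed by switching the aggregate mode of an AggregateExpression. This is a sketch assuming spark-shell; Count over the literal 1 is an arbitrary example, and the comparison is made against the Complete-mode text rather than a hard-coded string.

```scala
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.catalyst.expressions.aggregate.{Count, Final, Partial, PartialMerge}

// Complete mode (the default of toAggregateExpression) has an empty prefix
val complete = Count(Literal(1)).toAggregateExpression()
val plain = complete.toString

// Only the prefix changes with the aggregate mode
assert(complete.copy(mode = Partial).toString == "partial_" + plain)
assert(complete.copy(mode = PartialMerge).toString == "merge_" + plain)
assert(complete.copy(mode = Final).toString == plain)
```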

Table 2. AggregateExpression’s Properties
Name                Description
canonicalized       AggregateExpression with AggregateFunction expression canonicalized with the special ExprId as 0.
children            AggregateFunction expression (for which AggregateExpression was created).
dataType            DataType of AggregateFunction expression
foldable            Disabled (i.e. false)
nullable            Whether or not AggregateFunction expression is nullable.
references          AttributeSet with the following:
resultAttribute     Attribute that is an AttributeReference when AggregateFunction is itself resolved, and an UnresolvedAttribute otherwise
sql                 Requests AggregateFunction to generate SQL output (with isDistinct flag).
toString            Prefix per AggregateMode followed by AggregateFunction’s toAggString (with isDistinct flag).

UnspecifiedDistribution

UnspecifiedDistribution is a Distribution that…​FIXME

UnspecifiedDistribution specifies None for the required number of partitions.

Note
None for the required number of partitions means that any number of partitions can be used (possibly the value of the spark.sql.shuffle.partitions configuration property, which is 200 by default).

createPartitioning Method

Note
createPartitioning is part of Distribution Contract to create a Partitioning for a given number of partitions.

createPartitioning…​FIXME

OrderedDistribution

OrderedDistribution is a Distribution that…​FIXME

OrderedDistribution specifies None for the required number of partitions.

Note
None for the required number of partitions means that any number of partitions can be used (possibly the value of the spark.sql.shuffle.partitions configuration property, which is 200 by default).

OrderedDistribution is created when…​FIXME

OrderedDistribution takes SortOrder expressions for ordering when created.

OrderedDistribution requires that the ordering expressions are not empty (i.e. not Nil).

createPartitioning Method

Note
createPartitioning is part of Distribution Contract to create a Partitioning for a given number of partitions.

createPartitioning…​FIXME

HashClusteredDistribution

HashClusteredDistribution is a Distribution that creates a HashPartitioning for the hash expressions and a requested number of partitions.

HashClusteredDistribution specifies None for the required number of partitions.

Note
None for the required number of partitions means that any number of partitions can be used (possibly the value of the spark.sql.shuffle.partitions configuration property, which is 200 by default).

HashClusteredDistribution is created when the following physical operators are requested for a required child distribution:

HashClusteredDistribution takes hash expressions when created.

HashClusteredDistribution requires that the hash expressions are not empty (i.e. not Nil).

HashClusteredDistribution is used when:

  • EnsureRequirements is requested to add an ExchangeCoordinator for Adaptive Query Execution

  • HashPartitioning is requested to check whether it satisfies a required distribution (satisfies method)

createPartitioning Method

Note
createPartitioning is part of Distribution Contract to create a Partitioning for a given number of partitions.

createPartitioning creates a HashPartitioning for the hash expressions and the input numPartitions.
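A minimal sketch of the behaviour described in this section, assuming a spark-shell session on a Spark version that still ships HashClusteredDistribution (it is not present in all Spark releases). The attribute name id and the partition count 8 are made up.

```scala
import scala.util.Try
import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Expression}
import org.apache.spark.sql.catalyst.plans.physical.{HashClusteredDistribution, HashPartitioning}
import org.apache.spark.sql.types.LongType

// Hash expressions: a single hypothetical attribute named "id"
val hashExprs: Seq[Expression] = Seq(AttributeReference("id", LongType)())
val distribution = HashClusteredDistribution(hashExprs)

// No required number of partitions
assert(distribution.requiredNumPartitions.isEmpty)

// createPartitioning gives a HashPartitioning over the same expressions
assert(distribution.createPartitioning(8) == HashPartitioning(hashExprs, 8))

// Empty hash expressions are rejected at creation time
assert(Try(HashClusteredDistribution(Nil)).isFailure)
```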
