CumeDist

CumeDist Declarative Window Aggregate Function Expression

CumeDist is a SizeBasedWindowFunction and a RowNumberLike expression that computes the cumulative distribution of a value within a window partition.

CumeDist takes no input parameters when created.

CumeDist uses cume_dist for the user-facing name.

As a WindowFunction expression (indirectly), CumeDist requires a SpecifiedWindowFrame as the frame, with the RangeFrame frame type, UnboundedPreceding as the lower frame boundary and CurrentRow as the upper frame boundary.

Note
The frame for the CumeDist expression is range-based rather than row-based because it has to return the same value for ties in a window (rows with equal values per the ORDER BY specification).

As a DeclarativeAggregate expression (indirectly), CumeDist defines the evaluateExpression expression, which returns the final value when CumeDist is evaluated. The value is rowNumber / n, where rowNumber is the number of rows before and including the current row in the window frame (ties included) and n is the number of rows in the window partition.
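The rowNumber / n formula and the effect of the range-based frame can be illustrated with a minimal sketch in plain Python. It does not use Spark; the function name cume_dist and the sample values are assumptions made for illustration only.

```python
from bisect import bisect_right


def cume_dist(values):
    """Return (value, cume_dist) pairs for one window partition, ordered by value.

    Hypothetical helper: a plain-Python illustration, not Spark's implementation.
    """
    ordered = sorted(values)
    n = len(ordered)  # number of rows in the partition
    # Range-based frame: count every row whose ordering value is <= the current
    # one, so rows that tie on the ordering value share the same count.
    return [(v, bisect_right(ordered, v) / n) for v in ordered]


# The two rows with value 20 tie, so both get 3 / 4.
result = cume_dist([10, 20, 20, 30])
```

With a row-based frame the two tied rows would see different counts (2 and 3), which is why the range-based frame is required.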

CreateNamedStructUnsafe

CreateNamedStructUnsafe Expression

CreateNamedStructUnsafe is a CreateNamedStructLike expression that…​FIXME

Generating Java Source Code (ExprCode) For Code-Generated Expression Evaluation — doGenCode Method

Note
doGenCode is part of the Expression Contract to generate Java source code (ExprCode) for code-generated expression evaluation.

doGenCode…​FIXME

CreateNamedStructLike Contract

CreateNamedStructLike is the base of Catalyst expressions that FIXME.

CreateNamedStructLike is not nullable.

CreateNamedStructLike is foldable only if all value expressions are.

Table 1. CreateNamedStructLikes (Direct Implementations)
CreateNamedStructLike Description

CreateNamedStruct

CreateNamedStructUnsafe

Table 2. CreateNamedStructLike’s Internal Properties (e.g. Registries, Counters and Flags)
Name | Description
dataType |
nameExprs | Catalyst expressions for names
names |
valExprs | Catalyst expressions for values
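The relationship between the children, nameExprs, valExprs and the foldable rule can be sketched in plain Python. This is a sketch under a stated assumption: the children are taken to alternate between a name expression and a value expression (as in named_struct(name1, val1, name2, val2)). The Literal and ColumnRef classes and both helper functions are hypothetical stand-ins, not Catalyst classes.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Literal:
    """Hypothetical constant expression: always foldable."""
    value: Any
    foldable: bool = True


@dataclass(frozen=True)
class ColumnRef:
    """Hypothetical column reference: never foldable."""
    name: str
    foldable: bool = False


def split_children(children):
    """Split children into (name expressions, value expressions).

    Assumption: children alternate name, value, name, value, ...
    """
    if len(children) % 2 != 0:
        raise ValueError("expected an even number of children (name/value pairs)")
    return children[0::2], children[1::2]


def is_foldable(children):
    """Foldable only if all value expressions are foldable."""
    _, val_exprs = split_children(children)
    return all(expr.foldable for expr in val_exprs)


children = [Literal("a"), ColumnRef("x"), Literal("b"), Literal(2)]
name_exprs, val_exprs = split_children(children)
names = [expr.value for expr in name_exprs]
```

Here names is the list of field names, and is_foldable(children) is false because one value expression is a column reference.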

Checking Input Data Types — checkInputDataTypes Method

Note
checkInputDataTypes is part of the Expression Contract to verify (check the correctness of) the input data types.

checkInputDataTypes…​FIXME

Evaluating Expression — eval Method

Note
eval is part of the Expression Contract for interpreted (non-code-generated) expression evaluation, i.e. evaluating a Catalyst expression to a JVM object for a given internal binary row.

eval…​FIXME

CreateNamedStruct

CreateNamedStruct Expression

CreateNamedStruct is a CreateNamedStructLike expression that…​FIXME

CreateNamedStruct uses named_struct for the user-facing name.

CreateNamedStruct is registered in FunctionRegistry as the named_struct SQL function.

CreateNamedStruct is created when:

CreateNamedStruct takes a collection of Catalyst expressions when created.

Tip

Use the namedStruct operator from Catalyst DSL’s expressions to create a CreateNamedStruct expression.

Generating Java Source Code (ExprCode) For Code-Generated Expression Evaluation — doGenCode Method

Note
doGenCode is part of the Expression Contract to generate Java source code (ExprCode) for code-generated expression evaluation.

doGenCode…​FIXME

CreateArray

CreateArray is…​FIXME

ComplexTypedAggregateExpression

ComplexTypedAggregateExpression is…​FIXME

ComplexTypedAggregateExpression is created when…​FIXME

Creating ComplexTypedAggregateExpression Instance

ComplexTypedAggregateExpression takes the following when created:

CollectionGenerator

CollectionGenerator Generator Expression Contract

CollectionGenerator is the contract in Spark SQL for Generator expressions that generate a collection object (i.e. an array or a map) and, at execution time, take a different path for whole-stage Java code generation (when the GenerateExec physical operator is executed with whole-stage Java code generation enabled).

Table 1. CollectionGenerator Contract
Method | Description
collectionType | The type of the returned collection object. Used when…​
inline | Flag whether to inline rows during whole-stage Java code generation. Used when…​
position | Flag whether to include the positions of elements within the result collection. Used when…​

Table 2. CollectionGenerators
CollectionGenerator Description

Inline

ExplodeBase

Explode

PosExplode
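What the position and inline flags mean for the generated rows can be pictured with a small plain-Python sketch. It only models the shape of the output rows and says nothing about how Spark generates code for them. The function generate_rows and its parameters are hypothetical, and mapping position=True to PosExplode-like output and inline=True to Inline-like output is an assumption.

```python
def generate_rows(collection, position=False, inline=False):
    """Turn one collection value into a list of output rows (tuples).

    Hypothetical helper, for illustration only.
    """
    if isinstance(collection, dict):
        # A map yields one (key, value) row per entry.
        rows = [(key, value) for key, value in collection.items()]
    elif inline:
        # An array of structs: each struct's fields become the row's columns.
        rows = [tuple(struct) for struct in collection]
    else:
        # A plain array yields one single-column row per element.
        rows = [(element,) for element in collection]
    if position:
        # Prepend the position of each element within the collection.
        rows = [(index,) + row for index, row in enumerate(rows)]
    return rows


exploded = generate_rows(["a", "b"])
with_positions = generate_rows(["a", "b"], position=True)
inlined = generate_rows([(1, "x"), (2, "y")], inline=True)
```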

CodegenFallback

CodegenFallback Contract — Catalyst Expressions with Fallback Code Generation Mode

CodegenFallback is the contract of Catalyst expressions that do not support Java code generation and fall back to interpreted mode (aka fallback mode).

CodegenFallback is used when CollapseCodegenStages physical optimization is requested to execute (and enforce whole-stage codegen requirements for Catalyst expressions).

Table 1. (Some Examples of) CodegenFallbacks
CodegenFallback Description

CurrentDate

CurrentTimestamp

Cube

JsonToStructs

Rollup

StructsToJson

Generating Java Source Code (ExprCode) For Code-Generated Expression Evaluation — doGenCode Method

Note
doGenCode is part of the Expression Contract to generate Java source code (ExprCode) for code-generated expression evaluation.

doGenCode requests the input CodegenContext to add itself to the references.

doGenCode walks down the expression tree to find Nondeterministic expressions and for every Nondeterministic expression does the following:

  1. Requests the input CodegenContext to add it to the references

  2. Requests the input CodegenContext to addPartitionInitializationStatement that is a Java code block as follows:

In the end, doGenCode generates a plain Java source code block whose form depends on the nullable flag, and copies the input ExprCode with that code block added (as the code property).
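The registration steps described above (add the expression itself to the references, then walk the tree and add every Nondeterministic expression) can be modelled in plain Python. This is a toy model only: Expr, Context, add_reference and register_fallback are hypothetical names, and the sketch deliberately omits the generated Java code, which is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Expr:
    """Hypothetical expression node."""
    name: str
    nondeterministic: bool = False
    children: List["Expr"] = field(default_factory=list)


@dataclass
class Context:
    """Hypothetical stand-in for a code generation context."""
    references: List[Expr] = field(default_factory=list)

    def add_reference(self, expr):
        """Store the object and return its index in the references."""
        self.references.append(expr)
        return len(self.references) - 1


def register_fallback(root, ctx):
    """Register the root, then every nondeterministic node found in the tree."""
    root_index = ctx.add_reference(root)
    nondeterministic_indexes = []
    stack = [root]
    while stack:  # depth-first, pre-order walk
        node = stack.pop()
        if node.nondeterministic:
            nondeterministic_indexes.append(ctx.add_reference(node))
        stack.extend(reversed(node.children))
    return root_index, nondeterministic_indexes


tree = Expr("add", children=[Expr("rand", nondeterministic=True), Expr("lit")])
ctx = Context()
root_index, nondeterministic_indexes = register_fallback(tree, ctx)
```

In this model the indexes are what generated code would use to look the objects up again at run time, for example to initialise each nondeterministic expression once per partition.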

Coalesce

Coalesce Expression

Coalesce is a Catalyst expression that represents the coalesce standard function or SQL’s coalesce function in structured queries.

When created, Coalesce takes Catalyst expressions (as the children).

Caution
FIXME Describe FunctionArgumentConversion and Coalesce

Spark Optimizer uses the NullPropagation logical optimization to remove null literals from the children expressions. If all children expressions are null literals, the expression is statically evaluated to null.
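Both the runtime semantics of coalesce and the null-literal removal can be sketched in plain Python. The tuple encoding of expressions and the function names are hypothetical. Collapsing a single remaining child to that child is an assumption of this sketch, not something stated above.

```python
def coalesce(*values):
    """Return the first value that is not None, or None if there is none."""
    for value in values:
        if value is not None:
            return value
    return None


def drop_null_literals(children):
    """Simplify a coalesce over children encoded as ("lit", value) or ("col", name).

    Hypothetical encoding; a sketch of the optimization, not Spark's rule.
    """
    kept = [c for c in children if not (c[0] == "lit" and c[1] is None)]
    if not kept:
        # Every child was a null literal: the whole expression is null.
        return ("lit", None)
    if len(kept) == 1:
        # Assumption: a coalesce with a single child is just that child.
        return kept[0]
    return ("coalesce", kept)


first_non_null = coalesce(None, None, 3, 4)
simplified = drop_null_literals([("lit", None), ("col", "a"), ("lit", 0)])
all_null = drop_null_literals([("lit", None), ("lit", None)])
```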

Coalesce is also created when:

  • Analyzer is requested to commonNaturalJoinProcessing for FullOuter join type

  • RewriteDistinctAggregates logical optimization is requested to rewrite

  • ExtractEquiJoinKeys Scala extractor is requested to destructure a logical plan

  • ColumnStat is requested to statExprs

  • IfNull expression is created

  • Nvl expression is created

  • Whenever Cast expression is used in Catalyst expressions (e.g. Average, Sum)

CallMethodViaReflection

CallMethodViaReflection Expression

CallMethodViaReflection is an expression that represents a static method call in Scala or Java using reflect and java_method functions.

Note
The reflect and java_method functions are only supported in SQL and expression modes.

Table 1. CallMethodViaReflection’s DataType to JVM Types Mapping
DataType | JVM Type
BooleanType | java.lang.Boolean / scala.Boolean
ByteType | java.lang.Byte / Byte
ShortType | java.lang.Short / Short
IntegerType | java.lang.Integer / Int
LongType | java.lang.Long / Long
FloatType | java.lang.Float / Float
DoubleType | java.lang.Double / Double
StringType | String

CallMethodViaReflection supports a fallback mode for expression code generation.

Table 2. CallMethodViaReflection’s Properties
Property | Description
dataType | StringType
deterministic | Disabled (i.e. false)
nullable | Enabled (i.e. true)
prettyName | reflect

Note
CallMethodViaReflection is very similar to the StaticInvoke expression.
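The idea of calling a method that is known only by name, and always returning the result as a string (matching the StringType data type above), can be shown with a Python analogy. This is not how Spark resolves JVM methods: the reflect helper below is hypothetical and uses Python's importlib in place of Java reflection.

```python
import importlib


def reflect(module_name, function_name, *args):
    """Look up a function by module and name, call it, return the result as a string.

    Hypothetical Python analogy of a reflective static method call.
    """
    module = importlib.import_module(module_name)
    function = getattr(module, function_name)
    return str(function(*args))


square_root = reflect("math", "sqrt", 16.0)
file_name = reflect("os.path", "basename", "/tmp/data.txt")
```

Because the target is arbitrary code, the caller cannot assume that two calls with the same arguments return the same value, which fits the deterministic property being disabled.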
