关注 spark技术分享,
撸spark源码 玩spark最佳实践

MonotonicallyIncreasingID

admin阅读(1652)

MonotonicallyIncreasingID Nondeterministic Leaf Expression

MonotonicallyIncreasingID is a non-deterministic leaf expression that is the internal representation of the monotonically_increasing_id standard and SQL functions.

As a Nondeterministic expression, MonotonicallyIncreasingID requires explicit initialization (with the current partition index) before evaluating a value.

MonotonicallyIncreasingID uses LongType as the data type of the result of evaluating itself.

MonotonicallyIncreasingID is never nullable.

MonotonicallyIncreasingID uses monotonically_increasing_id for the user-facing name.

MonotonicallyIncreasingID uses monotonically_increasing_id() for the SQL representation.

MonotonicallyIncreasingID is created when monotonically_increasing_id standard function is used in a structured query.

MonotonicallyIncreasingID is registered as monotonically_increasing_id SQL function.

MonotonicallyIncreasingID takes no input parameters when created.

Table 1. MonotonicallyIncreasingID’s Internal Properties (e.g. Registries, Counters and Flags)
Name Description

count

Number of evalInternal calls, i.e. the number of rows for which MonotonicallyIncreasingID was evaluated

Initialized when MonotonicallyIncreasingID is requested to initialize and used to evaluate a value.

partitionMask

Current partition index shifted 33 bits left

Initialized when MonotonicallyIncreasingID is requested to initialize and used to evaluate a value.

Generating Java Source Code (ExprCode) For Code-Generated Expression Evaluation — doGenCode Method

Note
doGenCode is part of Expression Contract to generate a Java source code (ExprCode) for code-generated expression evaluation.

doGenCode requests the CodegenContext to add a mutable state as count name and long Java type.

doGenCode requests the CodegenContext to add an immutable state (unless exists already) as partitionMask name and long Java type.

doGenCode requests the CodegenContext to addPartitionInitializationStatement with [countTerm] = 0L; statement.

doGenCode requests the CodegenContext to addPartitionInitializationStatement with [partitionMaskTerm] = ((long) partitionIndex) << 33; statement.

In the end, doGenCode returns the input ExprCode with the code as follows and isNull property disabled (false):

Initializing Nondeterministic Expression — initializeInternal Method

Note
initializeInternal is part of Nondeterministic Contract to initialize a Nondeterministic expression.

initializeInternal simply sets the count to 0 and the partitionMask to partitionIndex.toLong << 33.

Evaluating Nondeterministic Expression — evalInternal Method

Note
evalInternal is part of Nondeterministic Contract to evaluate the value of a Nondeterministic expression.

evalInternal remembers the current value of the count and increments it.

In the end, evalInternal returns the sum of the current value of the partitionMask and the remembered value of the count.

Literal

admin阅读(1605)

Literal Leaf Expression

Literal is a leaf expression that is created to represent a Scala value of a specific type.

Literal is created when…​MEFIXME

Table 1. Literal’s Properties
Property Description

foldable

Enabled (i.e. true)

nullable

Enabled when value is null

Creating Literal Instance — create Object Method

create uses CatalystTypeConverters helper object to convert the input v Scala value to a Catalyst rows or types and creates a Literal (with the Catalyst value and the input DataType).

Note
create is used when…​FIXME

Creating Literal Instance

Literal takes the following when created:

Generating Java Source Code (ExprCode) For Code-Generated Expression Evaluation — doGenCode Method

Note
doGenCode is part of Expression Contract to generate a Java source code (ExprCode) for code-generated expression evaluation.

doGenCode…​FIXME

ListQuery

admin阅读(1648)

ListQuery Subquery Expression

ListQuery is a SubqueryExpression that represents SQL’s IN predicate with a subquery, e.g. NOT? IN '(' query ')'.

ListQuery cannot be evaluated and produce a value given an internal row.

ListQuery is resolved when:

Creating ListQuery Instance

ListQuery takes the following when created:

JsonTuple

admin阅读(1611)

JsonTuple Generator Expression

JsonTuple is…​FIXME

JsonToStructs

admin阅读(1827)

JsonToStructs Unary Expression

JsonToStructs is a unary expression with timezone support and CodegenFallback.

JsonToStructs is created to represent from_json function.

JsonToStructs is a ExpectsInputTypes expression.

Note

JsonToStructs uses JacksonParser in FAILFAST mode that simply fails early when a corrupted/malformed record is found (and hence does not support columnNameOfCorruptRecord JSON option).

Table 1. JsonToStructs’s Properties
Property Description

converter

Function that converts Seq[InternalRow] into…​FIXME

nullable

Enabled (i.e. true)

parser

JacksonParser with rowSchema and JSON options

Note
JSON options are made up of the input options with mode option as FAILFAST and the input time zone as the default time zone.

rowSchema

StructType that…​FIXME

  • schema when of type StructType

  • StructType of the elements in schema when of type ArrayType

Creating JsonToStructs Instance

JsonToStructs takes the following when created:

JsonToStructs initializes the internal registries and counters.

Parsing Table Schema for String Literals — validateSchemaLiteral Method

validateSchemaLiteral requests CatalystSqlParser to parseTableSchema for Literal of StringType.

For any other non-StringType types, validateSchemaLiteral reports a AnalysisException:

InSubquery

admin阅读(1516)

InSubquery Expression

InSubquery is a ExecSubqueryExpression that…​FIXME

InSubquery is created when…​FIXME

updateResult Method

Note
updateResult is part of ExecSubqueryExpression Contract to…​FIXME.

updateResult…​FIXME

Creating InSubquery Instance

InSubquery takes the following when created:

  • Child expression

  • SubqueryExec physical operator

  • Expression ID (as ExprId)

  • result array (default: null)

  • updated flag (default: false)

Inline

admin阅读(1618)

Inline Generator Expression

Inline is created by inline and inline_outer standard functions.

ImperativeAggregate

admin阅读(1816)

ImperativeAggregate — Contract for Aggregate Function Expressions with Imperative Methods

ImperativeAggregate is the contract for aggregate functions that are expressed in terms of imperative initialize, update, and merge methods (that operate on Row-based aggregation buffers).

ImperativeAggregate is a Catalyst expression with CodegenFallback.

Table 1. ImperativeAggregate’s Direct Implementations
Name Description

HyperLogLogPlusPlus

PivotFirst

ScalaUDAF

TypedImperativeAggregate

ImperativeAggregate Contract

Table 2. ImperativeAggregate Contract
Method Description

initialize

Used when:

inputAggBufferOffset

merge

Used when:

mutableAggBufferOffset

update

Used when:

withNewInputAggBufferOffset

withNewMutableAggBufferOffset

GetStructField

admin阅读(1407)

GetStructField Unary Expression

GetStructField is a unary expression that…​FIXME

GetStructField is created when…​FIXME

Creating GetStructField Instance

GetStructField takes the following when created:

关注公众号:spark技术分享

联系我们联系我们