Follow spark技术分享 (Spark Tech Sharing),
digging into the Spark source code and Spark best practices

GetMapValue


GetMapValue is… FIXME

GetArrayItem


GetArrayItem is… FIXME

GetArrayStructFields


GetArrayStructFields is… FIXME

Generator


Generator Contract — Expressions to Generate Zero Or More Rows (aka Lateral Views)

Generator is a contract for Catalyst expressions that can produce zero or more rows given a single input row.

Note
Generator corresponds to SQL’s LATERAL VIEW.

dataType in Generator is simply an ArrayType of elementSchema.

Generator is not foldable and not nullable by default.

Generator supports Java code generation (aka whole-stage codegen) conditionally, i.e. only when the generator expression itself is not marked as CodegenFallback.

Generator uses terminate to signal that there are no more input rows to process. terminate is the place for clean-up code, and it can also produce additional rows.
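Here is a toy model of the eval/terminate split in plain Python. It is not Spark's API. `ToyGenerator`, `SplitWords` and `run_generator` are hypothetical names, and rows are modelled as tuples rather than InternalRows. The driver calls `eval` once per input row, collecting zero or more output rows. It then calls `terminate` once after the last row so the generator can emit any extra rows.

```python
from typing import Iterable, List, Tuple

Row = Tuple  # rows modelled as plain tuples (assumption, not Spark's InternalRow)


class ToyGenerator:
    """Hypothetical stand-in for a Catalyst Generator (not Spark's API)."""

    def eval(self, row: Row) -> Iterable[Row]:
        # Produce zero or more output rows for a single input row.
        raise NotImplementedError

    def terminate(self) -> Iterable[Row]:
        # By default, no extra rows once the input is exhausted.
        return []


class SplitWords(ToyGenerator):
    """Emits one row per word of the input text, then one summary row."""

    def __init__(self) -> None:
        self.total = 0

    def eval(self, row: Row) -> Iterable[Row]:
        words = row[0].split()
        self.total += len(words)
        # An empty string yields zero rows.
        return [(w,) for w in words]

    def terminate(self) -> Iterable[Row]:
        # Additional row produced after the last input row.
        return [("<total>", self.total)]


def run_generator(gen: ToyGenerator, rows: Iterable[Row]) -> List[Row]:
    """Hypothetical driver: call eval for every row, then terminate once."""
    out: List[Row] = []
    for row in rows:
        out.extend(gen.eval(row))
    out.extend(gen.terminate())
    return out
```

For example, `run_generator(SplitWords(), [("a b",), ("",), ("c",)])` returns `[("a",), ("b",), ("c",), ("<total>", 3)]`. The empty string contributes no rows, and the summary row comes from `terminate`.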

Table 1. Generators
Name: Description
CollectionGenerator
ExplodeBase
Explode
GeneratorOuter
HiveGenericUDTF
Inline: Corresponds to the inline and inline_outer functions.
JsonTuple
PosExplode
Stack
UnresolvedGenerator: Represents an unresolved generator. Created when AstBuilder creates a Generate unary logical operator for a LATERAL VIEW that corresponds to the following:
  (Note: UnresolvedGenerator is resolved to a Generator by the ResolveFunctions logical evaluation rule.)
UserDefinedGenerator: Used exclusively in the deprecated explode operator.

Note

You can only have one generator per SELECT clause, which is enforced by the ExtractGenerator logical evaluation rule.

If you want more than one generator in a structured query, you should use LATERAL VIEW, which is supported in SQL only.
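The effect of chaining LATERAL VIEWs can be sketched as follows, assuming inner (non-OUTER) semantics. Each LATERAL VIEW pairs every incoming row with each value its generator produces, so two views yield every combination. A row whose generator produces nothing is dropped. This is a plain-Python illustration, not how Spark executes the query. `lateral_view` and the `xs`/`ys` columns are hypothetical.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


def lateral_view(rows: Iterable[Row],
                 generator: Callable[[Row], Iterable[object]],
                 column: str) -> List[Row]:
    """Toy LATERAL VIEW (no OUTER): pair each input row with every value
    its generator produces; a row that produces nothing is dropped."""
    out: List[Row] = []
    for row in rows:
        for value in generator(row):
            out.append({**row, column: value})
    return out


# Hypothetical input table with two array columns.
table = [{"id": 1, "xs": [1, 2], "ys": ["a", "b"]},
         {"id": 2, "xs": [], "ys": ["c"]}]

# Roughly: SELECT id, x, y FROM table
#          LATERAL VIEW explode(xs) t1 AS x
#          LATERAL VIEW explode(ys) t2 AS y
step1 = lateral_view(table, lambda r: r["xs"], "x")
step2 = lateral_view(step1, lambda r: r["ys"], "y")
result = [(r["id"], r["x"], r["y"]) for r in step2]
```

Here `result` is `[(1, 1, "a"), (1, 1, "b"), (1, 2, "a"), (1, 2, "b")]`. Row 2 disappears because its `xs` array is empty.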

Generator Contract

Table 2. (Subset of) Generator Contract
Method: Description
elementSchema: Schema of the elements to be generated
eval: Generates zero or more rows for a single input row

First


First Aggregate Function Expression

First is a DeclarativeAggregate function expression that is created when:

When requested to evaluate (and return the final value), First simply returns an AttributeReference named first with the data type of the child expression.
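First's aggregation can be modelled with a two-slot buffer. The sketch below is plain Python and only mirrors the idea. The `(value, value_set)` buffer, the `ignore_nulls` flag and the helper names are assumptions for illustration, not Spark's API. Partitions are processed in list order here. In Spark, the order of rows (and therefore the result of first) is generally not deterministic.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical buffer: (first value seen, whether a value has been set).
Buffer = Tuple[Optional[object], bool]


def initial() -> Buffer:
    return (None, False)


def update(buf: Buffer, value: Optional[object], ignore_nulls: bool) -> Buffer:
    _, value_set = buf
    if ignore_nulls:
        # Skip nulls and keep the first non-null value.
        if value_set or value is None:
            return buf
        return (value, True)
    # Without ignore_nulls the very first value wins, even a null one.
    return buf if value_set else (value, True)


def merge(left: Buffer, right: Buffer) -> Buffer:
    # Prefer the left buffer if it already holds a value.
    return left if left[1] else right


def evaluate(buf: Buffer) -> Optional[object]:
    # The final value is just the "first" slot of the buffer.
    return buf[0]


def first_agg(partitions: Iterable[Iterable[Optional[object]]],
              ignore_nulls: bool = False) -> Optional[object]:
    merged = initial()
    for part in partitions:
        buf = initial()
        for v in part:
            buf = update(buf, v, ignore_nulls)
        merged = merge(merged, buf)
    return evaluate(merged)
```

With `ignore_nulls=False`, `first_agg([[None, 2], [3]])` returns `None` because the null comes first. With `ignore_nulls=True` it returns `2`.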

Tip
Use the first operator from the Catalyst DSL to create a First aggregate function expression, e.g. for testing or exploring Spark SQL internals.

Catalyst DSL — first Operator

first creates a First expression and requests it to convert itself to an AggregateExpression.

Creating First Instance

First takes the following when created:

ExplodeBase Contract


ExplodeBase Base Generator Expression

ExplodeBase is the base class for Explode and PosExplode generator expressions.

ExplodeBase is a unary expression and Generator with CodegenFallback.

Explode Generator Unary Expression

Explode is a unary expression that produces a sequence of records, one for each element of an array or each entry of a map.

Explode is the result of executing the explode function (in SQL and in the functions object).
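Below is a toy model of the explode behaviour in plain Python (not Spark's API). An array yields one single-column row per element. A map yields one (key, value) row per entry. As far as we know, a null collection yields no rows. `posexplode` additionally prefixes each row with a 0-based position, which is our understanding of PosExplode.

```python
from typing import Dict, List, Tuple, Union

# Input is a list (array), a dict (map) or None (null).
Collection = Union[List[object], Dict[object, object], None]


def explode(value: Collection) -> List[Tuple]:
    """Toy explode: one row per array element or per map entry."""
    if value is None:
        return []  # a null collection produces no rows
    if isinstance(value, dict):
        return [(k, v) for k, v in value.items()]  # (key, value) rows
    return [(e,) for e in value]  # single-column rows


def posexplode(value: Collection) -> List[Tuple]:
    """Toy posexplode: like explode, with a 0-based position first."""
    return [(pos,) + row for pos, row in enumerate(explode(value))]
```

For instance, `explode({"a": 1})` gives `[("a", 1)]` and `posexplode(["x", "y"])` gives `[(0, "x"), (1, "y")]`.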

PosExplode

Caution
FIXME

Exists


Exists — Correlated Predicate Subquery Expression

Exists is a SubqueryExpression and a predicate expression (i.e. the result data type is always boolean).

Exists is created when:

  1. ResolveSubquery is requested to resolveSubQueries

  2. PullupCorrelatedPredicates is requested to rewriteSubQueries

  3. AstBuilder is requested to visitExists (in SQL statements)

Exists cannot be evaluated, i.e. it cannot produce a value for a given internal row.


When requested to evaluate or doGenCode, Exists simply throws an UnsupportedOperationException.

Exists is never nullable.
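Exists itself is never evaluated row by row, but the predicate it stands for is easy to state: keep an outer row if the correlated subquery returns at least one row for it. Our understanding is that Spark's optimizer rewrites such predicates into joins (a left semi join for EXISTS). The plain-Python sketch below only models that semantics. `exists_filter` and the sample tables are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def exists_filter(outer: List[Row], inner: List[Row],
                  correlated: Callable[[Row, Row], bool]) -> List[Row]:
    """Keep each outer row (at most once) for which at least one inner row
    satisfies the correlated predicate: left-semi-join semantics."""
    return [o for o in outer if any(correlated(o, i) for i in inner)]


# Hypothetical sample tables.
customers = [{"id": 1}, {"id": 2}, {"id": 3}]
orders = [{"customer_id": 1}, {"customer_id": 1}, {"customer_id": 3}]

# Roughly: SELECT * FROM customers c
#          WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
with_orders = exists_filter(customers, orders,
                            lambda c, o: o["customer_id"] == c["id"])
```

Customer 1 appears only once even though it has two matching orders, which is what distinguishes a semi join from an inner join.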

Exists uses the following text representation:

When requested for a canonicalized version, Exists creates a new instance with… FIXME

Creating Exists Instance

Exists takes the following when created:

ExecSubqueryExpression


ExecSubqueryExpression Contract — Catalyst Expressions with SubqueryExec Physical Operators

ExecSubqueryExpression is the contract for Catalyst expressions that contain a physical plan with a SubqueryExec physical operator (i.e. PlanExpression[SubqueryExec]).

Table 1. ExecSubqueryExpression Contract
Method: Description
updateResult: Used exclusively when a physical operator is requested to waitForSubqueries (when executed as part of the Physical Operator Execution Pipeline).

Table 2. ExecSubqueryExpressions
ExecSubqueryExpression: Description
InSubquery
ScalarSubquery
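The updateResult step can be pictured as "run the subquery once, cache the result, then evaluate from the cache". The sketch below models a scalar subquery in plain Python. `ToyScalarSubquery`, `update_result` and `wait_for_subqueries` are hypothetical stand-ins. Treating zero rows as null and more than one row as an error is our reading of ScalarSubquery rather than something stated above.

```python
from typing import Callable, List, Optional, Sequence


class ToyScalarSubquery:
    """Hypothetical stand-in for a scalar ExecSubqueryExpression."""

    def __init__(self, run_plan: Callable[[], Sequence[tuple]]) -> None:
        self._run_plan = run_plan  # stands in for executing the subquery plan
        self._result: Optional[object] = None
        self._updated = False

    def update_result(self) -> None:
        # Execute the subquery once and cache its single value.
        rows = self._run_plan()
        if len(rows) > 1:
            raise RuntimeError("scalar subquery returned more than one row")
        self._result = rows[0][0] if rows else None  # no rows -> null
        self._updated = True

    def eval(self) -> Optional[object]:
        # Evaluation only reads the cached value; the plan is not re-run.
        if not self._updated:
            raise RuntimeError("update_result must be called first")
        return self._result


def wait_for_subqueries(subqueries: List[ToyScalarSubquery]) -> None:
    """Hypothetical driver: materialize every subquery before evaluation."""
    for sq in subqueries:
        sq.update_result()
```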

DeclarativeAggregate Contract — Unevaluable Aggregate Function Expressions


DeclarativeAggregate is an extension of the AggregateFunction Contract for aggregate function expressions that are unevaluable and use expressions for evaluation.

Note
An unevaluable expression cannot be evaluated to produce a value (neither in interpreted nor in code-generated expression evaluation). It has to be resolved (replaced) to other expressions or logical operators during the analysis or optimization phase; otherwise, analysis fails.

Table 1. DeclarativeAggregate Contract
Property: Description
evaluateExpression: The expression that returns the final value for the aggregate function. Used when:
initialValues
mergeExpressions
updateExpressions
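Together, the four properties describe an aggregate as buffer initialisation, per-row update, buffer merge and final evaluation. As we read Spark's sources:

- initialValues initialises an empty aggregation buffer.
- updateExpressions update it from an input row.
- mergeExpressions combine two buffers.
- evaluateExpression produces the final value.

The plain-Python sketch below models that life cycle with an Average-like `(sum, count)` buffer. In Spark these are Catalyst expressions over buffer attributes. Here `ToyDeclarativeAggregate` and its per-slot callables are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Buffer = List[object]


@dataclass
class ToyDeclarativeAggregate:
    """Hypothetical model: one callable per buffer slot for each phase,
    loosely mirroring initialValues / updateExpressions /
    mergeExpressions / evaluateExpression."""
    initial_values: List[object]
    update_exprs: List[Callable[[Buffer, Optional[float]], object]]
    merge_exprs: List[Callable[[Buffer, Buffer], object]]
    evaluate_expr: Callable[[Buffer], object]

    def run(self, partitions: Sequence[Sequence[Optional[float]]]) -> object:
        merged = list(self.initial_values)
        for part in partitions:
            buf = list(self.initial_values)
            for value in part:
                # Every update callable sees the same (old) buffer.
                buf = [f(buf, value) for f in self.update_exprs]
            merged = [f(merged, buf) for f in self.merge_exprs]
        return self.evaluate_expr(merged)


# Average with a (sum, count) buffer; nulls (None) are skipped.
average = ToyDeclarativeAggregate(
    initial_values=[0.0, 0],
    update_exprs=[
        lambda b, v: b[0] + (v if v is not None else 0.0),
        lambda b, v: b[1] + (1 if v is not None else 0),
    ],
    merge_exprs=[
        lambda left, right: left[0] + right[0],
        lambda left, right: left[1] + right[1],
    ],
    evaluate_expr=lambda b: b[0] / b[1] if b[1] else None,
)
```

For example, `average.run([[1.0, None], [2.0, 6.0]])` gives `3.0` because nulls are skipped. An empty input gives `None`.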

Table 2. DeclarativeAggregates (Direct Implementations)
DeclarativeAggregate: Description
AggregateWindowFunction: Contract for declarative window aggregate function expressions
Average
CentralMomentAgg
Corr
Count
Covariance
First
Last
Max
Min
SimpleTypedAggregateExpression
Sum
