GetArrayItem
GetArrayStructFields
Generator
Generator Contract — Expressions to Generate Zero Or More Rows (aka Lateral Views)
Generator
is a contract for Catalyst expressions that can produce zero or more rows given a single input row.
Note
|
Generator corresponds to SQL’s LATERAL VIEW.
|
dataType
in Generator
is simply an ArrayType of elementSchema.
Generator
supports Java code generation (aka whole-stage codegen) conditionally, i.e. only when a physical operator is not marked as CodegenFallback.
Generator
uses terminate
to inform that there are no more rows to process, clean up code, and additional rows can be made here.
1 2 3 4 5 |
terminate(): TraversableOnce[InternalRow] = Nil |
Name | Description | ||||
---|---|---|---|---|---|
Corresponds to |
|||||
Represents an unresolved generator. Created when
|
|||||
Used exclusively in the deprecated |
Note
|
You can only have one generator per select clause that is enforced by ExtractGenerator logical evaluation rule, e.g.
If you want to have more than one generator in a structured query you should use
|
Generator Contract
1 2 3 4 5 6 7 8 9 10 11 |
package org.apache.spark.sql.catalyst.expressions trait Generator extends Expression { // only required methods that have no implementation def elementSchema: StructType def eval(input: InternalRow): TraversableOnce[InternalRow] } |
Method | Description |
---|---|
Schema of the elements to be generated |
|
First
First Aggregate Function Expression
First
is a DeclarativeAggregate function expression that is created when:
-
AstBuilder
is requested to parse a FIRST statement -
first standard function is used
-
first
andfirst_value
SQL functions are used
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
val sqlText = "FIRST (organizationName IGNORE NULLS)" val e = spark.sessionState.sqlParser.parseExpression(sqlText) scala> :type e org.apache.spark.sql.catalyst.expressions.Expression import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression val aggExpr = e.asInstanceOf[AggregateExpression] import org.apache.spark.sql.catalyst.expressions.aggregate.First val f = aggExpr.aggregateFunction scala> println(f.simpleString) first('organizationName) ignore nulls |
When requested to evaluate (and return the final value), First
simply returns a AttributeReference (with first
name and the data type of the child expression).
Tip
|
Use first operator from the Catalyst DSL to create an First aggregate function expression, e.g. for testing or Spark SQL internals exploration.
|
Catalyst DSL — first
Operator
1 2 3 4 5 |
first(e: Expression): Expression |
first
creates a First
expression and requests it to convert to a AggregateExpression.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
import org.apache.spark.sql.catalyst.dsl.expressions._ val e = first('orgName) scala> println(e.numberedTreeString) 00 first('orgName, false) 01 +- first('orgName)() 02 :- 'orgName 03 +- false import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression val aggExpr = e.asInstanceOf[AggregateExpression] import org.apache.spark.sql.catalyst.expressions.aggregate.First val f = aggExpr.aggregateFunction scala> println(f.simpleString) first('orgName)() |
Creating First Instance
First
takes the following when created:
-
Child expression
-
ignoreNullsExpr
flag expression
ExplodeBase Contract
ExplodeBase Base Generator Expression
ExplodeBase
is the base class for Explode and PosExplode generator expressions.
ExplodeBase
is a unary expression and Generator with CodegenFallback.
Explode Generator Unary Expression
Explode
is a unary expression that produces a sequence of records for each value in the array or map.
Explode
is a result of executing explode
function (in SQL and functions)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
scala> sql("SELECT explode(array(10,20))").explain == Physical Plan == Generate explode([10,20]), false, false, [col#68] +- Scan OneRowRelation[] scala> sql("SELECT explode(array(10,20))").queryExecution.optimizedPlan.expressions(0) res18: org.apache.spark.sql.catalyst.expressions.Expression = explode([10,20]) val arrayDF = Seq(Array(0,1)).toDF("array") scala> arrayDF.withColumn("num", explode('array)).explain == Physical Plan == Generate explode(array#93), true, false, [array#93, num#102] +- LocalTableScan [array#93] |
ExpectsInputTypes Contract
Exists
Exists — Correlated Predicate Subquery Expression
Exists
is a SubqueryExpression and a predicate expression (i.e. the result data type is always boolean).
Exists
is created when:
-
ResolveSubquery
is requested to resolveSubQueries -
PullupCorrelatedPredicates
is requested to rewriteSubQueries -
AstBuilder
is requested to visitExists (in SQL statements)
Exists
cannot be evaluated, i.e. produce a value given an internal row.
1 2 3 4 5 |
Cannot evaluate expression: [this] |
Exists
is never nullable.
Exists
uses the following text representation:
1 2 3 4 5 |
exists#[exprId] [conditionString] |
When requested for a canonicalized version, Exists
creates a new instance with…FIXME
Creating Exists Instance
Exists
takes the following when created:
-
Subquery logical plan
-
Child expressions
ExecSubqueryExpression
ExecSubqueryExpression Contract — Catalyst Expressions with SubqueryExec Physical Operators
ExecSubqueryExpression
is the contract for Catalyst expressions that contain a physical plan with SubqueryExec physical operator (i.e. PlanExpression[SubqueryExec]
).
1 2 3 4 5 6 7 8 9 |
package org.apache.spark.sql.execution abstract class ExecSubqueryExpression extends PlanExpression[SubqueryExec] { def updateResult(): Unit } |
Method | Description |
---|---|
|
Used exclusively when a physical operator is requested to waitForSubqueries (when executed as part of Physical Operator Execution Pipeline). |
ExecSubqueryExpression | Description |
---|---|
DeclarativeAggregate Contract — Unevaluable Aggregate Function Expressions
DeclarativeAggregate Contract — Unevaluable Aggregate Function Expressions
DeclarativeAggregate
is an extension of the AggregateFunction Contract for aggregate function expressions that are unevaluable and use expressions for evaluation.
Note
|
An unevaluable expression cannot be evaluated to produce a value (neither in interpreted nor code-generated expression evaluations) and has to be resolved (replaced) to some other expressions or logical operators at analysis or optimization phases or they fail analysis. |
Property | Description | ||
---|---|---|---|
|
The expression that returns the final value for the aggregate function Used when:
|
||
|
|
||
|
|
||
|
|