GetArrayItem
GetArrayStructFields
Generator
Generator Contract — Expressions to Generate Zero Or More Rows (aka Lateral Views)
Generator is a contract for Catalyst expressions that can produce zero or more rows given a single input row.
|
Note
|
Generator corresponds to SQL’s LATERAL VIEW.
|
dataType in Generator is simply an ArrayType of elementSchema.
Generator supports Java code generation (aka whole-stage codegen) conditionally, i.e. only when a physical operator is not marked as CodegenFallback.
Generator uses terminate to inform that there are no more rows to process, clean up code, and additional rows can be made here.
|
1 2 3 4 5 |
terminate(): TraversableOnce[InternalRow] = Nil |
| Name | Description | ||||
|---|---|---|---|---|---|
|
Corresponds to |
|||||
|
Represents an unresolved generator. Created when
|
|||||
|
Used exclusively in the deprecated |
|
Note
|
You can only have one generator per select clause that is enforced by ExtractGenerator logical evaluation rule, e.g.
If you want to have more than one generator in a structured query you should use
|
Generator Contract
|
1 2 3 4 5 6 7 8 9 10 11 |
package org.apache.spark.sql.catalyst.expressions trait Generator extends Expression { // only required methods that have no implementation def elementSchema: StructType def eval(input: InternalRow): TraversableOnce[InternalRow] } |
| Method | Description |
|---|---|
|
Schema of the elements to be generated |
|
First
First Aggregate Function Expression
First is a DeclarativeAggregate function expression that is created when:
-
AstBuilderis requested to parse a FIRST statement -
first standard function is used
-
firstandfirst_valueSQL functions are used
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
val sqlText = "FIRST (organizationName IGNORE NULLS)" val e = spark.sessionState.sqlParser.parseExpression(sqlText) scala> :type e org.apache.spark.sql.catalyst.expressions.Expression import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression val aggExpr = e.asInstanceOf[AggregateExpression] import org.apache.spark.sql.catalyst.expressions.aggregate.First val f = aggExpr.aggregateFunction scala> println(f.simpleString) first('organizationName) ignore nulls |
When requested to evaluate (and return the final value), First simply returns a AttributeReference (with first name and the data type of the child expression).
|
Tip
|
Use first operator from the Catalyst DSL to create an First aggregate function expression, e.g. for testing or Spark SQL internals exploration.
|
Catalyst DSL — first Operator
|
1 2 3 4 5 |
first(e: Expression): Expression |
first creates a First expression and requests it to convert to a AggregateExpression.
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
import org.apache.spark.sql.catalyst.dsl.expressions._ val e = first('orgName) scala> println(e.numberedTreeString) 00 first('orgName, false) 01 +- first('orgName)() 02 :- 'orgName 03 +- false import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression val aggExpr = e.asInstanceOf[AggregateExpression] import org.apache.spark.sql.catalyst.expressions.aggregate.First val f = aggExpr.aggregateFunction scala> println(f.simpleString) first('orgName)() |
Creating First Instance
First takes the following when created:
-
Child expression
-
ignoreNullsExprflag expression
ExplodeBase Contract
ExplodeBase Base Generator Expression
ExplodeBase is the base class for Explode and PosExplode generator expressions.
ExplodeBase is a unary expression and Generator with CodegenFallback.
Explode Generator Unary Expression
Explode is a unary expression that produces a sequence of records for each value in the array or map.
Explode is a result of executing explode function (in SQL and functions)
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
scala> sql("SELECT explode(array(10,20))").explain == Physical Plan == Generate explode([10,20]), false, false, [col#68] +- Scan OneRowRelation[] scala> sql("SELECT explode(array(10,20))").queryExecution.optimizedPlan.expressions(0) res18: org.apache.spark.sql.catalyst.expressions.Expression = explode([10,20]) val arrayDF = Seq(Array(0,1)).toDF("array") scala> arrayDF.withColumn("num", explode('array)).explain == Physical Plan == Generate explode(array#93), true, false, [array#93, num#102] +- LocalTableScan [array#93] |
ExpectsInputTypes Contract
Exists
Exists — Correlated Predicate Subquery Expression
Exists is a SubqueryExpression and a predicate expression (i.e. the result data type is always boolean).
Exists is created when:
-
ResolveSubqueryis requested to resolveSubQueries -
PullupCorrelatedPredicatesis requested to rewriteSubQueries -
AstBuilderis requested to visitExists (in SQL statements)
Exists cannot be evaluated, i.e. produce a value given an internal row.
|
1 2 3 4 5 |
Cannot evaluate expression: [this] |
Exists is never nullable.
Exists uses the following text representation:
|
1 2 3 4 5 |
exists#[exprId] [conditionString] |
When requested for a canonicalized version, Exists creates a new instance with…FIXME
Creating Exists Instance
Exists takes the following when created:
-
Subquery logical plan
-
Child expressions
ExecSubqueryExpression
ExecSubqueryExpression Contract — Catalyst Expressions with SubqueryExec Physical Operators
ExecSubqueryExpression is the contract for Catalyst expressions that contain a physical plan with SubqueryExec physical operator (i.e. PlanExpression[SubqueryExec]).
|
1 2 3 4 5 6 7 8 9 |
package org.apache.spark.sql.execution abstract class ExecSubqueryExpression extends PlanExpression[SubqueryExec] { def updateResult(): Unit } |
| Method | Description |
|---|---|
|
|
Used exclusively when a physical operator is requested to waitForSubqueries (when executed as part of Physical Operator Execution Pipeline). |
| ExecSubqueryExpression | Description |
|---|---|
DeclarativeAggregate Contract — Unevaluable Aggregate Function Expressions
DeclarativeAggregate Contract — Unevaluable Aggregate Function Expressions
DeclarativeAggregate is an extension of the AggregateFunction Contract for aggregate function expressions that are unevaluable and use expressions for evaluation.
|
Note
|
An unevaluable expression cannot be evaluated to produce a value (neither in interpreted nor code-generated expression evaluations) and has to be resolved (replaced) to some other expressions or logical operators at analysis or optimization phases or they fail analysis. |
| Property | Description | ||
|---|---|---|---|
|
|
The expression that returns the final value for the aggregate function Used when:
|
||
|
|
|
||
|
|
|
||
|
|
|
spark技术分享