AstBuilder — ANTLR-based SQL Parser

AstBuilder converts SQL statements into Spark SQL’s relational entities (i.e. data types, Catalyst expressions, logical plans or TableIdentifiers) using visit callback methods.

AstBuilder is the AST builder of AbstractSqlParser (i.e. the base SQL parsing infrastructure in Spark SQL).

Tip

Spark SQL supports the SQL statements described in SqlBase.g4. Reading the file tells you (almost) exactly what Spark SQL supports at any given time.

“Almost” because AstBuilder can still report a SQL statement as not allowed even though the grammar accepts it.

AstBuilder is an ANTLR AbstractParseTreeVisitor (as SqlBaseBaseVisitor) that is generated from the SqlBase.g4 ANTLR grammar for Spark SQL.

Note

SqlBaseBaseVisitor is an ANTLR-specific base class that is auto-generated at build time from the ANTLR grammar in SqlBase.g4.

SqlBaseBaseVisitor is an ANTLR AbstractParseTreeVisitor.
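
To make the visit-callback idea concrete, here is a minimal Python sketch of rule-based dispatch. It is a toy model, not Spark's or ANTLR's code: the Node class, the ToyVisitor class and the simplified UnresolvedStar and TableIdentifier stand-ins are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical parse-tree node; `rule` names a grammar rule or labeled alternative."""
    rule: str
    text: str = ""
    children: tuple = ()


@dataclass(frozen=True)
class UnresolvedStar:
    """Toy stand-in for the entity produced for `*`."""


@dataclass(frozen=True)
class TableIdentifier:
    """Toy stand-in for a (database, table) pair."""
    table: str
    database: Optional[str] = None


class ToyVisitor:
    def visit(self, node):
        # Look up visit<Rule>; fall back to visiting the children.
        name = "visit" + node.rule[0].upper() + node.rule[1:]
        return getattr(self, name, self.visitChildren)(node)

    def visitChildren(self, node):
        # Visit every child and keep the last result.
        result = None
        for child in node.children:
            result = self.visit(child)
        return result

    def visitStar(self, node):
        return UnresolvedStar()

    def visitSingleTableIdentifier(self, node):
        # "db.t" -> TableIdentifier("t", "db"); "t" -> TableIdentifier("t")
        *db, table = node.text.split(".")
        return TableIdentifier(table, db[0] if db else None)
```

Each callback only handles its own rule, and rules without a callback simply pass through to their children, which is why a visitor can cover a large grammar with a modest number of methods.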

Table 1. AstBuilder’s Visit Callback Methods
Callback Method | ANTLR rule / labeled alternative | Spark SQL Entity

visitAliasedQuery

visitColumnReference

visitDereference

visitExists

#exists labeled alternative

Exists expression

visitExplain

explain rule

Note

The result can be a OneRowRelation when EXPLAIN is used on a logical command that cannot be explained, e.g. the DescribeTableCommand created from a DESCRIBE TABLE SQL statement.

visitFirst

#first labeled alternative

First aggregate function expression

visitFromClause

fromClause

Supports multiple comma-separated relations (that all together build a condition-less INNER JOIN) with optional LATERAL VIEW.

A relation can be one of the following or a combination thereof:

  • Table identifier

  • Inline table using VALUES exprs AS tableIdent

  • Table-valued function (currently only range is supported)
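
The way comma-separated relations combine into condition-less inner joins can be modeled with a left fold. The following Python sketch is illustrative only: Relation and Join are hypothetical stand-ins for logical operators, and from_clause is a made-up helper, not a Spark API.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Optional


@dataclass(frozen=True)
class Relation:
    """Toy leaf operator for a single relation."""
    name: str


@dataclass(frozen=True)
class Join:
    """Toy binary join operator; no condition means a condition-less join."""
    left: object
    right: object
    join_type: str = "Inner"
    condition: Optional[str] = None


def from_clause(names):
    """Fold relations left to right into nested condition-less inner joins.

    Assumes at least one relation, as a FROM clause always has one.
    """
    plans = [Relation(n) for n in names]
    return reduce(lambda left, right: Join(left, right), plans)
```

With this model, `from_clause(["a", "b", "c"])` yields a left-deep tree: the join of `a` and `b` becomes the left side of the join with `c`.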

visitFunctionCall

#functionCall labeled alternative

Tip
See the function examples below.

visitInlineTable

inlineTable rule

UnresolvedInlineTable unary logical operator (as the child of SubqueryAlias for tableAlias)

expression can be as follows:

tableAlias can be specified explicitly or defaults to colN for every column (starting from 1 for N).
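
The default naming can be sketched in a few lines of Python. This is a simplified model under the assumption that explicit aliases, when given, are used as-is; inline_table_column_names is a hypothetical helper, not part of Spark.

```python
def inline_table_column_names(rows, aliases=None):
    """Return column names for an inline table.

    Uses the explicit aliases when given; otherwise falls back to
    col1, col2, ... colN, where N is the width of the first row.
    """
    if aliases:
        return list(aliases)
    width = len(rows[0])
    return [f"col{i}" for i in range(1, width + 1)]
```

For example, two-column rows with no aliases get the names col1 and col2.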

visitInsertIntoTable

#insertIntoTable labeled alternative

InsertIntoTable (indirectly)

A 3-element tuple with a TableIdentifier, optional partition keys and the exists flag disabled

Note
insertIntoTable is part of insertInto that is in turn used only as a helper labeled alternative in singleInsertQuery and multiInsertQueryBody rules.

visitInsertOverwriteTable

#insertOverwriteTable labeled alternative

InsertIntoTable (indirectly)

A 3-element tuple with a TableIdentifier, optional partition keys and the exists flag

In a way, visitInsertOverwriteTable is simply a more general version of visitInsertIntoTable, with the exists flag turned on or off depending on whether IF NOT EXISTS is used. The main difference is that dynamic partitions can only be used without IF NOT EXISTS.

Note
insertOverwriteTable is part of insertInto that is in turn used only as a helper labeled alternative in singleInsertQuery and multiInsertQueryBody rules.

visitMultiInsertQuery

multiInsertQueryBody

A logical operator with an InsertIntoTable (and UnresolvedRelation leaf operator)

visitNamedExpression

namedExpression

  • Alias (for a single alias)

  • MultiAlias (for a parenthesis-enclosed alias list)

  • a bare Expression

visitNamedQuery

SubqueryAlias

visitQuerySpecification

querySpecification

OneRowRelation or LogicalPlan

Note

visitQuerySpecification creates a OneRowRelation for a SELECT without a FROM clause.

visitPredicated

predicated

Expression

visitRelation

relation

LogicalPlan for a FROM clause.

visitRowConstructor

visitSingleDataType

singleDataType

DataType

visitSingleExpression

singleExpression

Expression

Takes the named expression and relays to visitNamedExpression

visitSingleInsertQuery

#singleInsertQuery labeled alternative

A logical operator with an InsertIntoTable

visitSortItem

sortItem

SortOrder unevaluable unary expression

sortItem
    : expression ordering=(ASC | DESC)? (NULLS nullOrder=(LAST | FIRST))?
    ;

ORDER BY order+=sortItem (',' order+=sortItem)*
SORT BY sort+=sortItem (',' sort+=sortItem)*

((ORDER | SORT) BY sortItem (',' sortItem)*)?

visitSingleStatement

singleStatement

LogicalPlan from a single statement

Note
A single statement can be quite involved.

visitSingleTableIdentifier

singleTableIdentifier

TableIdentifier

visitStar

#star labeled alternative

UnresolvedStar

visitStruct

visitSubqueryExpression

#subqueryExpression labeled alternative

ScalarSubquery

visitWindowDef

#windowDef labeled alternative

Table 2. AstBuilder’s Parsing Handlers
Parsing Handler | LogicalPlan Added

withAggregation

  • GroupingSets for GROUP BY … GROUPING SETS (…)

  • Aggregate for GROUP BY … (WITH CUBE | WITH ROLLUP)?

withGenerate

Generate with an UnresolvedGenerator and the join flag turned on for LATERAL VIEW (in SELECT or FROM clauses).

withHints

Hint for /*+ hint */ in SELECT queries.

Tip
Note the + (plus) between /* and */

hint is of the format name or name (param1, param2, …​).
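
The hint format can be illustrated with a small regular expression. The Python sketch below is a rough approximation and not how Spark parses hints (Spark uses the ANTLR grammar); extract_hint and HINT_COMMENT are hypothetical names, and the pattern assumes a single hint with simple, unnested parameters.

```python
import re

# /*+ name */ or /*+ name(param1, param2, ...) */
HINT_COMMENT = re.compile(
    r"/\*\+\s*(?P<name>\w+)\s*(?:\((?P<params>[^)]*)\))?\s*\*/"
)


def extract_hint(sql):
    """Return (name, [params]) for the first hint comment, or None if there is none."""
    match = HINT_COMMENT.search(sql)
    if match is None:
        return None
    params = match.group("params")
    if not params:
        return match.group("name"), []
    return match.group("name"), [p.strip() for p in params.split(",") if p.strip()]
```

A regular comment such as `/* text */` is not matched, because the pattern requires the + right after `/*`.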

withInsertInto

withJoinRelations

Join for a FROM clause and for a single relation.

The following join types are supported:

  • INNER (default)

  • CROSS

  • LEFT (with optional OUTER)

  • LEFT SEMI

  • RIGHT (with optional OUTER)

  • FULL (with optional OUTER)

  • ANTI (optionally prefixed with LEFT)

The following join criteria are supported:

  • ON booleanExpression

  • USING '(' identifier (',' identifier)* ')'

Joins can be NATURAL (with no join criteria).

withQueryResultClauses

withQuerySpecification

Adds a query specification to a logical operator.

For transform SELECT (with TRANSFORM, MAP or REDUCE qualifiers), withQuerySpecification does…​FIXME


For regular SELECT (no TRANSFORM, MAP or REDUCE qualifiers), withQuerySpecification adds (in that order):

  1. Generate unary logical operators (if used in the parsed SQL text)

  2. Filter unary logical operator (if used in the parsed SQL text)

  3. GroupingSets or Aggregate unary logical operators (if used in the parsed SQL text)

  4. Project and/or Filter unary logical operators

  5. WithWindowDefinition unary logical operator (if used in the parsed SQL text)

  6. UnresolvedHint unary logical operator (if used in the parsed SQL text)
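
The ordering above can be sketched as a function that wraps a plan one operator at a time, innermost first. This Python model is illustrative only: plans are plain nested tuples, with_query_specification here is a toy function and not Spark's method, and reading step 4 as "Project when there is no aggregation, plus a Filter for HAVING" is an assumption.

```python
def with_query_specification(relation, select, lateral_views=(), where=None,
                             group_by=None, having=None, windows=None, hints=()):
    """Wrap `relation` with operators in the listed order (innermost first)."""
    plan = relation
    for generator in lateral_views:           # 1. Generate (LATERAL VIEW)
        plan = ("Generate", generator, plan)
    if where is not None:                     # 2. Filter (WHERE)
        plan = ("Filter", where, plan)
    if group_by is not None:                  # 3. Aggregate (GROUP BY)
        plan = ("Aggregate", (group_by, select), plan)
    else:                                     # 4. Project (assumed: only without aggregation)
        plan = ("Project", select, plan)
    if having is not None:                    # 4. Filter (assumed: HAVING)
        plan = ("Filter", having, plan)
    if windows is not None:                   # 5. WithWindowDefinition
        plan = ("WithWindowDefinition", windows, plan)
    for hint in hints:                        # 6. UnresolvedHint
        plan = ("UnresolvedHint", hint, plan)
    return plan
```

Because every step wraps the previous result, the operator added last (the hint) ends up at the top of the tree and the relation stays at the bottom.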

withPredicate

  • NOT? IN '(' query ')' gives an In predicate expression with a ListQuery subquery expression

  • NOT? IN '(' expression (',' expression)* ')' gives an In predicate expression
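
The two IN variants can be modeled as follows. This Python sketch uses hypothetical toy classes (Query, ListQuery, In, Not) rather than Catalyst expressions, and wrapping the predicate in Not when NOT is present is an assumption about how the optional NOT is handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """Toy stand-in for a parsed subquery."""
    sql: str


@dataclass(frozen=True)
class ListQuery:
    """Toy stand-in for a subquery expression used as the IN list."""
    query: Query


@dataclass(frozen=True)
class In:
    value: str
    items: tuple


@dataclass(frozen=True)
class Not:
    child: object


def with_in_predicate(value, operand, negated=False):
    """Build In(value, ...) from a subquery or from a list of expressions."""
    if isinstance(operand, Query):
        items = (ListQuery(operand),)   # IN '(' query ')'
    else:
        items = tuple(operand)          # IN '(' expression (',' expression)* ')'
    predicate = In(value, items)
    # Assumption: an optional NOT wraps the predicate.
    return Not(predicate) if negated else predicate
```

Both variants produce the same In shape; only the contents of the item list differ.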

withWindows

WithWindowDefinition for window aggregates (given WINDOW definitions).

Used in withQueryResultClauses and withQuerySpecification when window definitions are present.

Tip
Consult windows, namedWindow, windowSpec, windowFrame, and frameBound (with windowRef and windowDef) ANTLR parsing rules for Spark SQL in SqlBase.g4.

Note
AstBuilder belongs to org.apache.spark.sql.catalyst.parser package.

Function Examples

The examples are handled by visitFunctionCall.

aliasPlan Internal Method

aliasPlan…​FIXME

Note
aliasPlan is used when…​FIXME

mayApplyAliasPlan Internal Method

mayApplyAliasPlan…​FIXME

Note
mayApplyAliasPlan is used when…​FIXME