Catalyst DSL — Implicit Conversions for Catalyst Data Structures

Catalyst DSL is a collection of Scala implicit conversions for constructing Catalyst data structures, i.e. expressions and logical plans, more easily.

The goal of Catalyst DSL is to make working with Spark SQL’s building blocks easier (e.g. for testing or Spark SQL internals exploration).

Table 1. Catalyst DSL’s Implicit Conversions

Name | Description
ExpressionConversions | Creates expressions: Literals, UnresolvedAttribute and AttributeReference, …
ImplicitOperators | Adds operators to expressions for building complex expressions
plans | Creates logical plans

Catalyst DSL is part of org.apache.spark.sql.catalyst.dsl package object.
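
As a minimal sketch, the two imports below bring the expression and plan conversions into scope. It assumes the spark-catalyst library is on the classpath (e.g. in sbt console); the wrapper object and the people table name are made up for illustration.

```scala
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.dsl.plans._

// Hypothetical wrapper object, used only to group the example values
object CatalystDslQuickStart {
  // The Symbol 'id becomes an unresolved attribute and 1 becomes a Literal
  val predicate = 'id > 1

  // "people" is an illustrative table name; no catalog lookup happens here
  val plan = table("people").where(predicate)
}
```

Printing CatalystDslQuickStart.plan shows the (still unresolved) logical plan tree.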

Important

Some implicit conversions from the Catalyst DSL interfere with the implicit conversions from SQLImplicits that are imported automatically in spark-shell (through spark.implicits._).

Use sbt console with Spark libraries defined (in build.sbt) instead.


You can also disable an implicit conversion using a trick described in How can an implicit be unimported from the Scala repl?

ImplicitOperators Implicit Conversions

ImplicitOperators defines operators for building expressions, e.g. in.
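
A short sketch of a few such operators follows. The attribute names (id, name) and the wrapper object are illustrative assumptions, and the code is meant for a plain Scala session with spark-catalyst on the classpath rather than spark-shell.

```scala
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.expressions.Literal

// Hypothetical wrapper object, used only to group the example values
object ImplicitOperatorsDemo {
  // in builds an In expression over the given list of values
  val inExpr = 'id.in(Literal(1), Literal(2), Literal(3))

  // Comparison and boolean operators compose into larger predicates
  val predicate = ('id > 1) && ('name === "spark")

  // as wraps an expression in an Alias with the given name
  val aliased = ('id * 2).as("doubled")
}
```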

ExpressionConversions Implicit Conversions

ExpressionConversions implicit conversions add ImplicitOperators operators to Catalyst expressions.

Type Conversions to Literal Expressions

ExpressionConversions adds conversions of Scala native types (e.g. Boolean, Long, String, Date, Timestamp) and Spark SQL types (i.e. Decimal) to Literal expressions.
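
For illustration, ascribing the Literal type to a plain Scala value is enough to trigger these conversions. The value names and the wrapper object below are hypothetical.

```scala
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.types.Decimal

// Hypothetical wrapper object, used only to group the example values
object LiteralConversionsDemo {
  // Scala native types are converted through the implicit conversions
  val one: Literal = 1
  val text: Literal = "hello"
  val flag: Literal = true

  // Spark SQL's own Decimal type is converted as well
  val price: Literal = Decimal(1.5)
}
```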

Converting Symbols to UnresolvedAttribute and AttributeReference Expressions

ExpressionConversions adds conversions of Scala’s Symbol to UnresolvedAttribute and AttributeReference expressions.
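
The sketch below shows both directions, assuming illustrative column names: a bare Symbol gives an UnresolvedAttribute, while a type method such as long or string gives an AttributeReference of that type.

```scala
import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.expressions.AttributeReference

// Hypothetical wrapper object, used only to group the example values
object SymbolConversionsDemo {
  // A Symbol where an UnresolvedAttribute is expected is converted implicitly
  val unresolved: UnresolvedAttribute = 'id

  // Type methods on a Symbol create typed (nullable) attribute references
  val idRef: AttributeReference = 'id.long
  val nameRef: AttributeReference = 'name.string
}
```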

Converting $-Prefixed String Literals to UnresolvedAttribute Expressions

ExpressionConversions adds conversions of $"col name" to an UnresolvedAttribute expression.
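
A minimal sketch of the $ interpolator, using made-up column names. It is meant to run outside spark-shell, where spark.implicits._ provides its own $ that would clash with this one.

```scala
import org.apache.spark.sql.catalyst.dsl.expressions._

// Hypothetical wrapper object, used only to group the example values
object DollarInterpolatorDemo {
  // A simple column name
  val id = $"id"

  // A dot-separated name is split into name parts
  val qualified = $"t.id"
}
```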

Adding Aggregate And Non-Aggregate Functions to Expressions

ExpressionConversions adds aggregate and non-aggregate functions to Catalyst expressions (e.g. sum, count, upper, star, callFunction, windowSpec, windowExpr).
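
A few of these functions in use, as a sketch only; the attribute names amount and name are assumptions made for the example.

```scala
import org.apache.spark.sql.catalyst.dsl.expressions._

// Hypothetical wrapper object, used only to group the example values
object DslFunctionsDemo {
  // Aggregate function: wraps Sum in an aggregate expression
  val total = sum('amount)

  // Non-aggregate function: creates an Upper expression
  val upperName = upper('name)

  // star() with no arguments stands for all columns (an unresolved star)
  val everything = star()
}
```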

Creating UnresolvedFunction Expressions — function and distinctFunction Methods

ExpressionConversions allows creating UnresolvedFunction expressions with function and distinctFunction operators.
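
As a sketch, both methods are called on a Symbol that names the function. The function names (upper, count) and the arguments are illustrative; nothing is looked up in a function registry at this point.

```scala
import org.apache.spark.sql.catalyst.dsl.expressions._

// Hypothetical wrapper object, used only to group the example values
object UnresolvedFunctionDemo {
  // function creates an UnresolvedFunction with isDistinct = false
  val upperCall = 'upper.function('name)

  // distinctFunction creates an UnresolvedFunction with isDistinct = true
  val distinctCount = 'count.distinctFunction('id)
}
```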

Creating AttributeReference Expressions With nullability On or Off — notNull and canBeNull Methods

ExpressionConversions adds canBeNull and notNull operators to create an AttributeReference with nullability turned on or off, respectively.
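
A small sketch of toggling nullability, assuming a hypothetical id attribute created through the Symbol conversions:

```scala
import org.apache.spark.sql.catalyst.dsl.expressions._

// Hypothetical wrapper object, used only to group the example values
object NullabilityDemo {
  // Attribute references created from a Symbol are nullable by default
  val id = 'id.long

  // notNull returns a copy with nullability turned off
  val idNotNull = id.notNull

  // canBeNull returns a copy with nullability turned back on
  val idNullable = idNotNull.canBeNull
}
```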

Creating BoundReference — at Method

ExpressionConversions adds the at method to AttributeReference to create BoundReference expressions.
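
For example, binding a hypothetical id attribute to the first position (ordinal 0) of an input row could look like this sketch:

```scala
import org.apache.spark.sql.catalyst.dsl.expressions._

// Hypothetical wrapper object, used only to group the example values
object BoundReferenceDemo {
  val idRef = 'id.long

  // at keeps the attribute's data type and nullability and records the ordinal
  val bound = idRef.at(0)
}
```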

plans Implicit Conversions for Logical Plans

Creating UnresolvedHint Logical Operator — hint Method

plans adds the hint method to create an UnresolvedHint logical operator.
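
A minimal sketch of hint, assuming an illustrative table t1 and the broadcast hint name with one parameter:

```scala
import org.apache.spark.sql.catalyst.dsl.plans._

// Hypothetical wrapper object, used only to group the example values
object HintDemo {
  // Wraps the plan in an UnresolvedHint; the hint is not interpreted at this point
  val hinted = table("t1").hint("broadcast", "t1")
}
```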

Creating Join Logical Operator — join Method

join creates a Join logical operator.
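
The following sketch joins two made-up tables, once with the defaults and once with an explicit join type and condition. The table and column names are assumptions for the example.

```scala
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.dsl.plans._
import org.apache.spark.sql.catalyst.plans.Inner

// Hypothetical wrapper object, used only to group the example values
object JoinDemo {
  val t1 = table("t1")
  val t2 = table("t2")

  // Join type and condition are optional arguments
  val plainJoin = t1.join(t2)

  // Explicit join type and join condition
  val conditionalJoin = t1.join(t2, Inner, Some($"t1.id" === $"t2.id"))
}
```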

Creating UnresolvedRelation Logical Operator — table Method

table creates an UnresolvedRelation logical operator.
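
As a sketch with made-up names, table accepts a table name, optionally preceded by a database name:

```scala
import org.apache.spark.sql.catalyst.dsl.plans._

// Hypothetical wrapper object, used only to group the example values
object TableDemo {
  // Table name only
  val t = table("t1")

  // Database name followed by table name
  val qualified = table("db1", "t1")
}
```

Neither call touches a catalog; the relation stays unresolved until the plan is analyzed.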

DslLogicalPlan Implicit Class

DslLogicalPlan implicit class is part of plans implicit conversions with extension methods (of logical operators) to build entire logical plans.
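
To illustrate how the extension methods chain, here is a sketch that builds a small plan over a LocalRelation. The schema (id, name) is an assumption, and where, select and orderBy are only three of the methods DslLogicalPlan offers.

```scala
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.dsl.plans._
import org.apache.spark.sql.catalyst.plans.logical.LocalRelation

// Hypothetical wrapper object, used only to group the example values
object DslLogicalPlanDemo {
  // An empty in-memory relation with two typed attributes
  val relation = LocalRelation('id.long, 'name.string)

  // Each method wraps the previous plan in a new logical operator
  val plan = relation
    .where('id > 1)
    .select('name)
    .orderBy('name.asc)
}
```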

Analyzing Logical Plan — analyze Method

analyze resolves attribute references.

analyze method is part of DslLogicalPlan implicit class.

Internally, analyze runs the SimpleAnalyzer logical analyzer and then applies the EliminateSubqueryAliases logical optimization to the result.
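
A hedged sketch of analyze on a small plan follows. The relation and its columns are made up, and the example assumes only spark-catalyst on the classpath (no SparkSession is created).

```scala
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.dsl.plans._
import org.apache.spark.sql.catalyst.plans.logical.LocalRelation

// Hypothetical wrapper object, used only to group the example values
object AnalyzeDemo {
  val relation = LocalRelation('id.long, 'name.string)

  // 'id and 'name are unresolved attributes in this plan
  val plan = relation.where('id > 1).select('name)

  // analyze binds them to the attributes of the relation
  val analyzed = plan.analyze
}
```

Comparing AnalyzeDemo.plan.resolved with AnalyzeDemo.analyzed.resolved shows the effect of the analysis.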
