关注 spark技术分享,
撸spark源码 玩spark最佳实践

ExpressionEncoder — Expression-Based Encoder

ExpressionEncoder — Expression-Based Encoder

ExpressionEncoder[T] is a generic Encoder of JVM objects of the type T to and from internal binary rows.

ExpressionEncoder[T] uses expressions for a serializer and a deserializer.

Note
ExpressionEncoder is the only supported implementation of Encoder which is explicitly enforced when Dataset is created (even though Dataset data structure accepts a bare Encoder[T]).

ExpressionEncoder uses serializer expressions to encode (aka serialize) a JVM object of type T to an internal binary row format (i.e. InternalRow).

Note
It is assumed that all serializer expressions contain at least one and the same BoundReference.

ExpressionEncoder uses a deserializer expression to decode (aka deserialize) a JVM object of type T from internal binary row format.

ExpressionEncoder is flat when serializer uses a single expression (which also means that the objects of a type T are not created using constructor parameters only like Product or DefinedByConstructorParams types).

Internally, a ExpressionEncoder creates a UnsafeProjection (for the input serializer), a InternalRow (of size 1), and a safe Projection (for the input deserializer). They are all internal lazy attributes of the encoder.

Table 1. ExpressionEncoder’s (Lazily-Initialized) Internal Properties
Property Description

constructProjection

Projection generated for the deserializer expression

Used exclusively when ExpressionEncoder is requested for a JVM object from a Spark SQL row (i.e. InternalRow).

extractProjection

UnsafeProjection generated for the serializer expressions

Used exclusively when ExpressionEncoder is requested for an encoded version of a JVM object as a Spark SQL row (i.e. InternalRow).

inputRow

GenericInternalRow (with the underlying storage array) of size 1 (i.e. it can only store a single JVM object of any type).

Used…​FIXME

Note
Encoders object contains the default ExpressionEncoders for Scala and Java primitive types, e.g. boolean, long, String, java.sql.Date, java.sql.Timestamp, Array[Byte].

Creating ExpressionEncoder — apply Method

Caution
FIXME

Creating ExpressionEncoder Instance

ExpressionEncoder takes the following when created:

  • Schema

  • Flag whether ExpressionEncoder is flat or not

  • Serializer expressions (to convert objects of type T to internal rows)

  • Deserializer expression (to convert internal rows to objects of type T)

  • Scala’s ClassTag for the JVM type T

Creating Deserialize Expression — ScalaReflection.deserializerFor Method

deserializerFor creates an expression to deserialize from internal binary row format to a Scala object of type T.

Internally, deserializerFor calls the recursive internal variant of deserializerFor with a single-element walked type path with - root class: "[clsName]"

Tip
Read up on Scala’s TypeTags in TypeTags and Manifests.
Note
deserializerFor is used exclusively when ExpressionEncoder is created for a Scala type T.

Recursive Internal deserializerFor Method

Table 2. JVM Types and Deserialize Expressions (in evaluation order)
JVM Type (Scala or Java) Deserialize Expressions

Option[T]

java.lang.Integer

java.lang.Long

java.lang.Double

java.lang.Float

java.lang.Short

java.lang.Byte

java.lang.Boolean

java.sql.Date

java.sql.Timestamp

java.lang.String

java.math.BigDecimal

scala.BigDecimal

java.math.BigInteger

scala.math.BigInt

Array[T]

Seq[T]

Map[K, V]

SQLUserDefinedType

User Defined Types (UDTs)

Product (including Tuple) or DefinedByConstructorParams

Creating Serialize Expression — ScalaReflection.serializerFor Method

serializerFor creates a CreateNamedStruct expression to serialize a Scala object of type T to internal binary row format.

Internally, serializerFor calls the recursive internal variant of serializerFor with a single-element walked type path with - root class: "[clsName]" and pattern match on the result expression.

Caution
FIXME the pattern match part
Tip
Read up on Scala’s TypeTags in TypeTags and Manifests.
Note
serializerFor is used exclusively when ExpressionEncoder is created for a Scala type T.

Recursive Internal serializerFor Method

serializerFor creates an expression for serializing an object of type T to an internal row.

Caution
FIXME

Encoding JVM Object to Internal Binary Row Format — toRow Method

toRow encodes (aka serializes) a JVM object t as an internal binary row.

Internally, toRow sets the only JVM object to be t in inputRow and converts the inputRow to a unsafe binary row (using extractProjection).

In case of any exception while serializing, toRow reports a RuntimeException:

Note

toRow is mostly used when SparkSession is requested for:

Decoding JVM Object From Internal Binary Row Format — fromRow Method

fromRow decodes (aka deserializes) a JVM object from a row InternalRow (with the required values only).

Internally, fromRow uses constructProjection with row and gets the 0th element of type ObjectType that is then cast to the output type T.

In case of any exception while deserializing, fromRow reports a RuntimeException:

Note

fromRow is used for:

  • Dataset operators, i.e. head, collect, collectAsList, toLocalIterator

  • Structured Streaming’s ForeachSink

Creating ExpressionEncoder For Tuple — tuple Method

tuple…​FIXME

Note
tuple is used when…​FIXME

resolveAndBind Method

resolveAndBind…​FIXME

Note

resolveAndBind is used when:

  • RowToUnsafeRowDataReaderFactory is requested to create a DataReader

  • InternalRowDataWriterFactory is requested to create a DataWriter

  • Dataset is requested for the deserializer expression (to convert internal rows to objects of type T)

  • TypedAggregateExpression is created

  • JdbcUtils is requested to resultSetToRows

  • Spark Structured Streaming’s FlatMapGroupsWithStateExec physical operator is requested for the state deserializer (i.e. stateDeserializer)

  • Spark Structured Streaming’s ForeachSink is requested to add a streaming batch (i.e. addBatch)

赞(0) 打赏
未经允许不得转载:spark技术分享 » ExpressionEncoder — Expression-Based Encoder
分享到: 更多 (0)

关注公众号:spark技术分享

联系我们联系我们

觉得文章有用就打赏一下文章作者

支付宝扫一扫打赏

微信扫一扫打赏