关注 spark技术分享,
撸spark源码 玩spark最佳实践

RowEncoder — Encoder for DataFrames

RowEncoder — Encoder for DataFrames

RowEncoder is part of the Encoder framework and acts as the encoder for DataFrames, i.e. Dataset[Row] — Datasets of Rows.

Note
DataFrame type is a mere type alias for Dataset[Row] that expects a Encoder[Row] available in scope which is indeed RowEncoder itself.

RowEncoder is an object in Scala with apply and other factory methods.

RowEncoder can create ExpressionEncoder[Row] from a schema (using apply method).

RowEncoder object belongs to org.apache.spark.sql.catalyst.encoders package.

Creating ExpressionEncoder For Row Type — apply method

apply builds ExpressionEncoder of Row, i.e. ExpressionEncoder[Row], from the input StructType (as schema).

Internally, apply creates a BoundReference for the Row type and returns a ExpressionEncoder[Row] for the input schema, a CreateNamedStruct serializer (using serializerFor internal method), a deserializer for the schema, and the Row type.

serializerFor Internal Method

serializerFor creates an Expression that is assumed to be CreateNamedStruct.

serializerFor takes the input inputType and:

  1. Returns the input inputObject as is for native types, i.e. NullType, BooleanType, ByteType, ShortType, IntegerType, LongType, FloatType, DoubleType, BinaryType, CalendarIntervalType.

    Caution
    FIXME What does being native type mean?
  2. For UserDefinedTypes, it takes the UDT class from the SQLUserDefinedType annotation or UDTRegistration object and returns an expression with Invoke to call serialize method on a NewInstance of the UDT class.

  3. For TimestampType, it returns an expression with a StaticInvoke to call fromJavaTimestamp on DateTimeUtils class.

  4. …​FIXME

Caution
FIXME Describe me.
赞(0) 打赏
未经允许不得转载:spark技术分享 » RowEncoder — Encoder for DataFrames
分享到: 更多 (0)

关注公众号:spark技术分享

联系我们联系我们

觉得文章有用就打赏一下文章作者

支付宝扫一扫打赏

微信扫一扫打赏