RowEncoder — Encoder for DataFrames
RowEncoder is part of the Encoder framework and acts as the encoder for DataFrames, i.e. Dataset[Row] (Datasets of Rows).
Note
DataFrame type is a mere type alias for Dataset[Row] that expects an Encoder[Row] available in scope, which is RowEncoder itself.
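The alias can be seen directly in user code. The following is a minimal sketch, assuming a local SparkSession that is created only for this illustration (the application name and the id column are arbitrary choices, not part of the original text).

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

// Assumption: a throwaway local session, used only for this example
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("row-encoder-demo")
  .getOrCreate()

val df: DataFrame = spark.range(3).toDF("id")

// No cast is needed: DataFrame and Dataset[Row] are the same type
val ds: Dataset[Row] = df
```

Since both names refer to the same type, any method that accepts a Dataset[Row] also accepts a DataFrame.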
RowEncoder is an object in Scala with apply and other factory methods.
RowEncoder can create ExpressionEncoder[Row] from a schema (using apply method).
import org.apache.spark.sql.types._
val schema = StructType(
  StructField("id", LongType, nullable = false) ::
  StructField("name", StringType, nullable = false) :: Nil)

import org.apache.spark.sql.catalyst.encoders.RowEncoder
scala> val encoder = RowEncoder(schema)
encoder: org.apache.spark.sql.catalyst.encoders.ExpressionEncoder[org.apache.spark.sql.Row] = class[id[0]: bigint, name[0]: string]

// RowEncoder is never flat
scala> encoder.flat
res0: Boolean = false
RowEncoder object belongs to org.apache.spark.sql.catalyst.encoders package.
Creating ExpressionEncoder For Row Type — apply method
apply(schema: StructType): ExpressionEncoder[Row]
apply builds ExpressionEncoder of Row, i.e. ExpressionEncoder[Row], from the input StructType (as schema).
Internally, apply creates a BoundReference for the Row type and returns an ExpressionEncoder[Row] built from the input schema, a CreateNamedStruct serializer (created using the serializerFor internal method), a deserializer for the schema, and the Row type.
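To see the serializer and deserializer at work, a Row can be converted to Spark's internal row format and back. This is a minimal sketch that assumes Spark 2.x, where ExpressionEncoder exposes toRow and fromRow (later versions changed this part of the API); the sample values are made up for illustration.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types._

val schema = StructType(
  StructField("id", LongType, nullable = false) ::
  StructField("name", StringType, nullable = false) :: Nil)

// resolveAndBind prepares the deserializer so that fromRow can be called
val encoder = RowEncoder(schema).resolveAndBind()

// Assumption: Spark 2.x API (toRow / fromRow)
val internalRow = encoder.toRow(Row(1L, "one")) // uses the serializer
val row = encoder.fromRow(internalRow)          // uses the deserializer
```

The serializer produces an InternalRow, while the deserializer turns it back into an external Row with the same field values.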
serializerFor Internal Method
serializerFor(inputObject: Expression, inputType: DataType): Expression
serializerFor creates an Expression that is assumed to be CreateNamedStruct.
serializerFor takes the input inputType and:
- Returns the input inputObject as is for native types, i.e. NullType, BooleanType, ByteType, ShortType, IntegerType, LongType, FloatType, DoubleType, BinaryType, CalendarIntervalType.
  Caution: FIXME What does being native type mean?
- For UserDefinedTypes, it takes the UDT class from the SQLUserDefinedType annotation or UDTRegistration object and returns an expression with Invoke to call serialize method on a NewInstance of the UDT class.
- For TimestampType, it returns an expression with a StaticInvoke to call fromJavaTimestamp on DateTimeUtils class.
- …FIXME
Caution
FIXME Describe me.
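The three cases of serializerFor listed above can be pictured as a pattern match on the input type. The sketch below is a simplified, self-contained model and not Spark's source code: every type in it (SketchType, SketchExpr and their subclasses) is a hypothetical stand-in for the corresponding Catalyst class, and only the cases described in the text are covered.

```scala
// Hypothetical stand-ins for a few of Spark's DataTypes
sealed trait SketchType
case object SketchLong extends SketchType
case object SketchBoolean extends SketchType
case object SketchTimestamp extends SketchType
final case class SketchUdt(udtClassName: String) extends SketchType

// Hypothetical stand-ins for Catalyst expressions
sealed trait SketchExpr
final case class Input(name: String) extends SketchExpr
final case class NewInstanceSketch(className: String) extends SketchExpr
final case class InvokeSketch(target: SketchExpr, method: String, arg: SketchExpr) extends SketchExpr
final case class StaticInvokeSketch(target: String, method: String, arg: SketchExpr) extends SketchExpr

object SerializerSketch {
  def serializerFor(inputObject: SketchExpr, inputType: SketchType): SketchExpr =
    inputType match {
      // "Native" types: the input expression is returned unchanged
      case SketchLong | SketchBoolean => inputObject
      // UDTs: call serialize on a new instance of the UDT class
      case SketchUdt(cls) =>
        InvokeSketch(NewInstanceSketch(cls), "serialize", inputObject)
      // Timestamps: static call to DateTimeUtils.fromJavaTimestamp
      case SketchTimestamp =>
        StaticInvokeSketch("DateTimeUtils", "fromJavaTimestamp", inputObject)
    }
}
```

For example, SerializerSketch.serializerFor(Input("x"), SketchLong) gives back Input("x") itself, whereas the timestamp case wraps the input in a static call. The cases that the text leaves as FIXME are intentionally not modelled.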