RowEncoder — Encoder for DataFrames
RowEncoder is part of the Encoder framework and acts as the encoder for DataFrames, i.e. Dataset[Row] — Datasets of Rows.
|
Note
|
DataFrame type is a mere type alias for Dataset[Row] that expects a Encoder[Row] available in scope which is indeed RowEncoder itself.
|
RowEncoder is an object in Scala with apply and other factory methods.
RowEncoder can create ExpressionEncoder[Row] from a schema (using apply method).
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
import org.apache.spark.sql.types._ val schema = StructType( StructField("id", LongType, nullable = false) :: StructField("name", StringType, nullable = false) :: Nil) import org.apache.spark.sql.catalyst.encoders.RowEncoder scala> val encoder = RowEncoder(schema) encoder: org.apache.spark.sql.catalyst.encoders.ExpressionEncoder[org.apache.spark.sql.Row] = class[id[0]: bigint, name[0]: string] // RowEncoder is never flat scala> encoder.flat res0: Boolean = false |
RowEncoder object belongs to org.apache.spark.sql.catalyst.encoders package.
Creating ExpressionEncoder For Row Type — apply method
|
1 2 3 4 5 |
apply(schema: StructType): ExpressionEncoder[Row] |
apply builds ExpressionEncoder of Row, i.e. ExpressionEncoder[Row], from the input StructType (as schema).
Internally, apply creates a BoundReference for the Row type and returns a ExpressionEncoder[Row] for the input schema, a CreateNamedStruct serializer (using serializerFor internal method), a deserializer for the schema, and the Row type.
serializerFor Internal Method
|
1 2 3 4 5 |
serializerFor(inputObject: Expression, inputType: DataType): Expression |
serializerFor creates an Expression that is assumed to be CreateNamedStruct.
serializerFor takes the input inputType and:
-
Returns the input
inputObjectas is for native types, i.e.NullType,BooleanType,ByteType,ShortType,IntegerType,LongType,FloatType,DoubleType,BinaryType,CalendarIntervalType.CautionFIXME What does being native type mean? -
For
UserDefinedTypes, it takes the UDT class from theSQLUserDefinedTypeannotation orUDTRegistrationobject and returns an expression withInvoketo callserializemethod on aNewInstanceof the UDT class. -
For TimestampType, it returns an expression with a StaticInvoke to call
fromJavaTimestamponDateTimeUtilsclass. -
…FIXME
|
Caution
|
FIXME Describe me. |
spark技术分享