UnsafeRow — Mutable Raw-Memory Unsafe Binary Row Format
UnsafeRow is a concrete InternalRow that represents a mutable internal raw-memory (and hence unsafe) binary row format.
In other words, UnsafeRow is an InternalRow that is backed by raw memory instead of Java objects.
// Use ExpressionEncoder for simplicity
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
val stringEncoder = ExpressionEncoder[String]
val row = stringEncoder.toRow("hello world")

import org.apache.spark.sql.catalyst.expressions.UnsafeRow
val unsafeRow = row match { case ur: UnsafeRow => ur }

scala> unsafeRow.getBytes
res0: Array[Byte] = Array(0, 0, 0, 0, 0, 0, 0, 0, 11, 0, 0, 0, 16, 0, 0, 0, 104, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100, 0, 0, 0, 0, 0)

scala> unsafeRow.getUTF8String(0)
res1: org.apache.spark.unsafe.types.UTF8String = hello world
UnsafeRow knows its size in bytes.
scala> println(unsafeRow.getSizeInBytes)
32
UnsafeRow supports Java’s Externalizable and Kryo’s KryoSerializable serialization/deserialization protocols.
The fields of a data row are placed using field offsets.
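The offset arithmetic can be illustrated without Spark on the classpath. The sketch below decodes the bytes copied from the unsafeRow.getBytes output above. It is not Spark's code: the object and method names are hypothetical, and it assumes a little-endian byte order, a null bitset made of whole 8-byte words, one 8-byte slot per field, and a variable-length slot that packs the data offset into the upper 32 bits and the data length into the lower 32 bits.

```scala
import java.nio.{ByteBuffer, ByteOrder}
import java.nio.charset.StandardCharsets

// Hypothetical helper, not part of Spark: decodes a one-field row by hand.
object FieldOffsetSketch {
  // Bytes copied from the unsafeRow.getBytes output for the row "hello world"
  val bytes: Array[Byte] = Array[Int](
    0, 0, 0, 0, 0, 0, 0, 0,
    11, 0, 0, 0, 16, 0, 0, 0,
    104, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100, 0, 0, 0, 0, 0).map(_.toByte)

  val numFields = 1

  // Assumption: one null bit per field, rounded up to whole 8-byte words
  val bitSetWidthInBytes: Int = ((numFields + 63) / 64) * 8

  // Assumption: the bytes were written in little-endian order
  private val buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)

  // Offset of a field's 8-byte slot, relative to the start of the row
  def fieldOffset(ordinal: Int): Int = bitSetWidthInBytes + ordinal * 8

  // A field is null when its bit is set in the null bitset
  def isNullAt(ordinal: Int): Boolean = {
    val word = buf.getLong((ordinal / 64) * 8)
    (word & (1L << (ordinal % 64))) != 0
  }

  // Assumption: upper 32 bits hold the data offset, lower 32 bits the length
  def getString(ordinal: Int): String = {
    val slot = buf.getLong(fieldOffset(ordinal))
    val offset = (slot >>> 32).toInt
    val size = slot.toInt
    new String(bytes, offset, size, StandardCharsets.UTF_8)
  }
}
```

Under these assumptions, FieldOffsetSketch.getString(0) reads the slot at byte 8, finds offset 16 and length 11, and decodes the same "hello world" string that getUTF8String(0) returned.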
UnsafeRow considers a data type mutable if it is one of its mutable field types or a DecimalType (see isMutable Static Predicate).
UnsafeRow is composed of three regions:
- Null Bit Set Bitmap Region (1 bit/field) for tracking null values
- Fixed-Length 8-Byte Values Region
- Variable-Length Data Section
As a result, rows are always 8-byte word-aligned, and so their size is always a multiple of 8 bytes.
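The size arithmetic behind that property can be sketched as follows. The helper names are hypothetical, and the sketch assumes that the null bitset is rounded up to whole 8-byte words and that every variable-length value is padded to the next multiple of 8 bytes.

```scala
// Hypothetical helper, not part of Spark: adds up the three regions of a row.
object RowSizeSketch {
  // Round a byte count up to the next multiple of 8
  def roundUpToWord(numBytes: Int): Int = ((numBytes + 7) / 8) * 8

  // Assumption: one null bit per field, stored in whole 8-byte words
  def bitSetWidthInBytes(numFields: Int): Int = ((numFields + 63) / 64) * 8

  // Null bitset + one 8-byte slot per field + padded variable-length data
  def rowSizeInBytes(numFields: Int, varLenSizes: Seq[Int]): Int =
    bitSetWidthInBytes(numFields) + 8 * numFields + varLenSizes.map(roundUpToWord).sum
}
```

For the single-field "hello world" row, RowSizeSketch.rowSizeInBytes(1, Seq(11)) gives 8 + 8 + 16 = 32, which agrees with the getSizeInBytes output shown earlier.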
Equality comparison and hashing of rows can be performed on raw bytes: if two rows are identical, so is their bit-wise representation. No type-specific interpretation is required.
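A minimal sketch of the idea, working on plain byte arrays such as the ones returned by getBytes. It swaps in java.util.Arrays and the Scala standard library's MurmurHash3 for illustration, so the hash values will not match the ones Spark computes; the object name, method names, and the seed are hypothetical.

```scala
import java.util.Arrays
import scala.util.hashing.MurmurHash3

// Hypothetical helper, not part of Spark: compares and hashes rows as raw bytes.
object RawBytesEqualitySketch {
  // Two rows are equal when their byte representations are equal
  def rowsEqual(a: Array[Byte], b: Array[Byte]): Boolean = Arrays.equals(a, b)

  // Hash the raw bytes directly; no field types are consulted
  def rowHash(bytes: Array[Byte], seed: Int = 42): Int = MurmurHash3.bytesHash(bytes, seed)
}
```

Neither function needs a schema, which is the point of the paragraph above: the bytes alone decide equality and the hash code.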
isMutable Static Predicate
static boolean isMutable(DataType dt)
isMutable is enabled (true) when the input DataType is among the mutable field types or a DecimalType.
Otherwise, isMutable is disabled (false).
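A short usage sketch, meant for spark-shell or any project with Spark SQL on the classpath. The choice of IntegerType as an example of a mutable field type and StringType as an example of a non-mutable one is an assumption based on the fixed-length versus variable-length split described above; the DecimalType case follows directly from the rule.

```scala
import org.apache.spark.sql.catalyst.expressions.UnsafeRow
import org.apache.spark.sql.types.{DecimalType, IntegerType, StringType}

// Assumption: IntegerType is one of the mutable (fixed-length) field types
val intIsMutable = UnsafeRow.isMutable(IntegerType)

// DecimalType is mutable regardless of its precision
val decimalIsMutable = UnsafeRow.isMutable(DecimalType(10, 2))

// Assumption: StringType is variable-length and therefore not mutable
val stringIsMutable = UnsafeRow.isMutable(StringType)
```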
Kryo’s KryoSerializable SerDe Protocol
Tip: Read up on KryoSerializable.
Java’s Externalizable SerDe Protocol
Tip: Read up on java.io.Externalizable.
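To show what the Externalizable protocol asks of a row that is just a block of bytes, here is a minimal, self-contained sketch. RawRowSketch is a hypothetical stand-in, not UnsafeRow itself, and the order in which it writes its fields (byte count, field count, raw bytes) is an assumption made for illustration.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, Externalizable,
  ObjectInput, ObjectInputStream, ObjectOutput, ObjectOutputStream}

// Hypothetical stand-in for a raw-memory row: a field count plus the row's bytes
final class RawRowSketch(var numFields: Int, var bytes: Array[Byte]) extends Externalizable {
  // Externalizable requires a public no-argument constructor
  def this() = this(0, Array.emptyByteArray)

  // Write the sizes first so the reader knows how many bytes to expect
  override def writeExternal(out: ObjectOutput): Unit = {
    out.writeInt(bytes.length)
    out.writeInt(numFields)
    out.write(bytes)
  }

  // Read the values back in exactly the order they were written
  override def readExternal(in: ObjectInput): Unit = {
    val size = in.readInt()
    numFields = in.readInt()
    bytes = new Array[Byte](size)
    in.readFully(bytes)
  }
}

object ExternalizableSketch {
  // Serialize a row to an in-memory buffer and read it back
  def roundTrip(row: RawRowSketch): RawRowSketch = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    out.writeObject(row)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[RawRowSketch] finally in.close()
  }
}
```

Because the row is already a contiguous byte sequence, serialization reduces to copying those bytes along with enough metadata to rebuild the row on the other side.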