
UnsafeRow — Mutable Raw-Memory Unsafe Binary Row Format

UnsafeRow is a concrete InternalRow that represents a mutable internal raw-memory (and hence unsafe) binary row format.

In other words, UnsafeRow is an InternalRow that is backed by raw memory instead of Java objects.

UnsafeRow knows its own size in bytes (getSizeInBytes).

UnsafeRow supports Java’s Externalizable and Kryo’s KryoSerializable serialization/deserialization protocols.

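Fields are located by their field offset, that is, the position of the field's 8-byte slot, computed from the row's base offset, the width of the null bit set and the field's ordinal.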
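To make the offset arithmetic concrete, here is a minimal Python model (not Spark code) of how a field's 8-byte slot can be located. The null bitmap comes first, 1 bit per field rounded up to whole 64-bit words, followed by one 8-byte slot per field. The rounding follows Spark's UnsafeRow.calculateBitSetWidthInBytes; the names bit_set_width_in_bytes and field_offset are hypothetical.

```python
# Hypothetical model of UnsafeRow field-offset arithmetic (not Spark's API).

def bit_set_width_in_bytes(num_fields: int) -> int:
    """Bytes taken by the null bitmap: 1 bit per field, rounded up to 8-byte words."""
    return ((num_fields + 63) // 64) * 8


def field_offset(base_offset: int, num_fields: int, ordinal: int) -> int:
    """Offset of the fixed-length 8-byte slot of the field at `ordinal`.

    base_offset is where the row starts inside its backing buffer.
    """
    if not 0 <= ordinal < num_fields:
        raise IndexError(f"ordinal {ordinal} out of range for {num_fields} fields")
    return base_offset + bit_set_width_in_bytes(num_fields) + ordinal * 8


if __name__ == "__main__":
    # A 3-field row starting at offset 0: 8-byte bitmap, slots at 8, 16 and 24.
    print([field_offset(0, 3, i) for i in range(3)])  # [8, 16, 24]
    # 65 fields need two bitmap words.
    print(bit_set_width_in_bytes(65))  # 16
```

Because every slot is exactly 8 bytes, locating a field is pure arithmetic on the ordinal; no per-field metadata has to be consulted.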

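UnsafeRow considers a data type mutable if it is one of the following: NullType, BooleanType, ByteType, ShortType, IntegerType, LongType, FloatType, DoubleType, DateType, TimestampType.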

UnsafeRow is composed of three regions:

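  1. Null Bit Set Bitmap Region (1 bit per field, padded to whole 8-byte words) for tracking null values

  2. Fixed-Length 8-Byte Values Region (one 8-byte slot per field) holding fixed-length values inline and, for variable-length fields, the offset and size of the value in the variable-length section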

  3. Variable-Length Data Section

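Since the null bitmap is padded to whole 8-byte words, every fixed-length slot takes 8 bytes, and variable-length values are zero-padded to a multiple of 8 bytes, rows are always 8-byte word aligned and their size is always a multiple of 8 bytes.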
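The following Python sketch encodes a row into the three regions to show where the 8-byte alignment comes from. It is a simplified, hypothetical model (not Spark's UnsafeRowWriter): only 64-bit ints, bytes and nulls are supported, values are little-endian, and a variable-length field's slot holds (offset << 32) | size with the offset measured from the start of the row.

```python
import struct
from typing import Optional, Sequence, Union

# Hypothetical simplified model: only 64-bit ints, bytes and None are supported.
Value = Optional[Union[int, bytes]]


def bit_set_width_in_bytes(num_fields: int) -> int:
    return ((num_fields + 63) // 64) * 8


def round_up_to_word(n: int) -> int:
    """Round a byte count up to the next multiple of 8."""
    return (n + 7) // 8 * 8


def encode_row(values: Sequence[Value]) -> bytes:
    n = len(values)
    bitmap_width = bit_set_width_in_bytes(n)
    # Regions 1 and 2: null bitmap plus one zeroed 8-byte slot per field.
    buf = bytearray(bitmap_width + 8 * n)
    for i, v in enumerate(values):
        slot = bitmap_width + 8 * i
        if v is None:
            # Set bit i of the bitmap (64-bit little-endian words); slot stays 0.
            buf[(i // 64) * 8 + (i % 64) // 8] |= 1 << (i % 8)
        elif isinstance(v, int):
            # Fixed-length value stored inline in its slot.
            struct.pack_into("<q", buf, slot, v)
        else:
            # Region 3: append the bytes, zero-padded to a word boundary, and
            # store (offset << 32) | size in the slot; offset is row-relative.
            offset = len(buf)
            buf += v
            buf += bytes(round_up_to_word(len(v)) - len(v))
            struct.pack_into("<q", buf, slot, (offset << 32) | len(v))
    return bytes(buf)


if __name__ == "__main__":
    row = encode_row([42, None, b"spark"])
    # 8 (bitmap) + 3 * 8 (slots) + 8 ("spark" padded from 5 bytes) = 40
    print(len(row), len(row) % 8)  # 40 0
```

The zero padding matters beyond alignment: it keeps unused bytes deterministic, which is what makes comparing rows byte by byte meaningful.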

Equality comparison and hashing of rows can be performed directly on the raw bytes: two rows holding the same values have the same bit-wise representation, so no type-specific interpretation is required.
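A minimal Python sketch of byte-level equality and hashing, assuming a hypothetical fixed-width-only encoding (nullable 64-bit ints, null bitmap plus 8-byte slots). Python's built-in hash over the bytes is a stand-in for whatever byte-level hash function Spark itself uses; encode_fixed_row, rows_equal and row_hash are illustrative names.

```python
import struct
from typing import Optional, Sequence


def encode_fixed_row(values: Sequence[Optional[int]]) -> bytes:
    """Hypothetical model: nullable 64-bit ints as null bitmap + 8-byte slots.

    Null slots and bitmap padding stay zero, which keeps the bytes deterministic.
    """
    n = len(values)
    bitmap_width = ((n + 63) // 64) * 8
    buf = bytearray(bitmap_width + 8 * n)  # zero-initialised
    for i, v in enumerate(values):
        if v is None:
            buf[(i // 64) * 8 + (i % 64) // 8] |= 1 << (i % 8)
        else:
            struct.pack_into("<q", buf, bitmap_width + 8 * i, v)
    return bytes(buf)


def rows_equal(a: bytes, b: bytes) -> bool:
    # No per-type decoding: identical bytes (including length) mean identical rows.
    return a == b


def row_hash(row: bytes) -> int:
    # Stand-in byte-level hash; any hash over the raw bytes works the same way.
    return hash(row)


if __name__ == "__main__":
    r1 = encode_fixed_row([1, None, 3])
    r2 = encode_fixed_row([1, None, 3])
    print(rows_equal(r1, r2), row_hash(r1) == row_hash(r2))  # True True
    print(rows_equal(r1, encode_fixed_row([1, 2, 3])))  # False
```

Byte equality is stricter than value equality for floating-point fields: 0.0 and -0.0 are equal as values but have different bit patterns.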

isMutable Static Predicate

isMutable returns true when the input DataType is one of the mutable field types or a DecimalType, and false otherwise.
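A Python sketch of the predicate. It assumes the ten mutable field types listed for Spark 2.x (the set may differ in other Spark versions) and models data types as plain name strings plus a hypothetical DecimalType dataclass; none of this is Spark's actual API.

```python
from dataclasses import dataclass

# Assumption: the mutable field types as listed for Spark 2.x; the set may
# differ in other Spark versions. Types are modelled as plain name strings.
MUTABLE_FIELD_TYPES = frozenset({
    "NullType", "BooleanType", "ByteType", "ShortType", "IntegerType",
    "LongType", "FloatType", "DoubleType", "DateType", "TimestampType",
})


@dataclass(frozen=True)
class DecimalType:
    """Hypothetical stand-in for Spark's parameterised DecimalType."""
    precision: int
    scale: int


def is_mutable(data_type) -> bool:
    # Any DecimalType qualifies, whatever its precision and scale.
    if isinstance(data_type, DecimalType):
        return True
    return data_type in MUTABLE_FIELD_TYPES


if __name__ == "__main__":
    print(is_mutable("IntegerType"), is_mutable(DecimalType(38, 18)), is_mutable("StringType"))
    # True True False
```

DecimalType is checked separately because it is parameterised by precision and scale, so it cannot be a single member of a fixed set the way the other types can.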

Note

isMutable is used when:

Kryo’s KryoSerializable SerDe Protocol

Tip
Read up on KryoSerializable.

Serializing JVM Object — KryoSerializable’s write Method

Deserializing Kryo-Managed Object — KryoSerializable’s read Method

Java’s Externalizable SerDe Protocol

Tip
Read up on java.io.Externalizable.

Serializing JVM Object — Externalizable’s writeExternal Method

Deserializing Java-Externalized Object — Externalizable’s readExternal Method
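Both SerDe protocols ship the row as an opaque block of bytes rather than field by field. The Python sketch below assumes the framing that Spark's writeExternal/readExternal and Kryo write/read appear to use (verify against your Spark version): the row's size in bytes, its number of fields, then the raw row bytes copied as-is, with reading allocating a buffer of that size and filling it. The big-endian 4-byte ints mimic java.io.DataOutput.writeInt; write_row and read_row are hypothetical names.

```python
import io
import struct
from typing import BinaryIO, Tuple


def write_row(out: BinaryIO, row_bytes: bytes, num_fields: int) -> None:
    # Assumed header: size in bytes and number of fields, big-endian 32-bit ints.
    out.write(struct.pack(">ii", len(row_bytes), num_fields))
    # Payload: the raw row bytes, with no per-field encoding.
    out.write(row_bytes)


def read_row(inp: BinaryIO) -> Tuple[bytes, int]:
    header = inp.read(8)
    if len(header) != 8:
        raise EOFError("truncated header")
    size, num_fields = struct.unpack(">ii", header)
    # Allocate exactly `size` bytes and fill them from the stream.
    data = inp.read(size)
    if len(data) != size:
        raise EOFError("truncated row")
    return data, num_fields


if __name__ == "__main__":
    buf = io.BytesIO()
    original = bytes(8) + struct.pack("<q", 7)  # 1-field row: bitmap word + slot
    write_row(buf, original, 1)
    buf.seek(0)
    print(read_row(buf) == (original, 1))  # True
```

Shipping the bytes verbatim is cheap precisely because the row is already a self-contained binary format; the only extra information needed to rebuild it is its size and field count.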

pointTo Method

pointTo…FIXME

Note
pointTo is used when…FIXME