Data Types

The DataType abstract class is the base type of all built-in data types in Spark SQL, e.g. strings and longs.

DataType has two main type families:

  • Atomic Types, an internal type family representing types that are not null, UDTs, arrays, structs, or maps

  • Numeric Types with fractional and integral types

Table 1. Standard Data Types

Type Family                              Data Type             Scala Types
---------------------------------------  --------------------  ------------------------------
Atomic Types                             BinaryType
(except fractional and integral types)   BooleanType
                                         DateType
                                         StringType
                                         TimestampType         java.sql.Timestamp

Fractional Types                         DecimalType
(concrete NumericType)                   DoubleType
                                         FloatType

Integral Types                           ByteType
(concrete NumericType)                   IntegerType
                                         LongType
                                         ShortType

                                         ArrayType
                                         CalendarIntervalType
                                         MapType
                                         NullType
                                         ObjectType
                                         StructType
                                         UserDefinedType
                                         AnyDataType           Matches any concrete data type

Caution
FIXME What about AbstractDataType?

You can extend the type system and create your own user-defined types (UDTs).

The DataType Contract defines methods to build SQL, JSON and string representations.

Note
DataType (and the concrete Spark SQL types) live in the org.apache.spark.sql.types package.

You should use the DataTypes object in your code to create complex Spark SQL types, i.e. arrays or maps.

DataType supports Scala pattern matching via its unapply method.
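
As one illustration, most of the data types in org.apache.spark.sql.types are case classes or case objects, so user code can match on them directly. A minimal sketch with a hypothetical describe helper:

  import org.apache.spark.sql.types._

  // Hypothetical helper that renders a DataType by matching on the
  // case classes / case objects in org.apache.spark.sql.types
  def describe(dt: DataType): String = dt match {
    case StringType                => "text"
    case _: DecimalType            => "decimal"
    case ArrayType(elementType, _) => s"array<${describe(elementType)}>"
    case StructType(fields)        =>
      fields.map(f => s"${f.name}: ${describe(f.dataType)}").mkString("struct(", ", ", ")")
    case other                     => other.simpleString
  }

  describe(ArrayType(StringType))                              // array<text>
  describe(new StructType().add("price", DecimalType(10, 2)))  // struct(price: decimal)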

DataType Contract

Any type in Spark SQL follows the DataType contract which means that the types define the following methods:

  • json and prettyJson to build JSON representations of a data type

  • defaultSize to know the default size of values of a type

  • simpleString and catalogString to build user-friendly string representations (with the latter for external catalogs)

  • sql to build a SQL representation
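
A minimal sketch of these methods on a concrete type; the schema is illustrative and the exact output can vary slightly between Spark versions:

  import org.apache.spark.sql.types._

  val schema = StructType(Seq(
    StructField("id", LongType, nullable = false),
    StructField("name", StringType)))

  schema.simpleString   // compact, user-friendly form, e.g. struct<id:bigint,name:string>
  schema.catalogString  // string representation used for external catalogs
  schema.sql            // SQL representation, e.g. STRUCT<`id`: BIGINT, `name`: STRING>
  schema.json           // compact JSON representation
  schema.prettyJson     // pretty-printed JSON representation
  schema.defaultSize    // default size (in bytes) of a value of this type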

DataTypes — Factory Methods for Data Types

DataTypes is a Java class with methods to access the simple DataType types and to create the complex ones in Spark SQL, i.e. arrays and maps.

Tip
It is recommended to use the DataTypes class to define DataType types in a schema.

DataTypes lives in the org.apache.spark.sql.types package.

Note

Simple DataType types, e.g. StringType or CalendarIntervalType, come with their own Scala case objects alongside their definitions.

You may also import the types package to get direct access to the types.
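
A minimal sketch of both styles side by side; the field names are illustrative only:

  import org.apache.spark.sql.types._

  // Java-friendly factory methods on DataTypes
  val tags   = DataTypes.createArrayType(DataTypes.StringType)
  val scores = DataTypes.createMapType(DataTypes.StringType, DataTypes.DoubleType)
  val person = DataTypes.createStructType(Array(
    DataTypes.createStructField("name", DataTypes.StringType, false),
    DataTypes.createStructField("age", DataTypes.IntegerType, true)))

  // Equivalent construction with the Scala case classes / case objects
  val tags2   = ArrayType(StringType)
  val scores2 = MapType(StringType, DoubleType)
  val person2 = StructType(Seq(
    StructField("name", StringType, nullable = false),
    StructField("age", IntegerType)))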

UDTs — User-Defined Types

Caution
FIXME