Data Types

DataType abstract class is the base type of all built-in data types in Spark SQL, e.g. strings, longs. DataType has two main type families:
- Atomic Types as an internal type to represent types that are not null, UDTs, arrays, structs, and maps
- Numeric Types with fractional and integral types
Type Family | Data Type | Scala Types |
---|---|---|
Atomic Types (except fractional and integral types) | BinaryType, BooleanType, DateType, StringType, TimestampType | |
Fractional Types (concrete NumericType) | DecimalType, DoubleType, FloatType | |
Integral Types (concrete NumericType) | ByteType, IntegerType, LongType, ShortType | |
| AnyDataType | Matches any concrete data type |
Caution: FIXME What about AbstractDataType?
You can extend the type system and create your own user-defined types (UDTs).
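What follows is a minimal, hypothetical sketch of a UDT for a two-dimensional point (Point and PointUDT are made-up names). Note that the UserDefinedType API was hidden from user code in Spark 2.0, so a class like this has to live under the org.apache.spark package to compile against recent Spark versions.

```scala
package org.apache.spark.example // needed to access the non-public UDT API

import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData}
import org.apache.spark.sql.types._

case class Point(x: Double, y: Double)

class PointUDT extends UserDefinedType[Point] {
  // Internal (Catalyst) representation of a Point
  override def sqlType: DataType = ArrayType(DoubleType, containsNull = false)

  // User object => internal representation
  override def serialize(p: Point): GenericArrayData =
    new GenericArrayData(Array[Any](p.x, p.y))

  // Internal representation => user object
  override def deserialize(datum: Any): Point = datum match {
    case data: ArrayData => Point(data.getDouble(0), data.getDouble(1))
  }

  override def userClass: Class[Point] = classOf[Point]
}
```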
The DataType Contract defines methods to build SQL, JSON and string representations.
Note: DataType (and the concrete Spark SQL types) live in the org.apache.spark.sql.types package.
```scala
import org.apache.spark.sql.types.StringType

scala> StringType.json
res0: String = "string"

scala> StringType.sql
res1: String = STRING

scala> StringType.catalogString
res2: String = string
```
You should use the DataTypes object in your code to create complex Spark SQL types, i.e. arrays or maps.
```scala
import org.apache.spark.sql.types._

scala> val arrayType = DataTypes.createArrayType(BooleanType)
arrayType: org.apache.spark.sql.types.ArrayType = ArrayType(BooleanType,true)

scala> val mapType = DataTypes.createMapType(StringType, LongType)
mapType: org.apache.spark.sql.types.MapType = MapType(StringType,LongType,true)
```
DataType has support for Scala's pattern matching using the unapply method.

```scala
???
```
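At the user level, the concrete data types are Scala case objects and case classes, so they can be pattern-matched directly. A minimal sketch of such matching (describeType is a made-up helper name):

```scala
import org.apache.spark.sql.types._

// Render a DataType as a human-readable label by matching on
// case objects (StringType) and extractors (ArrayType, MapType).
def describeType(dt: DataType): String = dt match {
  case StringType                     => "text"
  case LongType | IntegerType         => "whole number"
  case ArrayType(elementType, _)      => s"array of ${describeType(elementType)}"
  case MapType(keyType, valueType, _) => s"map from ${describeType(keyType)} to ${describeType(valueType)}"
  case other                          => other.simpleString
}

// describeType(DataTypes.createMapType(StringType, LongType))
// => "map from text to whole number"
```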
DataType Contract

Any type in Spark SQL follows the DataType contract, which means that the types define the following methods:

- json and prettyJson to build JSON representations of a data type
- defaultSize to know the default size of values of a data type
- simpleString and catalogString to build user-friendly string representations
- sql to build SQL representation
```scala
import org.apache.spark.sql.types.{StructField, StructType}
import org.apache.spark.sql.types.DataTypes._

val maps = StructType(
  StructField("longs2strings", createMapType(LongType, StringType), false) :: Nil)

scala> maps.prettyJson
res0: String =
{
  "type" : "struct",
  "fields" : [ {
    "name" : "longs2strings",
    "type" : {
      "type" : "map",
      "keyType" : "long",
      "valueType" : "string",
      "valueContainsNull" : true
    },
    "nullable" : false,
    "metadata" : { }
  } ]
}

scala> maps.defaultSize
res1: Int = 2800

scala> maps.simpleString
res2: String = struct<longs2strings:map<bigint,string>>

scala> maps.catalogString
res3: String = struct<longs2strings:map<bigint,string>>

scala> maps.sql
res4: String = STRUCT<`longs2strings`: MAP<BIGINT, STRING>>
```
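The JSON representation is not write-only: DataType.fromJson parses it back into a data type. A quick sketch using the maps schema above:

```scala
import org.apache.spark.sql.types.DataType

// Round-trip: serialize the schema to JSON and parse it back
val parsed = DataType.fromJson(maps.json)
assert(parsed == maps)
```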
DataTypes — Factory Methods for Data Types

DataTypes is a Java class with methods to access simple types or create complex DataType types in Spark SQL, i.e. arrays and maps.
Tip: It is recommended to use the DataTypes class to define DataType types in a schema.
DataTypes lives in the org.apache.spark.sql.types package.
```scala
import org.apache.spark.sql.types._

scala> val arrayType = DataTypes.createArrayType(BooleanType)
arrayType: org.apache.spark.sql.types.ArrayType = ArrayType(BooleanType,true)

scala> val mapType = DataTypes.createMapType(StringType, LongType)
mapType: org.apache.spark.sql.types.MapType = MapType(StringType,LongType,true)
```
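Beyond arrays and maps, DataTypes also exposes factory methods such as createStructField, createStructType and createDecimalType. A short sketch building a schema with factory methods only (the field names are made up):

```scala
import java.util.Arrays
import org.apache.spark.sql.types.DataTypes

// A two-field schema built with DataTypes factory methods only
val schema = DataTypes.createStructType(Arrays.asList(
  DataTypes.createStructField("id", DataTypes.LongType, false),
  DataTypes.createStructField("price", DataTypes.createDecimalType(10, 2), true)))

// schema.simpleString => struct<id:bigint,price:decimal(10,2)>
```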
Note: Simple DataType types are available as Scala case objects, e.g. StringType. You may also import the org.apache.spark.sql.types package to access them directly, i.e. import org.apache.spark.sql.types._.