StructType — Data Type for Schema Definition
StructType is a built-in data type that is a collection of StructFields.

StructType is used to define a schema or a part of one.

You can compare two StructType instances to see whether they are equal.
    import org.apache.spark.sql.types.StructType
    val schemaUntyped = new StructType()
      .add("a", "int")
      .add("b", "string")

    import org.apache.spark.sql.types.{IntegerType, StringType}
    val schemaTyped = new StructType()
      .add("a", IntegerType)
      .add("b", StringType)

    scala> schemaUntyped == schemaTyped
    res0: Boolean = true
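The comparison looks at the fields themselves, so field order and nullability take part in it as well as names and types. The following is a minimal sketch of that behaviour, assuming spark-sql is on the classpath (for example in spark-shell); the names base, reordered and strict are made up for illustration.

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

// Baseline schema: fields added this way are nullable by default
val base = new StructType()
  .add("a", IntegerType)
  .add("b", StringType)

// Same fields, declared in a different order
val reordered = new StructType()
  .add("b", StringType)
  .add("a", IntegerType)

// Same names and types, but "a" is declared non-nullable
val strict = new StructType()
  .add("a", IntegerType, nullable = false)
  .add("b", StringType)

// Type names given as strings resolve to the same data types
val sameAsBase = base == new StructType().add("a", "int").add("b", "string")
val orderMatters = base != reordered
val nullabilityMatters = base != strict

println(s"sameAsBase=$sameAsBase orderMatters=$orderMatters nullabilityMatters=$nullabilityMatters")
```

When only names and types should be compared, one option is to compare a projection of the fields, for example fields.map(f => (f.name, f.dataType)), instead of the schemas themselves.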
StructType presents itself as <struct> or STRUCT in query plans or SQL.
Note: StructType is a Seq[StructField], so the standard Seq operations apply to it. Read the official documentation of Scala's scala.collection.Seq.
As of Spark 2.4.0, StructType can be converted to DDL format using the toDDL method.
Example: Using StructType.toDDL

    // Generating a schema from a case class
    // Because we're all properly lazy
    case class Person(id: Long, name: String)
    import org.apache.spark.sql.Encoders
    val schema = Encoders.product[Person].schema

    scala> println(schema.toDDL)
    `id` BIGINT,`name` STRING
fromAttributes Method

    fromAttributes(attributes: Seq[Attribute]): StructType

fromAttributes…FIXME

Note: fromAttributes is used when…FIXME
toAttributes Method

    toAttributes: Seq[AttributeReference]

toAttributes…FIXME

Note: toAttributes is used when…FIXME
Adding Fields to Schema — add Method

You can add a new StructField to your StructType. The add method comes in several variants, each of which returns a new StructType with the field added.
    add(field: StructField): StructType
    add(name: String, dataType: DataType): StructType
    add(name: String, dataType: DataType, nullable: Boolean): StructType
    add(
      name: String,
      dataType: DataType,
      nullable: Boolean,
      metadata: Metadata): StructType
    add(
      name: String,
      dataType: DataType,
      nullable: Boolean,
      comment: String): StructType
    add(name: String, dataType: String): StructType
    add(name: String, dataType: String, nullable: Boolean): StructType
    add(
      name: String,
      dataType: String,
      nullable: Boolean,
      metadata: Metadata): StructType
    add(
      name: String,
      dataType: String,
      nullable: Boolean,
      comment: String): StructType
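Since every variant returns a new StructType, the schema that add is called on stays unchanged. The sketch below chains a few of the variants listed above to show this; it assumes spark-sql is on the classpath, and the column names and the comment text are hypothetical.

```scala
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

// Start from an empty schema
val empty = new StructType()

// add(name, dataType: DataType, nullable)
val withId = empty.add("id", LongType, nullable = false)

// add(field: StructField)
val withName = withId.add(StructField("name", StringType))

// add(name, dataType: String, nullable, comment)
val withCity = withName.add("city", "string", nullable = true, comment = "hypothetical column")

// The earlier schemas are untouched by the later calls
println(empty.fields.length)               // number of fields in the original schema
println(withCity.fieldNames.mkString(", "))
println(withCity("city").getComment())
```

Keeping the intermediate values around (as done here) is only for illustration; in practice the calls are usually chained on a single expression.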
DataType Name Conversions
    simpleString: String
    catalogString: String
    sql: String
StructType as a custom DataType is used in query plans or SQL. It can present itself using simpleString, catalogString or sql (see DataType Contract).
    scala> schemaTyped.simpleString
    res0: String = struct<a:int,b:string>

    scala> schemaTyped.catalogString
    res1: String = struct<a:int,b:string>

    scala> schemaTyped.sql
    res2: String = STRUCT<`a`: INT, `b`: STRING>
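The same conversions apply recursively when a StructType is used as the data type of a field. The following sketch builds a small nested schema to illustrate that; it assumes spark-sql is on the classpath, and the address and person schemas are invented for the example. The exact text produced by sql (for instance, whether names are backquoted) may differ between Spark versions.

```scala
import org.apache.spark.sql.types.{LongType, StringType, StructType}

// A struct used as the data type of another struct's field
val address = new StructType().add("city", StringType)
val person = new StructType()
  .add("id", LongType)
  .add("address", address)

val simple = person.simpleString
val catalog = person.catalogString
val sqlText = person.sql

println(simple)
println(catalog)
println(sqlText)
```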
Accessing StructField — apply Method

    apply(name: String): StructField
StructType defines its own apply method that gives you easy access to a StructField by name.
    scala> schemaTyped.printTreeString
    root
     |-- a: integer (nullable = true)
     |-- b: string (nullable = true)

    scala> schemaTyped("a")
    res4: org.apache.spark.sql.types.StructField = StructField(a,IntegerType,true)
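Looking a field up by a name that is not in the schema fails with an exception, so it can help to check first or to wrap the lookup. Below is a minimal sketch of both approaches, assuming spark-sql is on the classpath; the schema and the missing field name "c" are made up for illustration.

```scala
import scala.util.Try
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val schema = new StructType()
  .add("a", IntegerType)
  .add("b", StringType)

// Lookup by name through apply
val fieldA: StructField = schema("a")

// Position of a field within the schema
val position: Int = schema.fieldIndex("b")

// A missing name makes apply throw, so wrap the call when the name is not trusted
val missing: Try[StructField] = Try(schema("c"))

// Alternative that never throws: search the underlying array of fields
val safeLookup: Option[StructField] = schema.fields.find(_.name == "c")

println(s"${fieldA.name}: ${fieldA.dataType}, index of b = $position, c found = ${safeLookup.isDefined}")
```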
Creating StructType from Existing StructType — apply Method

    apply(names: Set[String]): StructType
This variant of apply lets you create a StructType out of an existing StructType, keeping only the fields with the given names.
    scala> schemaTyped(names = Set("a"))
    res0: org.apache.spark.sql.types.StructType = StructType(StructField(a,IntegerType,true))
It throws an IllegalArgumentException when a field cannot be found.
    scala> schemaTyped(names = Set("a", "c"))
    java.lang.IllegalArgumentException: Field c does not exist.
      at org.apache.spark.sql.types.StructType.apply(StructType.scala:275)
      ... 48 elided
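When the requested names come from outside and may include unknown fields, one way to avoid the exception is to intersect them with the schema's own field names before calling apply. This is only a sketch of that idea, assuming spark-sql is on the classpath; the names wanted, known and projected are hypothetical.

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

val schema = new StructType()
  .add("a", IntegerType)
  .add("b", StringType)

// Requested names, one of which ("c") is not part of the schema
val wanted = Set("a", "c")

// Keep only the names the schema actually has
val known = wanted.intersect(schema.fieldNames.toSet)

// apply(names: Set[String]) now receives existing names only
val projected = schema(known)

println(projected.fieldNames.mkString(", "))
```

Silently dropping unknown names is a design choice; failing fast, as apply does by default, is often preferable when a missing field indicates a bug.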
Displaying Schema As Tree — printTreeString Method

    printTreeString(): Unit
printTreeString prints out the schema to standard output.
    scala> schemaTyped.printTreeString
    root
     |-- a: integer (nullable = true)
     |-- b: string (nullable = true)
Internally, it uses the treeString method to build the tree and then prints it with println.
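Because treeString returns the tree as a String, it can be used directly when the output should go somewhere other than standard output, such as a logger. A minimal sketch, assuming spark-sql is on the classpath and a schema defined just for this example:

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

val schema = new StructType()
  .add("a", IntegerType)
  .add("b", StringType)

// Build the tree without printing it
val tree: String = schema.treeString

// Split into lines, e.g. to pass them to a logger one at a time
val lines: Array[String] = tree.split("\n")

lines.foreach(line => println(s"[schema] $line"))
```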
Creating StructType For DDL-Formatted Text — fromDDL Object Method

    fromDDL(ddl: String): StructType

fromDDL…FIXME

Note: fromDDL is used when…FIXME
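Until the description above is filled in, the following sketch shows one way fromDDL can be used: parsing a comma-separated list of column definitions into a StructType. It assumes spark-sql is on the classpath; the DDL text and column names are invented for the example, and the comparison relies on parsed fields being nullable with no extra metadata.

```scala
import org.apache.spark.sql.types.{LongType, StringType, StructType}

// fromDDL is defined on the StructType companion object
val parsed = StructType.fromDDL("id BIGINT, name STRING")

// The equivalent schema built programmatically
val expected = new StructType()
  .add("id", LongType)
  .add("name", StringType)

val matches = parsed == expected

parsed.printTreeString()
```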
Converting to DDL Format — toDDL Method

    toDDL: String
toDDL converts all the fields to DDL format and concatenates them using the comma (,).
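The description above can be reproduced by hand: convert every field to DDL and join the results with a comma. The sketch below does that and compares it with toDDL. It assumes spark-sql 2.4 or later on the classpath and that StructField.toDDL is publicly accessible in the version in use.

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

val schema = new StructType()
  .add("a", IntegerType)
  .add("b", StringType)

// DDL for the whole schema
val viaStructType: String = schema.toDDL

// The same text assembled field by field
val viaFields: String = schema.fields.map(_.toDDL).mkString(",")

println(viaStructType)
println(viaStructType == viaFields)
```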