StructType — Data Type for Schema Definition
StructType is a built-in data type that is a collection of StructFields.
StructType is used to define a schema or its part.
You can compare two StructType instances to see whether they are equal.
    import org.apache.spark.sql.types.StructType
    val schemaUntyped = new StructType()
      .add("a", "int")
      .add("b", "string")

    import org.apache.spark.sql.types.{IntegerType, StringType}
    val schemaTyped = new StructType()
      .add("a", IntegerType)
      .add("b", StringType)

    scala> schemaUntyped == schemaTyped
    res0: Boolean = true
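Equality covers more than field names and types. The following is a minimal sketch, assuming the comparison is field by field (including the nullable flag); the names nullableA and requiredA are made up for illustration.

```scala
import org.apache.spark.sql.types.{IntegerType, StructType}

// add(name, dataType) creates a nullable field by default
val nullableA = new StructType().add("a", IntegerType)

// Same name and type, but explicitly non-nullable
val requiredA = new StructType().add("a", IntegerType, false)

// The schemas differ only in nullability, so they are not equal
println(nullableA == requiredA)

// A schema built from the type name "int" matches the typed one
println(nullableA == new StructType().add("a", "int"))
```

If two schemas that look alike compare as different, the nullable flag and the field metadata are the first things to check.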
StructType presents itself as <struct> or STRUCT in query plans or SQL.
StructType is also a Seq[StructField], so the standard Scala collection operations apply to it.
Note: Read the official documentation of Scala’s scala.collection.Seq.
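As a sketch of what the Seq API gives you, the snippet below treats a schema as a plain collection of StructFields. The variable names are illustrative only.

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

val schema = new StructType()
  .add("a", IntegerType)
  .add("b", StringType)

// map, filter and size come from Seq, not from StructType itself
val names: Seq[String] = schema.map(_.name)
val stringFields = schema.filter(_.dataType == StringType)

println(names.mkString(", "))  // a, b
println(schema.size)           // 2
```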
As of Spark 2.4.0, a StructType can be converted to DDL format using the toDDL method.
Example: Using StructType.toDDL

    // Generating a schema from a case class
    // Because we're all properly lazy
    case class Person(id: Long, name: String)
    import org.apache.spark.sql.Encoders
    val schema = Encoders.product[Person].schema

    scala> println(schema.toDDL)
    `id` BIGINT,`name` STRING
fromAttributes Method
    fromAttributes(attributes: Seq[Attribute]): StructType

fromAttributes…FIXME
Note: fromAttributes is used when…FIXME
toAttributes Method
    toAttributes: Seq[AttributeReference]

toAttributes…FIXME
Note: toAttributes is used when…FIXME
Adding Fields to Schema — add Method
You can add a new StructField to a StructType. A StructType is immutable, so every variant of the add method returns a new StructType with the field appended.

    add(field: StructField): StructType
    add(name: String, dataType: DataType): StructType
    add(name: String, dataType: DataType, nullable: Boolean): StructType
    add(
      name: String,
      dataType: DataType,
      nullable: Boolean,
      metadata: Metadata): StructType
    add(
      name: String,
      dataType: DataType,
      nullable: Boolean,
      comment: String): StructType
    add(name: String, dataType: String): StructType
    add(name: String, dataType: String, nullable: Boolean): StructType
    add(
      name: String,
      dataType: String,
      nullable: Boolean,
      metadata: Metadata): StructType
    add(
      name: String,
      dataType: String,
      nullable: Boolean,
      comment: String): StructType
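A short sketch of a few of the variants above, showing that add leaves the original schema untouched. The field names and the comment text are made up for illustration.

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Typed variant with an explicit nullable flag
val base = new StructType().add("id", IntegerType, false)

val extended = base
  // Variant that takes a ready-made StructField
  .add(StructField("name", StringType))
  // String-typed variant with a nullable flag and a comment
  .add("age", "int", true, "age in years")

// base still has one field; extended is a separate instance
println(base.length)      // 1
println(extended.length)  // 3
println(extended("age").getComment())
```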
DataType Name Conversions
    simpleString: String
    catalogString: String
    sql: String
StructType as a custom DataType is used in query plans or SQL. It can present itself using simpleString, catalogString or sql (see DataType Contract).
    scala> schemaTyped.simpleString
    res0: String = struct<a:int,b:string>

    scala> schemaTyped.catalogString
    res1: String = struct<a:int,b:string>

    scala> schemaTyped.sql
    res2: String = STRUCT<`a`: INT, `b`: STRING>
Accessing StructField — apply Method
    apply(name: String): StructField

StructType defines its own apply method that gives you easy access to a StructField by name.

    scala> schemaTyped.printTreeString
    root
     |-- a: integer (nullable = true)
     |-- b: string (nullable = true)

    scala> schemaTyped("a")
    res4: org.apache.spark.sql.types.StructField = StructField(a,IntegerType,true)
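Once you have a StructField, its properties are regular fields of a case class. The sketch below also uses fieldIndex and fieldNames, which are part of the public StructType API but not covered in this section.

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

val schema = new StructType()
  .add("a", IntegerType)
  .add("b", StringType)

// Look a field up by name and inspect it
val field = schema("b")
println(field.name)      // b
println(field.nullable)  // true

// Position of the field within the schema (0-based)
println(schema.fieldIndex("b"))           // 1
println(schema.fieldNames.mkString(","))  // a,b
```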
Creating StructType from Existing StructType — apply Method
    apply(names: Set[String]): StructType

This variant of apply creates a new StructType that contains only the fields with the given names.

    scala> schemaTyped(names = Set("a"))
    res0: org.apache.spark.sql.types.StructType = StructType(StructField(a,IntegerType,true))

It throws an IllegalArgumentException when any of the requested fields cannot be found.

    scala> schemaTyped(names = Set("a", "c"))
    java.lang.IllegalArgumentException: Field c does not exist.
      at org.apache.spark.sql.types.StructType.apply(StructType.scala:275)
      ... 48 elided
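If the requested names come from user input, you may want to avoid the exception. One possible approach, shown here as a sketch rather than a recommended pattern, is to intersect the names with fieldNames first or to wrap the call in scala.util.Try. The names requested, known and projected are hypothetical.

```scala
import scala.util.Try
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

val schema = new StructType()
  .add("a", IntegerType)
  .add("b", StringType)

val requested = Set("a", "c")

// Option 1: keep only the names the schema actually has
val known = requested.intersect(schema.fieldNames.toSet)
val projected = schema(known)
println(projected.fieldNames.mkString(","))  // a

// Option 2: capture the failure instead of letting it propagate
val attempt = Try(schema(requested))
println(attempt.isFailure)  // true
```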
Displaying Schema As Tree — printTreeString Method
    printTreeString(): Unit

printTreeString prints out the schema to standard output.

    scala> schemaTyped.printTreeString
    root
     |-- a: integer (nullable = true)
     |-- b: string (nullable = true)

Internally, it uses the treeString method to build the tree and then prints it with println.
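When you need the tree as a value, for example to write it to a log, you can call treeString directly. A minimal sketch:

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

val schema = new StructType()
  .add("a", IntegerType)
  .add("b", StringType)

// Same text that printTreeString writes to standard output
val tree: String = schema.treeString

// Use it like any other string, e.g. prefix every line for a log
tree.split("\n").foreach(line => println(s"[schema] $line"))
```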
Creating StructType For DDL-Formatted Text — fromDDL Object Method
    fromDDL(ddl: String): StructType

fromDDL…FIXME
Note: fromDDL is used when…FIXME
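A usage sketch of fromDDL, called on the StructType companion object. The DDL text is an arbitrary example, and the comparison assumes the parser creates nullable fields with no extra metadata for simple types like these.

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

// Parse a comma-separated list of "name type" pairs
val parsed = StructType.fromDDL("a INT, b STRING")
parsed.printTreeString()

// The same schema built with add, for comparison
val built = new StructType()
  .add("a", IntegerType)
  .add("b", StringType)
println(parsed.fieldNames.sameElements(built.fieldNames))
```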
Converting to DDL Format — toDDL Method
    toDDL: String
toDDL converts all the fields to DDL format and concatenates them using the comma (,).
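The description above can be reproduced by hand. This sketch assumes StructField.toDDL (available since Spark 2.4.0) renders a single field, and that StructType.toDDL joins those pieces with a comma and no space.

```scala
import org.apache.spark.sql.types.{LongType, StringType, StructType}

val schema = new StructType()
  .add("id", LongType)
  .add("name", StringType)

// Render each field separately, then join with a comma
val perField = schema.fields.map(_.toDDL)
val joined = perField.mkString(",")

println(joined)
println(joined == schema.toDDL)
```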