Row
Row is a generic row object with an ordered collection of fields that can be accessed by an ordinal, i.e. an index (aka generic access by ordinal), by name, through type-specific accessors such as getInt (aka native primitive access), or using Scala's pattern matching.
Note: Row is also called Catalyst Row.
A Row may have a schema.
The traits of Row:
- length or size: Row knows the number of elements (columns).
- schema: Row knows its schema.
Row belongs to the org.apache.spark.sql package.
```scala
import org.apache.spark.sql.Row
```
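The two traits listed above can be checked directly. This is a minimal sketch, assuming Spark SQL (which brings in spark-catalyst, where Row lives) is on the classpath; the field values are illustrative. A Row built with Row(...) carries no schema, so schema returns null.

```scala
import org.apache.spark.sql.Row

// Row.apply builds a Row without a schema
val row = Row(1, "hello")

// length and size are synonyms: both return the number of fields
val numFields = row.length
val sameCount = row.size

// A Row created directly (not produced by a Dataset) has no schema, so schema is null
val noSchema = row.schema == null

println(s"length=$numFields size=$sameCount schemaIsNull=$noSchema")
```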
Field Access by Index: apply and get Methods
Fields of a Row instance can be accessed by index (starting from 0) using apply or get.
```scala
scala> val row = Row(1, "hello")
row: org.apache.spark.sql.Row = [1,hello]

scala> row(1)
res0: Any = hello

scala> row.get(1)
res1: Any = hello
```
Note: Generic access by ordinal (using apply or get) returns a value of type Any.
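Because apply and get return Any, using the value usually requires a cast. The sketch below contrasts that with Row's type-specific accessors (getInt, getString) and with isNullAt, which is worth calling first because the primitive accessors fail on a null value. It assumes Spark SQL on the classpath; the row contents are illustrative.

```scala
import org.apache.spark.sql.Row

val row = Row(1, "hello", null)

// Generic access by ordinal: the static type is Any, so a cast is needed
val anyValue: Any = row(0)
val castInt = anyValue.asInstanceOf[Int]

// Type-specific accessors return the proper static type directly
val id: Int = row.getInt(0)
val greeting: String = row.getString(1)

// Check for null before reading a field that may hold null
val thirdIsNull = row.isNullAt(2)
```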
Get Field As Type: getAs Method
You can retrieve fields with their proper types using getAs with an index:
```scala
scala> val row = Row(1, "hello")
row: org.apache.spark.sql.Row = [1,hello]

scala> row.getAs[Int](0)
res1: Int = 1

scala> row.getAs[String](1)
res2: String = hello
```
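getAs is essentially a cast of get, so for a primitive type such as Int a null field comes back as the type's zero value (0) rather than null; Spark's own documentation recommends checking isNullAt first. The sketch wraps that check in a small helper. getOpt is a hypothetical name used only for illustration, not part of Spark; Spark SQL is assumed to be on the classpath.

```scala
import org.apache.spark.sql.Row

// Hypothetical helper (not part of Spark): Some(value) for field i, or None if it is null
def getOpt[T](row: Row, i: Int): Option[T] =
  if (row.isNullAt(i)) None else Some(row.getAs[T](i))

val row = Row(1, null)

// getAs on a null primitive field silently yields the zero value, not null
val zero: Int = row.getAs[Int](1)

// The helper makes the missing value explicit instead
val present: Option[Int] = getOpt[Int](row, 0)
val missing: Option[Int] = getOpt[Int](row, 1)
```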
Schema
A Row instance can have a schema defined.
Note: Unless you instantiate Row yourself (using the Row object), a Row always has a schema.
Note: RowEncoder takes care of assigning a schema to a Row when you call toDF on a Dataset or create a DataFrame through DataFrameReader.
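To see what a schema adds without starting a SparkSession, the sketch below builds a row with GenericRowWithSchema, the Catalyst class that pairs row values with a StructType. It lives in an internal package (org.apache.spark.sql.catalyst.expressions), so application code would normally get such rows from a DataFrame rather than construct them; treat this as an illustration under that assumption. The field names id and greeting are made up for the example.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
import scala.util.Try

// Illustrative schema with two fields
val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = false),
  StructField("greeting", StringType, nullable = true)))

// Internal Catalyst class, used here only to attach a schema without a SparkSession
val row: Row = new GenericRowWithSchema(Array[Any](1, "hello"), schema)

// With a schema, fields can also be looked up by name
val names = row.schema.fieldNames.toSeq
val greetingIndex = row.fieldIndex("greeting")
val greeting = row.getAs[String]("greeting")

// Without a schema, lookup by name is undefined and throws an exception
val failsWithoutSchema = Try(Row(1, "hello").fieldIndex("greeting")).isFailure
```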
Row Object
The Row companion object offers factory methods to create Row instances from a variable number of elements (apply), a sequence of elements (fromSeq) and tuples (fromTuple).
```scala
scala> Row(1, "hello")
res0: org.apache.spark.sql.Row = [1,hello]

scala> Row.fromSeq(Seq(1, "hello"))
res1: org.apache.spark.sql.Row = [1,hello]

scala> Row.fromTuple((0, "hello"))
res2: org.apache.spark.sql.Row = [0,hello]
```
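Rows compare by their field values, so the three factory methods produce equal rows from the same values. The sketch also shows fromTuple, which accepts any Product, converting a collection of tuples in bulk. Spark SQL is assumed to be on the classpath; the values are illustrative.

```scala
import org.apache.spark.sql.Row

// The three factory methods build equal rows from the same values
val viaApply = Row(0, "hello")
val viaSeq = Row.fromSeq(Seq(0, "hello"))
val viaTuple = Row.fromTuple((0, "hello"))
val allEqual = viaApply == viaSeq && viaSeq == viaTuple

// fromTuple accepts any Product, which is handy for converting tuples in bulk
val pairs = Seq((1, "a"), (2, "b"))
val rows: Seq[Row] = pairs.map(t => Row.fromTuple(t))
```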
The Row object can also merge Row instances.
```scala
scala> Row.merge(Row(1), Row("hello"))
res3: org.apache.spark.sql.Row = [1,hello]
```
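Merging concatenates the fields of the given rows from left to right. The sketch compares Row.merge with rebuilding the row from the concatenated toSeq values; depending on the Spark version, Row.merge may be reported as deprecated, and the explicit form avoids that. Spark SQL is assumed to be on the classpath.

```scala
import org.apache.spark.sql.Row

val left = Row(1)
val right = Row("hello", 2.5)

// Row.merge concatenates the fields of all given rows, left to right
val merged = Row.merge(left, right)

// The same result, built explicitly from the rows' values
val concatenated = Row.fromSeq(left.toSeq ++ right.toSeq)
```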
It can also return an empty Row instance.
```scala
scala> Row.empty == Row()
res4: Boolean = true
```
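Since Row.empty has no fields, it is a neutral starting value when combining a list of rows by concatenating their values. A minimal sketch, assuming Spark SQL on the classpath; the parts are illustrative.

```scala
import org.apache.spark.sql.Row

val parts = Seq(Row(1), Row("hello"), Row(true))

// Row.empty contributes no fields, so it is a neutral seed for concatenation
val combined = parts.foldLeft(Row.empty)((acc, r) => Row.fromSeq(acc.toSeq ++ r.toSeq))

val emptyHasNoFields = Row.empty.length == 0
```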
Pattern Matching on Row
Row can be used in pattern matching since the Row object comes with unapplySeq.
```scala
scala> Row.unapplySeq(Row(1, "hello"))
res5: Some[Seq[Any]] = Some(WrappedArray(1, hello))

Row(1, "hello") match { case Row(key: Int, value: String) =>
  key -> value
}
```
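Because the extractor is unapplySeq, a match can also use sequence patterns such as rest @ _* to handle rows of varying length, with a fallback case for rows that match nothing else. describe is a hypothetical function written for this sketch; Spark SQL is assumed to be on the classpath.

```scala
import org.apache.spark.sql.Row

// Hypothetical function: describe a row by its shape using Row's unapplySeq extractor
def describe(row: Row): String = row match {
  case Row(key: Int, value: String) => s"pair $key -> $value"
  case Row(first, rest @ _*)        => s"starts with $first, ${rest.length} more field(s)"
  case _                            => "empty row"
}

val pair = describe(Row(1, "hello"))
val longer = describe(Row("a", 2, 3.0))
val emptyDesc = describe(Row.empty)
```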