Vector
The Vector sealed trait represents a numeric vector of values (of Double type) and their indices (of Int type). It belongs to the org.apache.spark.mllib.linalg package.
Note: To Scala and Java developers: It is not the Vector type in Scala or Java. Train your eyes to see two types of the same name. You’ve been warned.
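One way to keep the two types apart in the same file is a renaming import. The sketch below assumes Spark MLlib is on the classpath; the alias MLVector is a name chosen here for illustration and is not part of Spark.

```scala
// MLVector is a hypothetical alias picked for this example, not a Spark name
import org.apache.spark.mllib.linalg.{Vector => MLVector, Vectors}

// Scala's own immutable collection keeps its usual name
val scalaVec: Vector[Double] = Vector(1.0, 2.0, 3.0)

// The Spark MLlib type is referred to through the alias
val mlVec: MLVector = Vectors.dense(scalaVec.toArray)
```

With the alias in place, an unqualified Vector still means the Scala collection, so a mix-up shows up as a compile error rather than a surprise at runtime.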
A Vector object knows its size.
A Vector object can be converted to:
- Array[Double] using toArray.
- a dense vector as DenseVector using toDense.
- a sparse vector as SparseVector using toSparse.
- (1.6.0) a JSON string using toJson.
- (internal) a breeze vector as BV[Double] using toBreeze.
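The public conversions listed above can be sketched as follows, assuming Spark MLlib 1.6.0 or later on the classpath (for toJson). The value names are illustrative, and toBreeze is left out because it is marked internal.

```scala
import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vector, Vectors}

// A small vector with two zero entries (values chosen for illustration)
val v: Vector = Vectors.dense(0.0, 0.4, 0.0, 1.5)

val asArray: Array[Double] = v.toArray   // all values, zeros included
val dense: DenseVector = v.toDense       // dense representation
val sparse: SparseVector = v.toSparse    // stores only the non-zero entries
val json: String = v.toJson              // JSON string (1.6.0)
```

After toSparse, the size stays the same while indices and values hold only the non-zero positions.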
There are exactly two available implementations of the Vector sealed trait (that also belong to the org.apache.spark.mllib.linalg package):
- DenseVector
- SparseVector
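Because the trait is sealed with only these two implementations, a pattern match over a Vector can cover both cases. The following is a minimal sketch assuming Spark MLlib on the classpath; describe is a hypothetical helper written for this example, not a Spark method.

```scala
import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vector, Vectors}

// describe is a hypothetical helper used only for illustration
def describe(v: Vector): String = v match {
  case dv: DenseVector  => s"dense with ${dv.values.length} values"
  case sv: SparseVector => s"sparse of size ${sv.size} with ${sv.indices.length} stored entries"
}

val fromDense = describe(Vectors.dense(1.0, 2.0))
val fromSparse = describe(Vectors.sparse(10, Seq((1, 0.4))))
```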
Tip: Use the Vectors factory object to create vectors, be it DenseVector or SparseVector.
    import org.apache.spark.mllib.linalg.Vectors

    // You can create dense vectors explicitly by giving values per index
    val denseVec = Vectors.dense(Array(0.0, 0.4, 0.3, 1.5))
    val almostAllZeros = Vectors.dense(Array(0.0, 0.4, 0.3, 1.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0))

    // You can however create a sparse vector by the size and non-zero elements
    val sparse = Vectors.sparse(10, Seq((1, 0.4), (2, 0.3), (3, 1.5)))

    // Convert a sparse vector to a dense one
    val fromSparse = sparse.toDense

    scala> almostAllZeros == fromSparse
    res0: Boolean = true
Note: The factory object is called Vectors (plural).
    import org.apache.spark.mllib.linalg._

    // prepare elements for a sparse vector
    // NOTE: It is more Scala rather than Spark
    val indices = 0 to 4
    val elements = indices.zip(Stream.continually(1.0))
    val sv = Vectors.sparse(elements.size, elements)

    // Notice how Vector is printed out
    scala> sv
    res4: org.apache.spark.mllib.linalg.Vector = (5,[0,1,2,3,4],[1.0,1.0,1.0,1.0,1.0])

    scala> sv.size
    res0: Int = 5

    scala> sv.toArray
    res1: Array[Double] = Array(1.0, 1.0, 1.0, 1.0, 1.0)

    scala> sv == sv.copy
    res2: Boolean = true

    scala> sv.toJson
    res3: String = {"type":0,"size":5,"indices":[0,1,2,3,4],"values":[1.0,1.0,1.0,1.0,1.0]}