StringIndexer
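org.apache.spark.ml.feature.StringIndexer is an Estimator that produces a StringIndexerModel. The fitted model maps each distinct string in the input column to a numeric index (a Double) in the output column.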
// Ten rows: ids 0..9 paired with the single-letter labels a..j
val df = ('a' to 'a' + 9).map(_.toString)
  .zip(0 to 9)
  .map(_.swap)
  .toDF("id", "label")

import org.apache.spark.ml.feature.StringIndexer
val strIdx = new StringIndexer()
  .setInputCol("label")
  .setOutputCol("index")

scala> println(strIdx.explainParams)
handleInvalid: how to handle invalid entries. Options are skip (which will filter out rows with bad values), or error (which will throw an error). More options may be added later (default: error)
inputCol: input column name (current: label)
outputCol: output column name (default: strIdx_ded89298e014__output, current: index)

// fit learns the label-to-index mapping; transform appends the index column
val model = strIdx.fit(df)
val indexed = model.transform(df)

scala> indexed.show
+---+-----+-----+
| id|label|index|
+---+-----+-----+
|  0|    a|  3.0|
|  1|    b|  5.0|
|  2|    c|  7.0|
|  3|    d|  9.0|
|  4|    e|  0.0|
|  5|    f|  2.0|
|  6|    g|  6.0|
|  7|    h|  8.0|
|  8|    i|  4.0|
|  9|    j|  1.0|
+---+-----+-----+
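
In the listing above every label occurs exactly once, so the indices do not show how StringIndexer normally orders labels: by default the most frequent label gets index 0.0. The following is a minimal plain-Scala sketch of that idea, not Spark's implementation. It needs no Spark and can be pasted into the Scala REPL. The helper names fitLabels and transformLabels are hypothetical, and breaking ties alphabetically is an assumption made here only to keep the result deterministic.

```scala
// Plain-Scala sketch (no Spark required) of frequency-based label indexing.
// fitLabels and transformLabels are hypothetical helpers, not Spark API.

// "Fit": order the distinct labels by descending frequency.
// Ties are broken alphabetically (an assumption of this sketch).
def fitLabels(values: Seq[String]): Vector[String] =
  values
    .groupBy(identity)
    .map { case (label, occurrences) => (label, occurrences.size) }
    .toVector
    .sortBy { case (label, count) => (-count, label) }
    .map { case (label, _) => label }

// "Transform": a label's index is its position in the fitted ordering.
// An unseen label throws NoSuchElementException, loosely analogous to
// handleInvalid = error.
def transformLabels(labels: Vector[String], values: Seq[String]): Seq[Double] = {
  val index = labels.zipWithIndex.toMap
  values.map(v => index(v).toDouble)
}

val values  = Seq("a", "b", "c", "a", "a", "c")
val labels  = fitLabels(values)               // Vector(a, c, b)
val indexed = transformLabels(labels, values) // 0.0, 2.0, 1.0, 0.0, 0.0, 1.0
```

Here "a" occurs three times and gets 0.0, "c" occurs twice and gets 1.0, and "b" occurs once and gets 2.0.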