Params (and ParamMaps)
`Params` is the contract in Spark MLlib for ML components that take parameters.

`Params` has a `params` collection of `Param` objects.

    import org.apache.spark.ml.recommendation.ALS
    val als = new ALS().
      setMaxIter(5).
      setRegParam(0.01).
      setUserCol("userId").
      setItemCol("movieId").
      setRatingCol("rating")

    scala> :type als.params
    Array[org.apache.spark.ml.param.Param[_]]

    scala> println(als.explainParams)
    alpha: alpha for implicit preference (default: 1.0)
    checkpointInterval: set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will get checkpointed every 10 iterations (default: 10)
    coldStartStrategy: strategy for dealing with unknown or new users/items at prediction time. This may be useful in cross-validation or production scenarios, for handling user/item ids the model has not seen in the training data. Supported values: nan,drop. (default: nan)
    finalStorageLevel: StorageLevel for ALS model factors. (default: MEMORY_AND_DISK)
    implicitPrefs: whether to use implicit preference (default: false)
    intermediateStorageLevel: StorageLevel for intermediate datasets. Cannot be 'NONE'. (default: MEMORY_AND_DISK)
    itemCol: column name for item ids. Ids must be within the integer value range. (default: item, current: movieId)
    maxIter: maximum number of iterations (>= 0) (default: 10, current: 5)
    nonnegative: whether to use nonnegative constraint for least squares (default: false)
    numItemBlocks: number of item blocks (default: 10)
    numUserBlocks: number of user blocks (default: 10)
    predictionCol: prediction column name (default: prediction)
    rank: rank of the factorization (default: 10)
    ratingCol: column name for ratings (default: rating, current: rating)
    regParam: regularization parameter (>= 0) (default: 0.1, current: 0.01)
    seed: random seed (default: 1994790107)
    userCol: column name for user ids. Ids must be within the integer value range. (default: user, current: userId)


    import org.apache.spark.ml.tuning.CrossValidator
    val cv = new CrossValidator

    scala> println(cv.explainParams)
    estimator: estimator for selection (undefined)
    estimatorParamMaps: param maps for the estimator (undefined)
    evaluator: evaluator used to select hyper-parameters that maximize the validated metric (undefined)
    numFolds: number of folds for cross validation (>= 2) (default: 3)
    seed: random seed (default: -1191137437)

`Params` comes with the `$` (dollar) method, which lets Spark MLlib developers access the user-defined or the default value of a parameter.
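As a minimal sketch of how a component might use `$`, the following defines a hypothetical `Greeter` class (not part of Spark) with a single `greeting` parameter. It assumes `spark-mllib` is on the classpath, for example in `spark-shell` with `:paste`; the class, parameter and method names are made up for illustration.

```scala
import org.apache.spark.ml.param.{Param, ParamMap, Params}
import org.apache.spark.ml.util.Identifiable

// Hypothetical component used only for illustration; not part of Spark.
class Greeter(override val uid: String) extends Params {
  def this() = this(Identifiable.randomUID("greeter"))

  // A parameter with a name and a description, owned by this instance.
  final val greeting: Param[String] =
    new Param[String](this, "greeting", "greeting to use")
  setDefault(greeting, "hello")

  def setGreeting(value: String): this.type = set(greeting, value)

  // $ is protected, so it is only callable from inside the component.
  def getGreeting: String = $(greeting)

  override def copy(extra: ParamMap): Greeter =
    copyValues(new Greeter(uid), extra)
}

val greeter = new Greeter()
println(greeter.getGreeting) // no user-defined value yet, prints the default: hello

greeter.setGreeting("hi")
println(greeter.getGreeting) // prints the user-defined value: hi
```

Because `$` is protected, code outside the component reads values through a getter such as `getGreeting` or through the public `getOrDefault`.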
Params Contract

    package org.apache.spark.ml.param

    trait Params {
      def copy(extra: ParamMap): Params
    }

Method | Description |
---|---|
`copy` | Creates a copy of this instance with the same UID and the `extra` parameters applied |
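To illustrate the contract, here is a small sketch of `copy` called on an `ALS` estimator with an extra `ParamMap`. It assumes `spark-mllib` is on the classpath (for example in `spark-shell`); the variable names are chosen for this example only.

```scala
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.recommendation.ALS

val als = new ALS().setMaxIter(5).setRegParam(0.01)

// Values in the extra ParamMap take precedence over the ones already set.
val extra = ParamMap(als.maxIter -> 20)
val copied = als.copy(extra)

println(copied.getMaxIter)     // 20, taken from the extra ParamMap
println(copied.getRegParam)    // 0.01, user-defined value carried over
println(als.getMaxIter)        // 5, the original instance is unchanged
println(copied.uid == als.uid) // true, the copy keeps the UID
```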
Explaining Parameters: `explainParams` Method

    explainParams(): String

`explainParams` takes the `params` collection and converts every parameter to a help text with the parameter name, the description and, if available, the default and the user-defined values.

    import org.apache.spark.ml.recommendation.ALS
    val als = new ALS().
      setMaxIter(5).
      setRegParam(0.01).
      setUserCol("userId").
      setItemCol("movieId").
      setRatingCol("rating")

    scala> println(als.explainParams)
    alpha: alpha for implicit preference (default: 1.0)
    checkpointInterval: set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will get checkpointed every 10 iterations (default: 10)
    coldStartStrategy: strategy for dealing with unknown or new users/items at prediction time. This may be useful in cross-validation or production scenarios, for handling user/item ids the model has not seen in the training data. Supported values: nan,drop. (default: nan)
    finalStorageLevel: StorageLevel for ALS model factors. (default: MEMORY_AND_DISK)
    implicitPrefs: whether to use implicit preference (default: false)
    intermediateStorageLevel: StorageLevel for intermediate datasets. Cannot be 'NONE'. (default: MEMORY_AND_DISK)
    itemCol: column name for item ids. Ids must be within the integer value range. (default: item, current: movieId)
    maxIter: maximum number of iterations (>= 0) (default: 10, current: 5)
    nonnegative: whether to use nonnegative constraint for least squares (default: false)
    numItemBlocks: number of item blocks (default: 10)
    numUserBlocks: number of user blocks (default: 10)
    predictionCol: prediction column name (default: prediction)
    rank: rank of the factorization (default: 10)
    ratingCol: column name for ratings (default: rating, current: rating)
    regParam: regularization parameter (>= 0) (default: 0.1, current: 0.01)
    seed: random seed (default: 1994790107)
    userCol: column name for user ids. Ids must be within the integer value range. (default: user, current: userId)

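The per-parameter conversion can also be observed one parameter at a time with `explainParam`. The sketch below assumes `spark-mllib` is on the classpath and relies on `explainParams` joining the individual help texts with newlines, which is how the Spark sources I am aware of implement it; treat the equality check as an illustration rather than a guarantee across versions.

```scala
import org.apache.spark.ml.recommendation.ALS

val als = new ALS().setMaxIter(5)

// Help text of a single parameter: name, description, default and current value.
println(als.explainParam(als.maxIter))

// Explaining every parameter in the params collection and joining the results
// with newlines should give the same text as explainParams.
val joined = als.params.map(p => als.explainParam(p)).mkString("\n")
println(joined == als.explainParams)
```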
Copying Parameters with Optional Extra Values: `copyValues` Method

    copyValues[T <: Params](to: T, extra: ParamMap = ParamMap.empty): T

`copyValues` adds the `extra` parameters to `paramMap`, possibly overriding existing keys.

`copyValues` then iterates over the `params` collection and, for every parameter, sets on the `to` instance the default value followed by the value that was defined by the user or given in the `extra` parameters.
Note: `copyValues` is used mainly by the `copy` method.
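Since `copyValues` is protected, the easiest way to see it at work is inside a `copy` implementation. The following sketch uses a hypothetical `Repeater` component (not part of Spark) with two made-up integer parameters, and assumes `spark-mllib` is on the classpath, for example in `spark-shell` with `:paste`.

```scala
import org.apache.spark.ml.param.{IntParam, ParamMap, ParamValidators, Params}
import org.apache.spark.ml.util.Identifiable

// Hypothetical component used only for illustration; not part of Spark.
class Repeater(override val uid: String) extends Params {
  def this() = this(Identifiable.randomUID("repeater"))

  final val times: IntParam =
    new IntParam(this, "times", "number of repetitions (>= 1)", ParamValidators.gtEq(1))
  final val step: IntParam =
    new IntParam(this, "step", "step size (>= 1)", ParamValidators.gtEq(1))
  setDefault(times, 1)
  setDefault(step, 1)

  def setTimes(value: Int): this.type = set(times, value)
  def getTimes: Int = $(times)
  def getStep: Int = $(step)

  // Copies defaults, user-defined values and extra values onto a fresh instance.
  override def copy(extra: ParamMap): Repeater =
    copyValues(new Repeater(uid), extra)
}

val original = new Repeater().setTimes(3)
val copied = original.copy(ParamMap(original.step -> 2))

println(copied.getTimes)  // 3, user-defined value carried over
println(copied.getStep)   // 2, value from the extra ParamMap
println(original.getStep) // 1, the original still reports its default
```

Creating the target with `new Repeater(uid)` rather than relying on reflection is a choice made here so that the sketch also works for classes defined in the REPL.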