Example — Linear Regression
The DataFrame used for Linear Regression has to have features
column of org.apache.spark.mllib.linalg.VectorUDT
type.
Note
|
You can change the name of the column using featuresCol parameter.
|
The list of the parameters of LinearRegression
:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
scala> println(lr.explainParams) elasticNetParam: the ElasticNet mixing parameter, in range [0, 1]. For alpha = 0, the penalty is an L2 penalty. For alpha = 1, it is an L1 penalty (default: 0.0) featuresCol: features column name (default: features) fitIntercept: whether to fit an intercept term (default: true) labelCol: label column name (default: label) maxIter: maximum number of iterations (>= 0) (default: 100) predictionCol: prediction column name (default: prediction) regParam: regularization parameter (>= 0) (default: 0.0) solver: the solver algorithm for optimization. If this is not set or empty, default value is 'auto' (default: auto) standardization: whether to standardize the training features before fitting the model (default: true) tol: the convergence tolerance for iterative algorithms (default: 1.0E-6) weightCol: weight column name. If this is not set or empty, we treat all instance weights as 1.0 (default: ) |
Caution
|
FIXME The following example is work in progress. |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 |
import org.apache.spark.ml.Pipeline val pipeline = new Pipeline("my_pipeline") import org.apache.spark.ml.regression._ val lr = new LinearRegression val df = sc.parallelize(0 to 9).toDF("num") val stages = Array(lr) val model = pipeline.setStages(stages).fit(df) // the above lines gives: java.lang.IllegalArgumentException: requirement failed: Column features must be of type org.apache.spark.mllib.linalg.VectorUDT@f71b0bce but was actually IntegerType. at scala.Predef$.require(Predef.scala:219) at org.apache.spark.ml.util.SchemaUtils$.checkColumnType(SchemaUtils.scala:42) at org.apache.spark.ml.PredictorParams$class.validateAndTransformSchema(Predictor.scala:51) at org.apache.spark.ml.Predictor.validateAndTransformSchema(Predictor.scala:72) at org.apache.spark.ml.Predictor.transformSchema(Predictor.scala:117) at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:182) at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:182) at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57) at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66) at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186) at org.apache.spark.ml.Pipeline.transformSchema(Pipeline.scala:182) at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:66) at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:133) ... 51 elided |