Predictor
Predictor is an Estimator for a PredictionModel with its own abstract train method.
    train(dataset: DataFrame): M
The train method lets implementations focus on training alone. Schema validation, copying parameters to the trained PredictionModel, and setting the parent of the model to the Predictor itself are all handled around the call to train.
A Predictor is basically a function that maps a DataFrame onto a PredictionModel.
    predictor: DataFrame =[train]=> PredictionModel
Predictor implements the abstract fit(dataset: DataFrame) method of the Estimator abstract class. fit validates and transforms the schema of the dataset (using a custom transformSchema of PipelineStage) and then calls the abstract train method.
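The split between fit and train can be illustrated with a small sketch that does not depend on Spark. Everything in it (ToyData, SketchModel, SketchPredictor, MeanPredictor, and the string-based column types) is a hypothetical stand-in made up for illustration, not Spark's actual classes. It only mirrors the order of steps described above: validate the schema, call train, then set the parent of the model.

```scala
// Spark-free sketch of the fit/train split. All names here are hypothetical.
// columns maps a column name to a type name; rows hold (feature, label) pairs.
final case class ToyData(columns: Map[String, String], rows: Seq[(Double, Double)])

abstract class SketchModel {
  // Filled in by fit, not by the training code.
  var parent: Option[SketchPredictor[_]] = None
  def predict(x: Double): Double
}

abstract class SketchPredictor[M <: SketchModel] {
  val featuresCol = "features"
  val labelCol = "label"

  // The only method a concrete predictor has to provide.
  protected def train(dataset: ToyData): M

  // Stand-in for schema validation: both input columns must be present.
  protected def validateSchema(columns: Map[String, String]): Unit = {
    require(columns.contains(featuresCol), s"Column $featuresCol is missing")
    require(columns.get(labelCol).contains("double"), s"Column $labelCol must be of type double")
  }

  // fit is final: validate first, then train, then link the model to this predictor.
  final def fit(dataset: ToyData): M = {
    validateSchema(dataset.columns)
    val model = train(dataset)
    model.parent = Some(this)
    model
  }
}

// A trivial model that always predicts the mean label seen during training.
final class MeanModel(mean: Double) extends SketchModel {
  def predict(x: Double): Double = mean
}

final class MeanPredictor extends SketchPredictor[MeanModel] {
  protected def train(dataset: ToyData): MeanModel =
    new MeanModel(dataset.rows.map(_._2).sum / dataset.rows.size)
}
```

Because fit is final in this sketch, a concrete predictor such as MeanPredictor cannot skip validation or forget to set the parent; it only supplies the learning step.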
Validation and transformation of a schema (using transformSchema) makes sure that:
- features column exists and is of correct type (defaults to Vector).
- label column exists and is of Double type.
As the last step, it adds the prediction column of Double type.
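The schema checks can be sketched in the same Spark-free style. ColumnType, Field, and transformSchemaSketch below are hypothetical names, and the schema is modelled as a plain sequence of fields rather than Spark's own schema type. Rejecting an already existing prediction column is an assumption of this sketch, not something stated above.

```scala
// Spark-free sketch of the schema checks. All names here are hypothetical.
sealed trait ColumnType
case object VectorCol extends ColumnType
case object DoubleCol extends ColumnType
case object StringCol extends ColumnType

final case class Field(name: String, dataType: ColumnType)

object SchemaSketch {
  def transformSchemaSketch(
      schema: Seq[Field],
      featuresCol: String = "features",
      labelCol: String = "label",
      predictionCol: String = "prediction"): Seq[Field] = {

    // Fails if the column is missing or has a different type.
    def requireColumn(name: String, expected: ColumnType): Unit = {
      val actual = schema.find(_.name == name).map(_.dataType)
      require(actual.contains(expected),
        s"Column $name must exist with type $expected but was $actual")
    }

    requireColumn(featuresCol, VectorCol)
    requireColumn(labelCol, DoubleCol)

    // Assumption of this sketch: the output column must not be present yet.
    require(!schema.exists(_.name == predictionCol),
      s"Column $predictionCol already exists")

    // Last step: append the prediction column of Double type.
    schema :+ Field(predictionCol, DoubleCol)
  }
}
```

The function returns a new schema instead of mutating its input, so the validated input and the transformed output can be compared side by side.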