Estimators — ML Pipeline Component
An estimator is an abstraction of a learning algorithm that fits a model on a dataset.
Note: That was a very machine-learning way to explain an estimator, wasn't it? The more time I spend with the Pipeline API, the more often I use the terms and phrases from this space. Sorry.
Technically, an Estimator produces a Model (i.e. a Transformer) for a given DataFrame and parameters (as a ParamMap). It fits a model to the input DataFrame using the ParamMap, and the resulting Model (a Transformer) can calculate predictions for any DataFrame-based input dataset.
It is basically a function that maps a DataFrame onto a Model through the fit method, i.e. it takes a DataFrame and produces a Model, which is a Transformer.
estimator: DataFrame =[fit]=> Model
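The mapping above can be tried out directly. The following is a minimal sketch, assuming Spark 2.x or later with spark-mllib on the classpath (for example inside spark-shell). The toy data and the choice of LinearRegression as the estimator are illustrative assumptions, not something the text prescribes. In those Spark versions fit accepts a Dataset[_], and since a DataFrame is a Dataset[Row] the description still holds.

```scala
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.{LinearRegression, LinearRegressionModel}
import org.apache.spark.sql.SparkSession

// Local session for experimentation; in spark-shell this reuses the existing one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("estimator-sketch")
  .getOrCreate()

// Made-up training data with the default "label" and "features" columns.
val training = spark.createDataFrame(Seq(
  (1.0, Vectors.dense(1.0)),
  (2.0, Vectors.dense(2.0)),
  (3.0, Vectors.dense(3.0)),
  (4.0, Vectors.dense(4.0))
)).toDF("label", "features")

// The estimator: a learning algorithm that has not seen any data yet.
val estimator = new LinearRegression().setMaxIter(10)

// fit maps the DataFrame onto a Model.
val model: LinearRegressionModel = estimator.fit(training)

// The fitted Model is itself a Transformer...
val asTransformer: Transformer = model

// ...so it can append a "prediction" column to a DataFrame with "features".
val predictions = asTransformer.transform(training)
predictions.select("features", "label", "prediction").show()
```

The type ascriptions are there only to make the relationship visible: the value returned by fit is a Model, and it can be used anywhere a Transformer is expected.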
Estimators are instances of the org.apache.spark.ml.Estimator abstract class, which comes with the fit method (the return type M being a Model):
fit(dataset: DataFrame): M
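Besides the single-argument form, Estimator also offers a fit variant that takes a ParamMap whose values override the parameters set on the estimator. The sketch below shows both, assuming Spark 2.x or later (where the dataset parameter is typed Dataset[_]) and using LogisticRegression with made-up data purely for illustration.

```scala
import org.apache.spark.ml.classification.{LogisticRegression, LogisticRegressionModel}
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.sql.SparkSession

// Local session for experimentation; in spark-shell this reuses the existing one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("estimator-fit-sketch")
  .getOrCreate()

// Made-up labelled data with the default "label" and "features" columns.
val training = spark.createDataFrame(Seq(
  (1.0, Vectors.dense(0.0, 1.1, 0.1)),
  (0.0, Vectors.dense(2.0, 1.0, -1.0)),
  (0.0, Vectors.dense(2.0, 1.3, 1.0)),
  (1.0, Vectors.dense(0.0, 1.2, -0.5))
)).toDF("label", "features")

val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.01)

// fit(dataset): uses the parameters set on the estimator itself.
val model1: LogisticRegressionModel = lr.fit(training)

// fit(dataset, paramMap): values in the ParamMap take precedence
// over the ones set on the estimator, which is left unchanged.
val overrides = ParamMap(lr.maxIter -> 20, lr.regParam -> 0.1)
val model2: LogisticRegressionModel = lr.fit(training, overrides)

// Each Model keeps a reference to the Estimator that produced it.
println(s"model1 was fit with: ${model1.parent.extractParamMap()}")
println(s"model2 was fit with: ${model2.parent.extractParamMap()}")
```

Passing a ParamMap is convenient when the same estimator has to be fitted several times with different settings, since the estimator instance itself does not have to be mutated between runs.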
An Estimator is a PipelineStage, so it can be part of a Pipeline.
Note: A Pipeline treats an Estimator specially: it executes the Estimator's fit method first and then transform on the resulting Model (as it does for the other Transformer objects in the pipeline). Consult the Pipeline document.
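To see an Estimator as a stage of a Pipeline, here is a minimal sketch, assuming Spark 2.x or later. The stages (Tokenizer, HashingTF, LogisticRegression) and the tiny text dataset are illustrative choices. Only the last stage is an Estimator, and fitting the pipeline replaces it with the Model it produced.

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.{LogisticRegression, LogisticRegressionModel}
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SparkSession

// Local session for experimentation; in spark-shell this reuses the existing one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("estimator-pipeline-sketch")
  .getOrCreate()

// Made-up labelled documents.
val training = spark.createDataFrame(Seq(
  (0L, "a b c d e spark", 1.0),
  (1L, "b d", 0.0),
  (2L, "spark f g h", 1.0),
  (3L, "hadoop mapreduce", 0.0)
)).toDF("id", "text", "label")

// Two Transformers followed by one Estimator.
val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val hashingTF = new HashingTF()
  .setNumFeatures(1000)
  .setInputCol(tokenizer.getOutputCol)
  .setOutputCol("features")
val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.001)

val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))

// Pipeline is itself an Estimator: fit runs the stages in order, calling
// transform on Transformers and fit on Estimators.
val pipelineModel: PipelineModel = pipeline.fit(training)

// In the fitted pipeline the Estimator stage has become its Model.
pipelineModel.stages.foreach(stage => println(stage.getClass.getSimpleName))

// The fitted pipeline is a Transformer and can score a DataFrame with a "text" column.
pipelineModel.transform(training).select("id", "text", "prediction").show()
```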