Estimators — ML Pipeline Component
An estimator is an abstraction of a learning algorithm that fits a model on a dataset.
Note
|
That was so machine learning to explain an estimator this way, wasn’t it? It is that the more I spend time with Pipeline API the often I use the terms and phrases from this space. Sorry. |
Technically, an Estimator
produces a Model (i.e. a Transformer) for a given DataFrame
and parameters (as ParamMap
). It fits a model to the input DataFrame
and ParamMap
to produce a Transformer
(a Model
) that can calculate predictions for any DataFrame
-based input datasets.
It is basically a function that maps a DataFrame
onto a Model
through fit
method, i.e. it takes a DataFrame
and produces a Transformer
as a Model
.
1 2 3 4 5 |
estimator: DataFrame =[fit]=> Model |
Estimators are instances of org.apache.spark.ml.Estimator abstract class that comes with fit
method (with the return type M
being a Model
):
1 2 3 4 5 |
fit(dataset: DataFrame): M |
Estimator
is a PipelineStage and so it can be a part of a Pipeline.
Note
|
Pipeline considers Estimator special and executes fit method before transform (as for other Transformer objects in a pipeline). Consult Pipeline document.
|