Estimators

Estimators — ML Pipeline Component

An estimator is an abstraction of a learning algorithm that fits a model on a dataset.

Note
That was a very machine-learning way to explain an estimator, wasn't it? The more time I spend with the Pipeline API, the more often I use the terms and phrases from this space. Sorry.

Technically, an Estimator produces a Model (i.e. a Transformer) for a given DataFrame and parameters (given as a ParamMap). It fits a model to the input DataFrame, and the resulting Model can then calculate predictions for any DataFrame-based input dataset.

It is basically a function that maps a DataFrame onto a Model through the fit method, i.e. it takes a DataFrame and returns a Model, which is itself a Transformer.
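As a minimal sketch of that mapping, the snippet below fits a LogisticRegression (an Estimator that ships with Spark ML) on a tiny DataFrame, once with the estimator's own parameters and once with a ParamMap of overrides. The data, the application name, and the local master are made up for illustration, and the code assumes Spark 2.x or later with spark-mllib on the classpath (for example, pasted into spark-shell).

```scala
import org.apache.spark.ml.classification.{LogisticRegression, LogisticRegressionModel}
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.sql.SparkSession

// Local session for the sketch (assumption: no cluster is needed).
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("estimator-sketch")
  .getOrCreate()

// Made-up training data using the default "label" and "features" column names.
val training = spark.createDataFrame(Seq(
  (1.0, Vectors.dense(0.0, 1.1, 0.1)),
  (0.0, Vectors.dense(2.0, 1.0, -1.0)),
  (0.0, Vectors.dense(2.0, 1.3, 1.0)),
  (1.0, Vectors.dense(0.0, 1.2, -0.5))
)).toDF("label", "features")

// The Estimator: a learning algorithm that has not seen any data yet.
val lr = new LogisticRegression().setMaxIter(10)

// fit(DataFrame) returns a Model, which is a Transformer.
val model: LogisticRegressionModel = lr.fit(training)

// fit(DataFrame, ParamMap) overrides parameters for this one call only.
val overrides = ParamMap(lr.maxIter -> 20, lr.regParam -> 0.1)
val tuned: LogisticRegressionModel = lr.fit(training, overrides)

// The fitted Model can transform any DataFrame that has a "features" column.
val predictions = model.transform(training)
predictions.select("features", "label", "prediction").show()
```

Note that the estimator `lr` is not changed by the second call: the ParamMap applies to a copy, so `lr` can be reused to fit further models.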

Estimators are instances of the org.apache.spark.ml.Estimator abstract class, which declares the fit method whose return type M is a Model.
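To make the shape of fit concrete, here is a sketch of a custom estimator, assuming Spark 2.x or later, where the abstract method is `def fit(dataset: Dataset[_]): M` with `M <: Model[M]` (a DataFrame is a `Dataset[Row]`). `MeanEstimator` and `MeanModel` are hypothetical names invented for this example and are not part of Spark; the "algorithm" simply remembers the mean of the `label` column.

```scala
import org.apache.spark.ml.{Estimator, Model}
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.{avg, col, lit}
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

// Hypothetical Model: "predicts" the mean label seen during fit for every row.
class MeanModel(override val uid: String, val mean: Double) extends Model[MeanModel] {
  override def copy(extra: ParamMap): MeanModel =
    copyValues(new MeanModel(uid, mean), extra).setParent(parent)
  override def transform(dataset: Dataset[_]): DataFrame =
    dataset.withColumn("prediction", lit(mean))
  override def transformSchema(schema: StructType): StructType =
    schema.add(StructField("prediction", DoubleType, nullable = false))
}

// Hypothetical Estimator: fit computes the mean of "label" and returns M = MeanModel.
class MeanEstimator(override val uid: String) extends Estimator[MeanModel] {
  def this() = this(Identifiable.randomUID("meanEstimator"))
  override def fit(dataset: Dataset[_]): MeanModel = {
    val mean = dataset.agg(avg(col("label"))).head().getDouble(0)
    copyValues(new MeanModel(uid, mean).setParent(this))
  }
  override def copy(extra: ParamMap): MeanEstimator =
    copyValues(new MeanEstimator(uid), extra)
  override def transformSchema(schema: StructType): StructType =
    schema.add(StructField("prediction", DoubleType, nullable = false))
}

// Usage on made-up data.
val spark = SparkSession.builder().master("local[*]").appName("fit-sketch").getOrCreate()
import spark.implicits._

val df = Seq(1.0, 2.0, 3.0).toDF("label")
val meanModel: MeanModel = new MeanEstimator().fit(df)
meanModel.transform(df).show()
```

Only the single-argument fit has to be implemented; the overloads that accept a ParamMap are provided by the Estimator base class in terms of it.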

An Estimator is a PipelineStage, so it can be part of a Pipeline.

Note
A Pipeline treats an Estimator specially: it calls the estimator's fit method first and then uses the resulting model for transform, whereas plain Transformer stages are only asked to transform. Consult the Pipeline document.
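The following sketch shows an Estimator as the last stage of a Pipeline, assuming Spark 2.x or later with spark-mllib available. The sample rows, column names, and session settings are made up for illustration. Tokenizer and HashingTF are plain Transformers, LogisticRegression is an Estimator, and the Pipeline is itself an Estimator whose fit returns a PipelineModel.

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("pipeline-estimator-sketch")
  .getOrCreate()

// Made-up labeled text documents.
val training = spark.createDataFrame(Seq(
  (0L, "a b c d e spark", 1.0),
  (1L, "b d", 0.0),
  (2L, "spark f g h", 1.0),
  (3L, "hadoop mapreduce", 0.0)
)).toDF("id", "text", "label")

// Two Transformers followed by one Estimator.
val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val hashingTF = new HashingTF().setNumFeatures(1000).setInputCol("words").setOutputCol("features")
val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.001)

val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))

// Fitting the pipeline fits the LogisticRegression stage on the transformed data
// and replaces it with the fitted model in the resulting PipelineModel.
val pipelineModel: PipelineModel = pipeline.fit(training)

pipelineModel.transform(training).select("id", "text", "prediction").show()
```

After fitting, every stage of `pipelineModel` is a Transformer, so the whole pipeline can be applied to new data with a single transform call.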