Estimators - Spark Technology Sharing

Estimators — ML Pipeline Component

An estimator is an abstraction of a learning algorithm that fits a model on a dataset.

Note	That was a very machine-learning way to explain an estimator, wasn't it? The more time I spend with the Pipeline API, the more often I use the terms and phrases from this space. Sorry.

Technically, an Estimator fits a model to an input DataFrame using the given parameters (as a ParamMap) and produces a Model (i.e. a Transformer) that can calculate predictions for any DataFrame-based input dataset.

It is basically a function that maps a DataFrame onto a Model through the fit method, i.e. it takes a DataFrame and produces a Model, which is itself a Transformer.



estimator: DataFrame =[fit]=> Model
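The mapping above can be tried out with a concrete estimator. The following is a minimal sketch, assuming Spark 2.x or later with spark-mllib on the classpath. LinearRegression is just one built-in estimator picked for illustration, and the object name, the toy data and the local master are assumptions that do not come from the text above.

```scala
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.{LinearRegression, LinearRegressionModel}
import org.apache.spark.sql.SparkSession

// Hypothetical object name, used for illustration only.
object EstimatorFitSketch {

  // Fits a model on toy data and returns the number of rows that got a prediction.
  def run(): Long = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: a local run is good enough for a demo
      .appName("estimator-fit-sketch")
      .getOrCreate()

    // Made-up training data where label = 2 * feature.
    val training = spark.createDataFrame(Seq(
      (2.0, Vectors.dense(1.0)),
      (4.0, Vectors.dense(2.0)),
      (6.0, Vectors.dense(3.0))
    )).toDF("label", "features")

    // The estimator: a learning algorithm that has not seen any data yet.
    val estimator = new LinearRegression()

    // estimator: DataFrame =[fit]=> Model
    val model: LinearRegressionModel = estimator.fit(training)

    // The model is a Transformer: transform appends a "prediction" column.
    val predictions = model.transform(training)
    predictions.select("features", "label", "prediction").show()

    val rows = predictions.count()
    spark.stop()
    rows
  }

  def main(args: Array[String]): Unit =
    println(s"rows with predictions: ${run()}")
}
```

The estimator itself holds no learned state. Everything learned from the data lives in the returned model, which is why the same estimator can be fitted again on a different DataFrame.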


Estimators are instances of the org.apache.spark.ml.Estimator abstract class, which comes with a fit method (with the return type M being a Model):



fit(dataset: DataFrame): M
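Besides the single-argument fit, Spark's Estimator also offers an overload that takes a ParamMap, which is how the parameters mentioned earlier can be passed at fit time. The sketch below assumes Spark 2.x or later, where the dataset parameter is typed as Dataset[_] (a DataFrame is a Dataset[Row], so it is accepted as-is). The object name, toy data and chosen parameter values are made up for illustration.

```scala
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.regression.{LinearRegression, LinearRegressionModel}
import org.apache.spark.sql.SparkSession

// Hypothetical object name, used for illustration only.
object EstimatorParamMapSketch {

  // Fits with per-call parameter overrides and returns the estimator's own maxIter.
  def run(): Int = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local run
      .appName("estimator-parammap-sketch")
      .getOrCreate()

    // Made-up training data.
    val training = spark.createDataFrame(Seq(
      (2.0, Vectors.dense(1.0)),
      (4.0, Vectors.dense(2.0)),
      (6.0, Vectors.dense(3.0))
    )).toDF("label", "features")

    // Parameters set directly on the estimator.
    val lr = new LinearRegression().setMaxIter(10)

    // Parameters supplied only for this call to fit.
    val overrides = ParamMap(lr.maxIter -> 5, lr.regParam -> 0.1)

    // The return type M is the estimator-specific model class.
    val model: LinearRegressionModel = lr.fit(training, overrides)

    // Show the parameters of the estimator copy that produced the model.
    println(model.parent.extractParamMap())

    // fit(dataset, paramMap) works on a copy, so lr keeps its own settings.
    val maxIterOnEstimator = lr.getMaxIter
    spark.stop()
    maxIterOnEstimator
  }

  def main(args: Array[String]): Unit =
    println(s"maxIter still set on the estimator: ${run()}")
}
```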


An Estimator is a PipelineStage, so it can be part of a Pipeline.

Note	`Pipeline` treats an `Estimator` specially: it executes the `fit` method first and then calls `transform` on the resulting model (as it does for the other `Transformer` objects in a pipeline). Consult the Pipeline document.
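The note above can be illustrated with a small pipeline that mixes transformers and an estimator. This is a minimal sketch, assuming Spark 2.x or later. Tokenizer, HashingTF and LogisticRegression are built-in stages chosen for illustration, and the object name and sample rows are made up.

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SparkSession

// Hypothetical object name, used for illustration only.
object EstimatorInPipelineSketch {

  // Fits a three-stage pipeline and returns the number of stages in the fitted model.
  def run(): Int = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local run
      .appName("estimator-in-pipeline-sketch")
      .getOrCreate()

    // Made-up labeled text data.
    val training = spark.createDataFrame(Seq(
      (0L, "a b c d e spark", 1.0),
      (1L, "b d", 0.0),
      (2L, "spark f g h", 1.0),
      (3L, "hadoop mapreduce", 0.0)
    )).toDF("id", "text", "label")

    // Two Transformers followed by one Estimator.
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
    val lr = new LogisticRegression().setMaxIter(10)

    // A Pipeline is itself an Estimator, so it has a fit method too.
    val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))

    // Fitting runs the transformers and fits lr on their output.
    val model: PipelineModel = pipeline.fit(training)

    // In the fitted pipeline every stage is a Transformer.
    model.stages.foreach(stage => println(stage.getClass.getSimpleName))
    model.transform(training).select("text", "prediction").show()

    val stageCount = model.stages.length
    spark.stop()
    stageCount
  }

  def main(args: Array[String]): Unit =
    println(s"stages in the fitted pipeline: ${run()}")
}
```

The fitted PipelineModel keeps the same number of stages as the Pipeline, with the estimator stage replaced by the model it produced.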
