
ML Persistence — Saving and Loading Models and Pipelines
MLWriter and MLReader belong to org.apache.spark.ml.util package.

They allow you to save and load models despite the languages — Scala, Java, Python or R — they have been saved in and loaded later on.

MLWriter

The MLWriter abstract class comes with a save(path: String) method to save an ML component to a given path.

It also comes with a chainable overwrite method that overwrites the output path if it already exists.

The component is saved into a JSON file (see MLWriter Example section below).
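A minimal sketch of saving an unfitted pipeline (the stage, column names, and output path are illustrative; a SparkSession must already be running):

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.Tokenizer

// A minimal pipeline with a single stage
val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val pipeline = new Pipeline().setStages(Array(tokenizer))

// `write` returns an MLWriter; `overwrite` is chainable
pipeline.write.overwrite().save("/tmp/unfit-pipeline")
```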

Tip

Enable INFO logging level for the MLWriter implementation logger to see what happens inside.

Add the following line to conf/log4j.properties:
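The exact logger name depends on the MLWriter implementation in use; for Pipeline, whose writer is the inner PipelineWriter class, a plausible entry is:

```
# assumed logger name for Pipeline's MLWriter implementation
log4j.logger.org.apache.spark.ml.Pipeline$.PipelineWriter=INFO
```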

Refer to Logging.

Caution
FIXME The logging doesn’t work and overwriting does not print an INFO message to the logs 🙁

MLWriter Example

The result of save for an “unfitted” pipeline is a JSON file with metadata (as shown below).

The result of save for a pipeline model is a JSON file with metadata plus Parquet files with the model data, e.g. coefficients.
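A sketch of fitting and persisting a pipeline model; `training` stands in for a DataFrame you already have, and the path is illustrative:

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.LogisticRegression

// Fit the pipeline, then persist the resulting model: the output
// directory holds JSON metadata plus Parquet model data (e.g. coefficients)
val lr = new LogisticRegression()
val pipeline = new Pipeline().setStages(Array(lr))
val model: PipelineModel = pipeline.fit(training) // `training`: an assumed DataFrame
model.write.overwrite().save("/tmp/fitted-pipeline")
```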

MLReader

The MLReader abstract class comes with a load(path: String) method to load an ML component from a given path.
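Loading mirrors saving: each MLReadable component's companion object exposes a load method that delegates to its MLReader. A sketch, assuming the paths used in the earlier examples:

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel}

// Load an unfitted pipeline and a fitted pipeline model back
val pipeline = Pipeline.load("/tmp/unfit-pipeline")
val model = PipelineModel.load("/tmp/fitted-pipeline")
```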
