Alternating Least Squares (ALS) Matrix Factorization

Alternating Least Squares (ALS) Matrix Factorization for Recommender Systems

Alternating Least Squares (ALS) Matrix Factorization is a recommendation algorithm…FIXME
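
Here is an orientation sketch in common textbook notation. The symbols are introduced here and are not taken from the papers cited below. ALS approximates each observed rating r_ui by the dot product of a k-dimensional user factor x_u and a k-dimensional item factor y_i. It minimizes squared error plus an L2 penalty λ over the set Ω of observed (user, item) pairs. If the item factors are held fixed, each user's factor becomes an independent ridge regression with a closed-form solution. Here Y_u stacks the factors y_i of the items that user u rated, and r_u holds those ratings. X_i and r_i are the analogous quantities for item i.

```latex
\min_{X,Y} \sum_{(u,i)\in\Omega} \bigl(r_{ui} - x_u^{\top} y_i\bigr)^2
  + \lambda \Bigl(\sum_u \lVert x_u\rVert^2 + \sum_i \lVert y_i\rVert^2\Bigr)

% Y fixed: one ridge regression per user
x_u = \bigl(Y_u^{\top} Y_u + \lambda I_k\bigr)^{-1} Y_u^{\top} r_u

% X fixed: one ridge regression per item
y_i = \bigl(X_i^{\top} X_i + \lambda I_k\bigr)^{-1} X_i^{\top} r_i
```

ALS alternates the two updates until it converges or reaches an iteration limit. Each update is only a k×k linear system, and the per-user (or per-item) solves are independent of each other. In the Spark example below, setRegParam sets the regularization parameter and setMaxIter sets the maximum number of iterations. This sketch does not assert how Spark scales λ internally.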

Tip: Read the original paper, "Scalable Collaborative Filtering with Jointly Derived Neighborhood Interpolation Weights", by Robert M. Bell and Yehuda Koren.

Recommender systems based on collaborative filtering predict user preferences for products or services by learning past user-item relationships. A predominant approach to collaborative filtering is neighborhood based (“k-nearest neighbors”), where a user-item preference rating is interpolated from ratings of similar items and/or users.

Our method is very fast in practice, generating a prediction in about 0.2 milliseconds. Importantly, it does not require training many parameters or a lengthy preprocessing, making it very practical for large scale applications. Finally, we show how to apply these methods to the perceivably much slower user-oriented approach. To this end, we suggest a novel scheme for low dimensional embedding of the users. We evaluate these methods on the Netflix dataset, where they deliver significantly better results than the commercial Netflix Cinematch recommender system.

— Robert M. Bell and Yehuda Koren
Scalable Collaborative Filtering with Jointly Derived Neighborhood Interpolation Weights
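
For contrast, the plain neighborhood approach described in the quote can be sketched in a few lines of Scala. The sketch predicts a user's rating of an item as the similarity-weighted average of that user's ratings of the k most similar items. Similarity is the cosine between item rating vectors, with missing ratings treated as 0. This is the simple baseline that Bell and Koren improve on, not their jointly derived interpolation weights. ItemKnn and its methods are hypothetical names chosen for illustration.

```scala
// Item-based k-nearest-neighbor prediction with cosine similarity (baseline sketch).
// ItemKnn and its methods are hypothetical names used only for illustration.
object ItemKnn {
  type Ratings = Map[(Int, Int), Double] // (user, item) -> rating

  // Cosine similarity between the rating vectors of items i and j (missing = 0).
  def cosine(r: Ratings, i: Int, j: Int): Double = {
    val ri = r.collect { case ((u, `i`), v) => u -> v }
    val rj = r.collect { case ((u, `j`), v) => u -> v }
    def norm(m: Map[Int, Double]): Double = math.sqrt(m.values.map(v => v * v).sum)
    val dot = ri.keySet.intersect(rj.keySet).toSeq.map(u => ri(u) * rj(u)).sum
    if (dot == 0.0) 0.0 else dot / (norm(ri) * norm(rj))
  }

  // Similarity-weighted average of the user's ratings of the k most similar items.
  def predict(r: Ratings, user: Int, item: Int, k: Int): Option[Double] = {
    // Candidate neighbors: items this user rated, excluding the target item.
    val rated = r.collect { case ((`user`, j), v) if j != item => j -> v }
    val neighbors = rated.toSeq
      .map { case (j, v) => (cosine(r, item, j), v) }
      .filter(_._1 > 0.0)    // keep positively similar items only
      .sortWith(_._1 > _._1) // most similar first
      .take(k)
    val weightSum = neighbors.map(_._1).sum
    if (weightSum == 0.0) None // no usable neighbors, e.g. an unknown user
    else Some(neighbors.map { case (s, v) => s * v }.sum / weightSum)
  }
}
```

Take the four ratings from the Spark example below. For user 1 and movie 3, ItemKnn.predict(r, 1, 3, 2) finds a single neighbor (movie 2), so it returns that rating, Some(2.0).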

Tip: Read the follow-up paper, "Collaborative Filtering for Implicit Feedback Datasets", by Yifan Hu, Yehuda Koren and Chris Volinsky.
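
The follow-up paper handles implicit signals, such as clicks or play counts, differently from explicit ratings. Each raw observation r_ui is split into two parts:

- a binary preference p_ui, which is 1 if r_ui > 0 and 0 otherwise;
- a confidence c_ui = 1 + α·r_ui, which weights the squared error for that pair.

Unobserved pairs get p = 0 and c = 1. The minimal Scala sketch below covers only that transform and uses hypothetical names. In Spark's ALS, this mode is enabled with setImplicitPrefs(true), and α is set with setAlpha. Spark's internal handling, for example of negative values, may differ from this sketch.

```scala
// Implicit-feedback transform in the style of Hu, Koren and Volinsky (sketch).
// ImplicitFeedback and Weighted are hypothetical names, not Spark APIs.
object ImplicitFeedback {
  final case class Weighted(preference: Double, confidence: Double)

  /** Turn a raw observation r (0 when unobserved) into (p, c). */
  def transform(r: Double, alpha: Double): Weighted =
    Weighted(
      preference = if (r > 0) 1.0 else 0.0, // any interaction counts as a preference
      confidence = 1.0 + alpha * r          // more interactions, more confidence
    )
}
```

For example, transform(3.0, 2.0) yields Weighted(1.0, 7.0). An unobserved pair, transform(0.0, 2.0), yields Weighted(0.0, 1.0).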


ALS Example

// Based on JavaALSExample from the official Spark examples
// https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/ml/JavaALSExample.java

// 1. Save the code to als.scala
// 2. Run `spark-shell -i als.scala`

import spark.implicits._

import org.apache.spark.ml.recommendation.ALS
val als = new ALS().
  setMaxIter(5).
  setRegParam(0.01).
  setUserCol("userId").
  setItemCol("movieId").
  setRatingCol("rating")

import org.apache.spark.ml.recommendation.ALS.Rating
// FIXME Use a much richer dataset, i.e. Spark's data/mllib/als/sample_movielens_ratings.txt
// FIXME Load it using spark.read
val ratings = Seq(
  Rating(0, 2, 3),
  Rating(0, 3, 1),
  Rating(0, 5, 2),
  Rating(1, 2, 2)).toDF("userId", "movieId", "rating")
val Array(training, testing) = ratings.randomSplit(Array(0.8, 0.2))

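// Make sure that both splits (DataFrames, not RDDs) have at least one record.
// With only four ratings and no seed, randomSplit can return an empty split; rerun if an assertion fails.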
assert(training.count > 0)
assert(testing.count > 0)

import org.apache.spark.ml.recommendation.ALSModel
val model = als.fit(training)

// Drop rows whose prediction is NaN (users or movies absent from the training split) so that RMSE is defined
model.setColdStartStrategy("drop")
val predictions = model.transform(testing)

import org.apache.spark.ml.evaluation.RegressionEvaluator
val evaluator = new RegressionEvaluator().
  setMetricName("rmse").  // root mean squared error
  setLabelCol("rating").
  setPredictionCol("prediction")
val rmse = evaluator.evaluate(predictions)
println(s"Root-mean-square error = $rmse")

// Model is ready for recommendations

// Generate top 10 movie recommendations for each user
val userRecs = model.recommendForAllUsers(10)
userRecs.show(truncate = false)

// Generate top 10 user recommendations for each movie
val movieRecs = model.recommendForAllItems(10)
movieRecs.show(truncate = false)

// Generate top 10 movie recommendations for a specified set of users
// Use a trick to make sure we work with the known users from the input
val users = ratings.select(als.getUserCol).distinct.limit(3)
val userSubsetRecs = model.recommendForUserSubset(users, 10)
userSubsetRecs.show(truncate = false)

// Generate top 10 user recommendations for a specified set of movies
val movies = ratings.select(als.getItemCol).distinct.limit(3)
val movieSubsetRecs = model.recommendForItemSubset(movies, 10)
movieSubsetRecs.show(truncate = false)

System.exit(0)
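
The single-machine toy version below, in plain Scala, makes it clearer what als.fit, the "rmse" evaluator and recommendForAllUsers compute. It is a hypothetical sketch, and ToyALS is not a Spark class. It does three things:

- it alternates the closed-form ridge-regression updates for users and items;
- it reports the root-mean-square error, the square root of the mean squared difference between rating and prediction;
- it ranks items for a user by the dot product of their factors.

Spark's ALS is a distributed, blocked implementation and may differ in details such as initialization and how the regularization term is scaled.

```scala
// Toy, single-machine ALS for explicit ratings (illustrative sketch only).
// ToyALS and all of its methods are hypothetical; they are not Spark APIs.
object ToyALS {
  type Vec = Array[Double]
  type Ratings = Seq[(Int, Int, Double)] // (user, item, rating)

  def dot(a: Vec, b: Vec): Double = a.indices.map(j => a(j) * b(j)).sum

  // Solve A x = b by Gaussian elimination without pivoting; adequate here
  // because A = sum(y y^T) + lambda I is symmetric positive definite.
  def solve(a: Array[Array[Double]], b: Vec): Vec = {
    val n = b.length
    val m = a.map(_.clone())
    val v = b.clone()
    for (p <- 0 until n; r <- p + 1 until n) {
      val f = m(r)(p) / m(p)(p)
      for (c <- p until n) m(r)(c) -= f * m(p)(c)
      v(r) -= f * v(p)
    }
    val x = new Array[Double](n)
    for (r <- n - 1 to 0 by -1)
      x(r) = (v(r) - (r + 1 until n).map(c => m(r)(c) * x(c)).sum) / m(r)(r)
    x
  }

  // One half-step: with `fixed` held constant, each id's factor is the ridge solution
  // x = (sum_j y_j y_j^T + lambda I)^-1 (sum_j r_j y_j) over the ids j it was rated with.
  def update(obs: Map[Int, Seq[(Int, Double)]], fixed: Map[Int, Vec],
             rank: Int, lambda: Double): Map[Int, Vec] =
    obs.map { case (id, rs) =>
      val a = Array.tabulate(rank, rank) { (p, q) =>
        rs.map { case (j, _) => fixed(j)(p) * fixed(j)(q) }.sum + (if (p == q) lambda else 0.0)
      }
      val b = Array.tabulate(rank)(p => rs.map { case (j, r) => r * fixed(j)(p) }.sum)
      id -> solve(a, b)
    }

  def fit(ratings: Ratings, rank: Int, lambda: Double, iters: Int,
          seed: Long = 42L): (Map[Int, Vec], Map[Int, Vec]) = {
    require(iters >= 1, "need at least one iteration")
    val rnd = new scala.util.Random(seed)
    val byUser = ratings.groupBy(_._1).map { case (u, rs) => u -> rs.map(t => (t._2, t._3)) }
    val byItem = ratings.groupBy(_._2).map { case (i, rs) => i -> rs.map(t => (t._1, t._3)) }
    var items: Map[Int, Vec] = byItem.map { case (i, _) => i -> Array.fill(rank)(rnd.nextDouble()) }
    var users: Map[Int, Vec] = Map.empty
    for (_ <- 1 to iters) {
      users = update(byUser, items, rank, lambda) // items fixed, solve users
      items = update(byItem, users, rank, lambda) // users fixed, solve items
    }
    (users, items)
  }

  // Root-mean-square error, the quantity the "rmse" metric reports.
  def rmse(ratings: Ratings, users: Map[Int, Vec], items: Map[Int, Vec]): Double = {
    val sq = ratings.map { case (u, i, r) =>
      val e = r - dot(users(u), items(i))
      e * e
    }
    math.sqrt(sq.sum / sq.size)
  }

  // Top-k items for one user by predicted score (already-rated items are not filtered out).
  def recommend(user: Int, users: Map[Int, Vec], items: Map[Int, Vec], k: Int): Seq[(Int, Double)] =
    items.toSeq.map { case (i, y) => i -> dot(users(user), y) }.sortWith(_._2 > _._2).take(k)
}
```

Consider ToyALS.fit(Seq((0, 2, 3.0), (0, 3, 1.0), (0, 5, 2.0), (1, 2, 2.0)), rank = 2, lambda = 0.01, iters = 20). Its user and item factors should have a small training RMSE, because 10 free parameters easily fit 4 ratings. On data this small, however, a low training error says nothing about generalization.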
