Netflix Prize

  • Given the 1 to 5 star grade of some films by a set of users <user, movie, date of grade, grade> [training set]
  • Predict the star grade of other films by the same set of users <user, movie, date of grade> [qualifying set]
  • Performance metric: RMSE of the predicted grades (you are allowed to submit non-integer grades, but all the actual grades are integers)
    • The current Netflix algorithm has RMSE of 0.9514
  • The "jury" will tell you the performance of your algorithm on half the qualifying set (the [quiz set]) when you submit.
    • You can submit once a day.
  • The prize goes to whoever beats the current Netflix algorithm by 10%. They used to give progress prizes for teams beating the previous year's best algorithm by 1%; the best algorithm as of 2008 beats the Netflix algorithm by 9.44% (RMSE of 0.8616).
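
To make the metric and the percentages above concrete, here is a minimal Python sketch of RMSE and of the arithmetic behind the 10% target. The function name `rmse` and the constant names are my own choices for illustration, not anything defined by the contest.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual grades."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Figures quoted in the notes above.
NETFLIX_BASELINE_RMSE = 0.9514
BEST_2008_RMSE = 0.8616

# RMSE needed to beat the baseline by 10%.
target_rmse = NETFLIX_BASELINE_RMSE * (1 - 0.10)

# Relative improvement of the 2008 leader over the baseline (about 9.44%).
improvement_2008 = 1 - BEST_2008_RMSE / NETFLIX_BASELINE_RMSE

# Predictions may be fractional even though the true grades are integers.
example = rmse([3.5, 4.0], [3, 5])
```

Under these figures the target works out to an RMSE of roughly 0.8563, so the 2008 leader is still a little short of the prize threshold.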

Translating to DiGeneI:

  • Treat each of the 5 star grades as a "disease"/partition category, and each film as a feature.
  • I've seen mappings of the movies to IMDB, to yield other features such as presence of actors, directors, publication year.
  • It should be noted that the current top competitors for the Netflix prize eschew metadata and go with the "pure" approach of only using the supplied Netflix data and not joining it with any other data.
  • A keyword I see a lot is "Collaborative Filtering" (CF), which has some similarities to my current approach. Using CF to predict the disease score for a particular gene would be:
    • Use profiles to find similar genes (the secondary profile comparison bit)
    • Merge the predictions of these other genes for the disease in question.
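
The two CF steps above can be sketched roughly as follows. This is only an illustration under my own assumptions: profiles are plain numeric vectors, similarity is cosine similarity, and the merge is a similarity-weighted average over the `k` nearest genes. The names `cosine_similarity`, `predict_score`, the gene labels and the toy numbers are all hypothetical and not part of DiGeneI.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric profiles."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def predict_score(target, profiles, known_scores, k=2):
    """Predict the disease score of `target` from its k most similar genes.

    profiles:     gene -> profile vector (assumed representation)
    known_scores: gene -> already known disease score
    Returns None when no neighbour has positive similarity.
    """
    # Step 1: use profiles to find similar genes.
    similarities = [
        (cosine_similarity(profiles[target], profiles[gene]), gene)
        for gene in known_scores
        if gene != target
    ]
    neighbours = sorted(similarities, reverse=True)[:k]

    # Step 2: merge the neighbours' scores, weighted by similarity
    # (negative similarities are clipped to zero).
    total_weight = sum(max(sim, 0.0) for sim, _ in neighbours)
    if total_weight == 0:
        return None
    weighted = sum(max(sim, 0.0) * known_scores[gene] for sim, gene in neighbours)
    return weighted / total_weight

# Toy data, invented purely for illustration.
profiles = {
    "geneA": [1, 0, 1],
    "geneB": [1, 0, 1],
    "geneC": [0, 1, 0],
}
known_scores = {"geneB": 4.0, "geneC": 1.0}
prediction = predict_score("geneA", profiles, known_scores)
```

In the toy data geneA has the same profile as geneB and nothing in common with geneC, so the prediction is driven entirely by geneB's score. Other similarity measures or merge rules would slot into the same two-step shape.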