Contributor(s): Gael Blanchard
Requirements: Python (sklearn, pyspark, os)
Data: AnimeList, MovieLens100K, MovieLens20M
Objective: Develop, test, and evaluate a recommendation engine with PySpark on the AnimeList dataset.
- Given a user, can we recommend titles they are likely to enjoy?
- Given an anime, can we recommend other anime that are similar to it?
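As a rough illustration of the second question, the sketch below ranks titles by cosine similarity of their rating vectors. It is a minimal plain-Python example, not this project's PySpark implementation: the toy `ratings` data and the helper names `cosine_similarity` and `similar_titles` are hypothetical and chosen only for this example.

```python
import math

# Hypothetical toy data: {title: {user: score}}. Not taken from AnimeList.
ratings = {
    "Title A": {"u1": 9, "u2": 8, "u3": 2},
    "Title B": {"u1": 8, "u2": 9, "u3": 1},
    "Title C": {"u1": 2, "u2": 1, "u3": 9},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the users who rated both titles."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = math.sqrt(sum(a[u] ** 2 for u in shared))
    norm_b = math.sqrt(sum(b[u] ** 2 for u in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_titles(title, ratings, top_n=2):
    """Return up to top_n (other_title, similarity) pairs, most similar first."""
    scores = [
        (other, cosine_similarity(ratings[title], ratings[other]))
        for other in ratings
        if other != title
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


print(similar_titles("Title A", ratings))
```

In this toy data, "Title A" and "Title B" are rated alike by the same users, so "Title B" ranks above "Title C" when looking for titles similar to "Title A".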
How it works:
Concepts Covered: Cosine Similarity, Pearson's Correlation, Collaborative Filtering, Recommendation Engine, Stratified Sampling, Train-Test Split, Function Aliasing
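To make the stratified sampling and train-test split concepts concrete, here is a minimal plain-Python sketch. It assumes the stratum is the user, so every user contributes the same share of ratings to the test set. The choice of stratum, the function name `stratified_split`, and the toy rows are assumptions for illustration and may differ from what the project does.

```python
import random
from collections import defaultdict


def stratified_split(ratings, test_fraction=0.2, seed=42):
    """Split (user, item, score) rows so each user has the same test share.

    Assumption: stratification is by user (the first field of each row).
    """
    rng = random.Random(seed)  # fixed seed for a reproducible split
    by_user = defaultdict(list)
    for row in ratings:
        by_user[row[0]].append(row)

    train, test = [], []
    for user in sorted(by_user):
        rows = by_user[user][:]
        rng.shuffle(rows)
        n_test = round(len(rows) * test_fraction)
        test.extend(rows[:n_test])
        train.extend(rows[n_test:])
    return train, test


# Hypothetical toy rows: two users with five ratings each.
rows = [(user, f"item{i}", 5 + i) for user in ("u1", "u2") for i in range(5)]
train, test = stratified_split(rows, test_fraction=0.2)
print(len(train), len(test))
```

Splitting within each user, rather than across the whole table, keeps users with few ratings from ending up entirely in one partition.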
Going Further: Sampling techniques such as the Synthetic Minority Over-sampling Technique (SMOTE) could be applied to deal with imbalanced data, a problem that is common across datasets. The same methodology could also be used to develop recommendation engines for larger databases.