Collaborative and hybrid recommendation systems

University of Buenos Aires
Faculty of Exact and Natural Sciences
Master in Data Mining and Knowledge Discovery

This study compares different approaches to recommendation based on collaborative and hybrid filtering (i.e., a combination of collaborative and content-based filters). It explains the advantages and disadvantages of each approach, as well as the architecture and operation of each proposed model. For hybrid models, experiments were conducted with ensembles of different types, including LLMs (large language models), content-based models, and collaborative filtering-based models. The MovieLens and TMDB datasets were chosen as the basis for building the dataset, as they are classic datasets commonly used for comparing recommendation models.

Requisites

anaconda / miniconda / mamba
mongodb
chromadb
airflow
mongosh (Optional)
Studio3T (Optional)
Postman (Optional)
6-10 GB GPU for reasonable execution times (Optional)

Hypothesis

Do deep learning-based models achieve better results than non-deep learning-based models? What are the advantages and disadvantages of each approach?
How can the cold-start problem be solved in a collaborative filtering-based recommendation approach, and what solutions have been proposed?

Documents

Specialization: Collaborative recommendation systems
Thesis (In progress)

Models

The following models are compared. For more details, see the thesis document listed in the previous section.

Memory-Based CF: Baseline or reference model.
- KNN (Cosine Distance)
  - User-Based.
  - Item-Based.
  - Ensemble User/Item-Based.
Model-Based CF: Collaborative filter models based on neural networks.
- Generalized Matrix Factorization (GMF): User/Item embeddings dot product.
- Biased Generalized Matrix Factorization (B-GMF): User/Item embeddings dot product + user/item biases.
- Neural Network Matrix Factorization: User/Item Embedding + flatten + Fully Connected.
- Deep Factorization Machine
Ensembles:
- Content-based and Collaborative-based models Stacking.
- Feature Weighted Linear Stacking.
- Multi-Bandit approach based on beta distribution.
- LLMs + Collaborative filtering ensemble.
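
To make the GMF and B-GMF descriptions in the list more concrete, here is a minimal sketch of how a score could be computed from embeddings. It is not the project's implementation: the function names and the hand-written embedding values are hypothetical, and a real model would learn the embeddings and biases during training.

```python
def gmf_score(user_embedding, item_embedding):
    """GMF-style score: dot product of a user and an item embedding."""
    if len(user_embedding) != len(item_embedding):
        raise ValueError("embeddings must have the same dimension")
    return sum(u * v for u, v in zip(user_embedding, item_embedding))


def biased_gmf_score(user_embedding, item_embedding, user_bias, item_bias):
    """B-GMF-style score: dot product plus a user bias and an item bias."""
    return gmf_score(user_embedding, item_embedding) + user_bias + item_bias


# Hypothetical 3-dimensional embeddings, written by hand for illustration only.
user = [0.5, -1.0, 2.0]
item = [1.0, 0.5, 0.25]

print(gmf_score(user, item))                      # dot product only
print(biased_gmf_score(user, item, 0.25, 0.25))   # dot product + biases
```

The bias terms let the model capture users who rate generously and items that are broadly liked, independently of the user/item interaction captured by the dot product.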

Metrics

To compare collaborative filtering models, the metrics Mean Average Precision at k (mAP@k) and Normalized Discounted Cumulative Gain at k (NDCG@k) are used. Ratings between 4 and 5 points belong to the positive class, and the rest belong to the negative class.
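
As an illustration of these two metrics, the sketch below computes AP@k and NDCG@k for a single user from the true ratings of the items in recommended order, using the positive class defined above (ratings of 4 or more). It assumes binary gains for NDCG and normalizes AP by the number of hits in the top k; other normalizations exist, and the thesis code may use a different one. All function names are hypothetical.

```python
import math

POSITIVE_THRESHOLD = 4.0  # ratings from 4 to 5 form the positive class


def is_relevant(rating):
    return rating >= POSITIVE_THRESHOLD


def average_precision_at_k(ranked_ratings, k):
    """AP@k for one user. ranked_ratings holds the true rating of each
    recommended item, in the order the model ranked them."""
    hits = 0
    precision_sum = 0.0
    for position, rating in enumerate(ranked_ratings[:k], start=1):
        if is_relevant(rating):
            hits += 1
            precision_sum += hits / position  # precision at this cut-off
    return precision_sum / hits if hits else 0.0


def mean_average_precision_at_k(ratings_by_user, k):
    """mAP@k: mean of AP@k over all users."""
    scores = [average_precision_at_k(r, k) for r in ratings_by_user]
    return sum(scores) / len(scores) if scores else 0.0


def dcg_at_k(gains, k):
    return sum(g / math.log2(pos + 1) for pos, g in enumerate(gains[:k], start=1))


def ndcg_at_k(ranked_ratings, k):
    """NDCG@k with binary gains (1 for the positive class, 0 otherwise)."""
    gains = [1.0 if is_relevant(r) else 0.0 for r in ranked_ratings]
    ideal_dcg = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg else 0.0


# Hypothetical ranking: positive items at positions 1 and 3.
ranking = [5, 3, 4, 2]
print(average_precision_at_k(ranking, k=3))  # (1/1 + 2/3) / 2
print(ndcg_at_k(ranking, k=3))
```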

Other metrics used:

FBetaScore@K
Precision@K
Recall@K
RMSE

Data

To conduct the necessary tests with both collaborative filtering (CF) and content-based (CB) approaches, we need:

Ratings of each item (movies) by the users (CF).
Item-specific features (CB).

Based on these requirements, the following datasets were combined:

MovieLens 25M Dataset: It has practically no information about the movies, but it does have user ratings.
TMDB Movie Dataset: It does not have personalized ratings like the previous dataset, but it has several features corresponding to the movies or items which will be necessary when training content-based models.

Notebooks

Data pre-processing & analysis

Recommendation Models

Ensembles

Extras

Multi-categorical variable embedding module

Getting started

Edit & run notebooks

Step 1: Clone repo.

$ git clone https://github.com/adrianmarino/thesis-paper.git
$ cd thesis-paper

Step 2: Create environment.

$ conda env create -f environment.yml

See notebooks in jupyter lab

Step 1: Enable project environment.

$ conda activate thesis

Step 2: From the project directory, start Jupyter Lab.

$ jupyter lab

Jupyter Notebook 6.1.4 is running at:
http://localhost:8888/?token=45efe99607fa6......

Step 3: Go to http://localhost:8888.... as indicated in the shell output.

Build dataset

This process requires a MongoDB database engine installed and listening on localhost:27017, which is the default host and port for a local installation. For more instructions see:

Next, run the following two notebooks in order:

This creates two files in the datasets path:

movies.json
interactions.json

These files make up the project dataset and are used by all notebooks.
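
Because the dataset build depends on MongoDB listening on localhost:27017, a quick connectivity check before running the notebooks can save time. The following is a minimal sketch that only uses the Python standard library; the function name is hypothetical, and it only verifies that something accepts TCP connections on the port, not that it is actually a MongoDB server.

```python
import socket


def is_port_open(host="localhost", port=27017, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


if __name__ == "__main__":
    if is_port_open():
        print("Something is listening on localhost:27017")
    else:
        print("Nothing is listening on localhost:27017; start MongoDB first")
```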

Recommendation Chatbot API

A chatbot API that recommends movies based on a user's text request, their profile data, and ratings. Papers on which the chatbot was based:

Install as systemd service

Objectives

Install chat-bot-api as a systemd daemon.
Run the daemon as your regular user.

Note: systemd is an initialization and service management system for Unix-like operating systems. It is responsible for starting the system and managing the running processes and services. systemd has replaced traditional initialization systems like SysV init in many Linux distributions due to its greater efficiency and advanced features.

Setup

Step 1: Copy chat-bot-api.service to the user systemd config path:

$ cp chat-bot-api/chat-bot-api.service ~/.config/systemd/user/

Step 2: Refresh systemd daemon with updated config.

$ systemctl --user daemon-reload

Step 3: Enable chat-bot-api daemon to start on boot.

$ systemctl --user enable chat-bot-api

Step 4: Start chat-bot-api as systemd daemon.

$ systemctl --user start chat-bot-api

Step 5: Check chat-bot-api health.

$ chat-bot-api/bin/./health

{
   "airflow" : {
      "metadatabase" : true,
      "scheduler" : true
   },
   "chatbot_api" : true,
   "ollama_api" : true,
   "choma_database" : true,
   "mongo_database" : true
}
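
When the health report is consumed by a script rather than read by eye, it helps to list only the components that are not healthy. The sketch below assumes the report always has the shape shown above (booleans, optionally nested one or more levels); the function name is hypothetical.

```python
import json


def failing_components(health, prefix=""):
    """Return dotted paths of every entry in the report that is not true."""
    failures = []
    for name, status in health.items():
        path = f"{prefix}{name}"
        if isinstance(status, dict):
            # Nested section such as "airflow": recurse with a dotted prefix.
            failures.extend(failing_components(status, prefix=f"{path}."))
        elif status is not True:
            failures.append(path)
    return failures


# Example report with the same keys as the output above, one service down.
report = json.loads("""
{
   "airflow": {"metadatabase": true, "scheduler": false},
   "chatbot_api": true,
   "ollama_api": true,
   "choma_database": true,
   "mongo_database": true
}
""")

print(failing_components(report))
```

In practice the JSON text would come from the output of the health script instead of an inline string.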

Config file

config.conf:

# -----------------------------------------------------------------------------
# Python
# -----------------------------------------------------------------------------
CONDA_PATH="/opt/miniconda3"
CONDA_ENV="thesis"
# -----------------------------------------------------------------------------
#
#
#
# -----------------------------------------------------------------------------
# API
# -----------------------------------------------------------------------------
HOME_PATH="$(pwd)"
PARENT_PATH="$(dirname "$HOME_PATH")"
SERVICE_NAME="Recommendation ChatBot API"
PROCESS_NAME="uvicorn"
export API_HOST="0.0.0.0"
export API_PORT="8080"
# -----------------------------------------------------------------------------
#
#
#
# -----------------------------------------------------------------------------
# Mongo DB
# -----------------------------------------------------------------------------
export MONGODB_DATABASE="chatbot"
export MONGODB_HOST="0.0.0.0"
export MONGODB_PORT="27017"
export MONGODB_URL="mongodb://$MONGODB_HOST:$MONGODB_PORT"
# -----------------------------------------------------------------------------
#
#
#
# -----------------------------------------------------------------------------
# Chroma DB
# -----------------------------------------------------------------------------
export CHROMA_HOST="0.0.0.0"
export CHROMA_PORT="9090"
# -----------------------------------------------------------------------------
#
#
#
# -----------------------------------------------------------------------------
# Training Jobs
# -----------------------------------------------------------------------------
export TMP_PATH="$PARENT_PATH/tmp"
export DATASET_PATH="$PARENT_PATH/datasets"
export WEIGHTS_PATH="$PARENT_PATH/weights"
export METRICS_PATH="$PARENT_PATH/metrics"
# -----------------------------------------------------------------------------
#
#
#
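
Since config.conf exports its values as environment variables, a Python process started from that shell can read them with os.environ. The sketch below shows one way this might look; it is not taken from the chat-bot-api source, the function name and the returned structure are hypothetical, and the defaults simply mirror the values in the file above.

```python
import os


def load_settings(env=None):
    """Build a settings dict from the variables exported by config.conf.

    env defaults to os.environ; a plain dict can be passed for testing.
    """
    env = os.environ if env is None else env
    mongo_host = env.get("MONGODB_HOST", "0.0.0.0")
    mongo_port = int(env.get("MONGODB_PORT", "27017"))
    return {
        "api": {
            "host": env.get("API_HOST", "0.0.0.0"),
            "port": int(env.get("API_PORT", "8080")),
        },
        "mongodb": {
            "database": env.get("MONGODB_DATABASE", "chatbot"),
            # MONGODB_URL wins if exported; otherwise rebuild it from host/port.
            "url": env.get("MONGODB_URL", f"mongodb://{mongo_host}:{mongo_port}"),
        },
        "chroma": {
            "host": env.get("CHROMA_HOST", "0.0.0.0"),
            "port": int(env.get("CHROMA_PORT", "9090")),
        },
    }


print(load_settings({"API_PORT": "9000"})["api"])
```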

Register Airflow DAG

$ cp dags/cf_emb_update_dag.py $AIRFLOW_HOME/dags

Test API

Step 1: Create a user profile.

curl --location 'http://nonosoft.ddns.net:8080/api/v1/profiles' \
--header 'Content-Type: application/json' \
--data-raw '{
    "name": "Adrian",
    "email": "adrianmarino@gmail.com",
    "metadata": {
        "studies"        : "Engineering",
        "age"            : 42,
        "genre"          : "Male",
        "nationality"    : "Argentina",
        "work"           : "Software Engineer",
        "prefered_movies": {
            "release": {
                "from" : "1970"
            },
            "genres": [
                "thriller",
                "suspense",
                "science fiction",
                "love",
                "comedy"
            ]
        }
    }
}'

Step 2: Query supported LLM models.

curl --location 'http://nonosoft.ddns.net:8080/api/v1/recommendations/models'

{
    "models": [
        "llama2-13b-chat",
        "llama2-7b-chat",
        "gemma-7b",
        "mistral-instruct",
        "mistral",
        "neural-chat"
    ]
}

Step 3: Ask for recommendations.

curl --location 'http://nonosoft.ddns.net:8080/api/v1/recommendations' \
--header 'Content-Type: application/json' \
--data-raw '{
    "message": {
        "author": "adrianmarino@gmail.com",
        "content": "I want see marvel movies"
    },
    "settings": {
        "llm"                                   : "gemma-7b",
        "retry"                                 : 2,
        "plain"                                 : false,
        "include_metadata"                      : false,
        "rag": {
            "shuffle"                           : true,
            "candidates_limit"                  : 50,
            "llm_response_limit"                : 50,
            "recommendations_limit"             : 5,
            "similar_items_augmentation_limit"  : 5,
            "not_seen": true
        },
        "collaborative_filtering": {
            "shuffle"                           : true,
            "candidates_limit"                  : 50,
            "llm_response_limit"                : 50,
            "recommendations_limit"             : 5,
            "similar_items_augmentation_limit"  : 5,
            "text_query_limit"                  : 5000,
            "k_sim_users"                       : 10,
            "random_selection_items_by_user"    : 0.5,
            "max_items_by_user"                 : 10,
            "min_rating_by_user"                : 3.5,
            "not_seen"                          : true
        }
    }
}'

{
    "items": [
        {
            "title": "Thor",
            "poster": "http://image.tmdb.org/t/p/w500/pIkRyD18kl4FhoCNQuWxWu5cBLM.jpg",
            "release": "2011",
            "description": "Chris hemsworth stars as the norse god of thunder, who must reclaim his rightful place on the throne and defeat an evil nemesis.",
            "genres": [
                "action",
                "adventure",
                "drama",
                "fantasy",
                "imax"
            ],
            "votes": [
                "http://nonosoft.ddns.net:8080/api/v1/interactions/make/adrianmarino@gmail.com/86332/1",
                "http://nonosoft.ddns.net:8080/api/v1/interactions/make/adrianmarino@gmail.com/86332/2",
                "http://nonosoft.ddns.net:8080/api/v1/interactions/make/adrianmarino@gmail.com/86332/3",
                "http://nonosoft.ddns.net:8080/api/v1/interactions/make/adrianmarino@gmail.com/86332/4",
                "http://nonosoft.ddns.net:8080/api/v1/interactions/make/adrianmarino@gmail.com/86332/5"
            ]
        },
        {
            "title": "Avengers, The",
            "poster": "http://image.tmdb.org/t/p/w500/RYMX2wcKCBAr24UyPD7xwmjaTn.jpg",
            "release": "2012",
            "description": "Earth's mightiest heroes team up to save the world from an alien invasion in this epic superhero movie.",
            "genres": [
                "action",
                "adventure",
                "sci-fi",
                "imax"
            ],
            "votes": [
                "http://nonosoft.ddns.net:8080/api/v1/interactions/make/adrianmarino@gmail.com/89745/1",
                "http://nonosoft.ddns.net:8080/api/v1/interactions/make/adrianmarino@gmail.com/89745/2",
                "http://nonosoft.ddns.net:8080/api/v1/interactions/make/adrianmarino@gmail.com/89745/3",
                "http://nonosoft.ddns.net:8080/api/v1/interactions/make/adrianmarino@gmail.com/89745/4",
                "http://nonosoft.ddns.net:8080/api/v1/interactions/make/adrianmarino@gmail.com/89745/5"
            ]
        },
        {
            "title": "Marvel One-Shot: A Funny Thing Happened on the Way to Thor's Hammer",
            "poster": "http://image.tmdb.org/t/p/w500/njrOqsmFH4pxBrhcoslqLfw2OGk.jpg",
            "release": "2011",
            "description": "Chris hemsworth stars as the norse god of thunder, who must reclaim his rightful place on the throne and defeat an evil nemesis.",
            "genres": [
                "fantasy",
                "sci-fi"
            ],
            "votes": [
                "http://nonosoft.ddns.net:8080/api/v1/interactions/make/adrianmarino@gmail.com/168040/1",
                "http://nonosoft.ddns.net:8080/api/v1/interactions/make/adrianmarino@gmail.com/168040/2",
                "http://nonosoft.ddns.net:8080/api/v1/interactions/make/adrianmarino@gmail.com/168040/3",
                "http://nonosoft.ddns.net:8080/api/v1/interactions/make/adrianmarino@gmail.com/168040/4",
                "http://nonosoft.ddns.net:8080/api/v1/interactions/make/adrianmarino@gmail.com/168040/5"
            ]
        }
    ]
}
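
The same request can be prepared from Python with the standard library. The sketch below builds the POST request and extracts titles and vote links from a response with the shape shown above. The function names are hypothetical, and the reading of the vote URLs (last path segment is the rating from 1 to 5) is an assumption inferred from the sample response, not from API documentation.

```python
import json
import urllib.request

# Host taken from the curl examples above.
API_BASE_URL = "http://nonosoft.ddns.net:8080/api/v1"


def build_recommendation_request(author, content, settings, base_url=API_BASE_URL):
    """Prepare (but do not send) the POST request for /recommendations."""
    body = json.dumps(
        {"message": {"author": author, "content": content}, "settings": settings}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/recommendations",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def titles(response):
    """Return (title, release) pairs from a recommendations response."""
    return [(item["title"], item["release"]) for item in response["items"]]


def vote_url(item, rating):
    """Pick the vote link for a rating, assuming it is the last path segment."""
    for url in item["votes"]:
        if url.rsplit("/", 1)[-1] == str(rating):
            return url
    raise ValueError(f"no vote link for rating {rating}")


# Trimmed-down response with the same structure as the example above.
sample = {
    "items": [
        {
            "title": "Thor",
            "release": "2011",
            "votes": [
                f"{API_BASE_URL}/interactions/make/adrianmarino@gmail.com/86332/{r}"
                for r in range(1, 6)
            ],
        }
    ]
}

request = build_recommendation_request(
    "adrianmarino@gmail.com", "I want see marvel movies", {"llm": "gemma-7b"}
)
print(request.get_method(), request.full_url)
print(titles(sample))
print(vote_url(sample["items"][0], 4))
```

Sending the request would be done with urllib.request.urlopen(request) and json.load on the result. Whether the API accepts a partial settings object is not confirmed here, so the full settings from the curl example can be passed instead.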

API Postman Collection

API Documentation

References

Using or based on
