MLOps zoomcamp Project - Cohort 2023

1. Problem description

ABC Multistate Bank has a customer churn problem. Churn is the process by which customers discontinue their relationship with a company or service; for a bank, it means customers closing their accounts and moving to another bank. This project treats churn as a machine learning problem: predicting, from historical data, whether a customer is likely to leave (churn) or stay with the bank.

Problem type: Supervised/Classification

Dataset

The dataset comes from Kaggle. Sample data:

customer_id	credit_score	country	gender	age	tenure	balance	products_number	credit_card	active_member	estimated_salary	churn
15634602	619	France	Female	42	2	0	1	1	1	101348.88	1
15647311	608	Spain	Female	41	1	83807.86	1	0	1	112542.58	0
15619304	502	France	Female	42	8	159660.8	3	1	0	113931.57	1
15701354	699	France	Female	39	1	0	2	0	0	93826.63	0
15737888	850	Spain	Female	43	2	125510.82	1	1	1	79084.1	0
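To make the column types concrete, the sketch below parses rows shaped like the sample above into typed records using only the standard library. The `Customer` class and the `parse_rows` helper are illustrative names, not part of the project's code, and the two embedded rows are copied from the sample.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: int
    credit_score: int
    country: str
    gender: str
    age: int
    tenure: int
    balance: float
    products_number: int
    credit_card: bool
    active_member: bool
    estimated_salary: float
    churn: bool  # target label: True if the customer left the bank


def parse_rows(text: str) -> list[Customer]:
    """Parse tab-separated text with a header row into Customer records."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    customers = []
    for row in reader:
        customers.append(
            Customer(
                customer_id=int(row["customer_id"]),
                credit_score=int(row["credit_score"]),
                country=row["country"],
                gender=row["gender"],
                age=int(row["age"]),
                tenure=int(row["tenure"]),
                balance=float(row["balance"]),
                products_number=int(row["products_number"]),
                credit_card=row["credit_card"] == "1",
                active_member=row["active_member"] == "1",
                estimated_salary=float(row["estimated_salary"]),
                churn=row["churn"] == "1",
            )
        )
    return customers


# Header and first two rows of the sample data shown above.
SAMPLE = (
    "customer_id\tcredit_score\tcountry\tgender\tage\ttenure\tbalance\t"
    "products_number\tcredit_card\tactive_member\testimated_salary\tchurn\n"
    "15634602\t619\tFrance\tFemale\t42\t2\t0\t1\t1\t1\t101348.88\t1\n"
    "15647311\t608\tSpain\tFemale\t41\t1\t83807.86\t1\t0\t1\t112542.58\t0\n"
)

customers = parse_rows(SAMPLE)
churn_rate = sum(c.churn for c in customers) / len(customers)
print(f"{len(customers)} rows, churn rate {churn_rate:.2f}")
```

The `churn` column is the classification target; every other column except `customer_id` is a candidate feature.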

Proposed Solution

As a machine learning problem, the goal is to build a predictive model that can accurately identify customers who are at risk of churning. This model can help banks take proactive measures to retain valuable customers by offering targeted incentives, personalized services, or early intervention strategies.

Solution type: batch deployment for model training and inference.
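As a rough illustration of what batch inference means here, the sketch below scores a whole batch of customer rows in one pass and flags likely churners. It is not the project's scoring pipeline: `score_batch`, `toy_model`, and the 0.5 threshold are assumptions made for the example, and `toy_model` is a stand-in for a trained model.

```python
from datetime import datetime, timezone
from typing import Callable, Iterable


def score_batch(
    rows: Iterable[dict],
    predict_proba: Callable[[dict], float],
    threshold: float = 0.5,
) -> list[dict]:
    """Score every row in a batch and flag customers at risk of churning."""
    # One timestamp per batch, so all rows of a run can be grouped later.
    scored_at = datetime.now(timezone.utc).isoformat()
    results = []
    for row in rows:
        probability = predict_proba(row)
        results.append(
            {
                "customer_id": row["customer_id"],
                "churn_probability": probability,
                "predicted_churn": probability >= threshold,
                "scored_at": scored_at,
            }
        )
    return results


def toy_model(row: dict) -> float:
    """Stand-in scorer (NOT the trained model): inactive members score higher."""
    return 0.2 if row["active_member"] else 0.7


batch = [
    {"customer_id": 15634602, "active_member": 1},
    {"customer_id": 15619304, "active_member": 0},
]
predictions = score_batch(batch, toy_model)
for p in predictions:
    print(p["customer_id"], p["churn_probability"], p["predicted_churn"])
```

In a batch setup like this, predictions are produced on a schedule and stored, rather than served per request through an API.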

2. Cloud

The tech stack used:

The project uses:

And the VM used for the project (AWS EC2 instance):

We use a Makefile to reproduce the required environment on any infrastructure.

SHELL=/bin/bash

# The target creates no file of its own name, so mark it as phony.
.PHONY: build-environment-and-services

# Installs dependencies, then starts MLflow, Prefect, and the monitoring stack.
build-environment-and-services:
	@echo "Building Python environment"
	pip install pipenv &&\
	pipenv install
	@echo "Running MLflow Server on localhost:5000"
	rm -rf mlflow.db mlruns/ &&\
	nohup mlflow server \
		--backend-store-uri sqlite:///mlflow.db \
		--default-artifact-root ./artifacts \
		--host localhost \
		--port 5000 &
	@echo "Deploying Prefect Server on localhost:4200"
	nohup prefect server start &
	@echo "Deploying Monitoring Service"
	docker-compose -f monitoring/docker-compose.yml up -d
	@echo "Deploying Grafana on localhost:3000"
	@echo "Deploying Adminer on localhost:8080"
	@echo "The local environment is ready to be used."

Build the entire environment:

make
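Because the Makefile starts MLflow and Prefect in the background, it can be useful to confirm that the services are actually listening before running any pipeline. The following is a minimal sketch of such a check, not part of the repository; the port numbers are taken from the echo messages in the Makefile, and the helper names are illustrative.

```python
import socket

# Ports taken from the echo messages in the Makefile.
SERVICES = {
    "MLflow": 5000,
    "Prefect": 4200,
    "Grafana": 3000,
    "Adminer": 8080,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "localhost") -> dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name:8s} {'up' if up else 'DOWN'}")
```

An open port only shows that something is listening; it does not prove the service behind it is healthy.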

Next steps - ML platform (WIP)

The planned machine learning (ML) platform is an interface for deploying and managing machine learning models, built on AWS, Flask, MLflow, and Prefect. Its components are:

AWS (Amazon Web Services): AWS is a cloud computing service that offers a wide range of tools and services to build, deploy, and manage applications. In the context of the ML platform, AWS will provide the infrastructure and services for hosting the platform components, managing data, and deploying machine learning models.

Flask: Flask is a lightweight and flexible web framework for Python. It will be used to create the backend of the ML platform, handling HTTP requests and responses. Flask allows easy integration with other Python libraries and will serve as the API layer to interact with the ML models.

MLflow: MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. It allows data scientists to track and version their experiments, package and deploy models, and manage model deployments. MLflow also provides tools for model registry and collaboration between team members.

Prefect: Prefect is a workflow management system that helps in orchestrating complex data workflows, including ML model training, evaluation, and deployment. It provides a way to define, schedule, and monitor workflows, making it easier to automate and manage the deployment pipeline for machine learning models.

UI for deployment:

3. Experiment Tracking and model registry

For experiment tracking and the model registry we use MLflow.

Register model

Promote best model to Production
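Promotion can be done in the MLflow UI, but it can also be scripted. The sketch below picks the registered version whose run has the highest value of a chosen metric and moves it to the Production stage. It assumes that higher is better for the metric and that every run logged it; the model name `churn-model` and the metric `roc_auc` in the usage comment are hypothetical, not names taken from this project. Recent MLflow releases deprecate stages in favour of aliases, so treat this as a sketch for stage-based registries.

```python
def promote_best_version(client, model_name: str, metric: str) -> str:
    """Move the registered version with the highest metric to Production.

    `client` is expected to behave like mlflow.tracking.MlflowClient.
    """
    versions = client.search_model_versions(f"name='{model_name}'")
    if not versions:
        raise ValueError(f"No registered versions for model {model_name!r}")

    def metric_of(version) -> float:
        # Each model version points back to the run that produced it.
        run = client.get_run(version.run_id)
        return run.data.metrics[metric]

    best = max(versions, key=metric_of)
    client.transition_model_version_stage(
        name=model_name,
        version=best.version,
        stage="Production",
        archive_existing_versions=True,  # retire the previous Production model
    )
    return best.version


# Usage against the local tracking server (requires the mlflow package):
#   from mlflow.tracking import MlflowClient
#   client = MlflowClient(tracking_uri="http://localhost:5000")
#   promote_best_version(client, "churn-model", "roc_auc")  # hypothetical names
```

Passing the client in as an argument keeps the function easy to test with a stub, without a running tracking server.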

4. Workflow orchestration

We use Prefect to orchestrate:

Model training: src/training_pipeline.py
Model inference (predicting on new data): src/batch_scoring_pipeline.py
Model monitoring (calculating drift and model performance): src/monitor_ml_churn_model.py

5. Model Deployment

Deployment is done via a Makefile and a Dockerfile.

Steps

Clone the repository: git clone https://github.com/abdala9512/mlops-zoomcamp-project-2023.git
Execute the Makefile to create the services: make
Execute the model training pipeline: python src/training_pipeline.py
Promote a model to Production in MLflow
Execute the scoring pipeline: python src/batch_scoring_pipeline.py
Execute the monitoring pipeline: python src/monitor_ml_churn_model.py

6. Model monitoring

Machine learning monitoring with Grafana and Postgres involves using these two tools to track, visualize, and analyze the performance and behavior of machine learning models deployed in production. Let's break down how each component contributes to the monitoring process:

Machine Learning Models in Production: When machine learning models are deployed in a production environment, they interact with real-world data, and their performance and behavior may change over time. Monitoring these models is essential to ensure they continue to make accurate predictions and maintain their desired performance.

Grafana: Grafana is an open-source data visualization and monitoring tool. It allows you to create interactive and customizable dashboards to visualize and analyze data from various sources, including databases, APIs, and monitoring systems. Grafana is highly extensible and supports numerous data sources, making it suitable for integrating with different monitoring and logging tools.

Postgres (PostgreSQL): Postgres is an open-source, powerful relational database management system (RDBMS). It is often used to store data from applications, including machine learning models. Postgres is known for its performance, scalability, and support for complex queries.
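To illustrate the kind of number such a monitoring job might compute and store in Postgres for Grafana to plot, here is a minimal sketch of one common drift measure, the Population Stability Index (PSI). The source does not state which drift metric the project uses, so PSI is an assumption chosen for illustration, and the function name, bin count, and sample data are hypothetical.

```python
import math


def population_stability_index(reference, current, bins=10):
    """Compare two samples of one numeric feature; 0 means identical bin shares."""
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference sample has no spread; cannot build bins")
    width = (hi - lo) / bins

    def bin_shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp values outside the reference range into the edge bins.
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # Floor empty bins at a tiny share so the logarithm stays defined.
        return [max(c / len(values), 1e-6) for c in counts]

    ref_shares = bin_shares(reference)
    cur_shares = bin_shares(current)
    return sum(
        (c - r) * math.log(c / r) for r, c in zip(ref_shares, cur_shares)
    )


# Hypothetical example: customer ages at training time vs. a newer, older batch.
training_ages = list(range(20, 70))
new_ages = [age + 15 for age in training_ages]

print("no drift:", population_stability_index(training_ages, training_ages))
print("shifted :", population_stability_index(training_ages, new_ages))
```

A frequently quoted rule of thumb reads PSI below 0.1 as stable and above 0.25 as a significant shift, but suitable thresholds depend on the feature and should be validated on real data.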

Adminer data explorer (PostgreSQL database)

Grafana

Reproducibility

Run the Makefile (locally or in the cloud)
Build and run the Docker image

echo "Build Docker image"
docker build -t customer_churn_ml_pipeline ./src/deployment
docker run -v "$(pwd)":/app/ -it customer_churn_ml_pipeline
