kennethleungty/Anomaly-Detection-Pipeline-Kedro

Building and Managing an Isolation Forest Anomaly Detection Pipeline with Kedro

Overview

An anomaly (fraud) detection pipeline for credit card transaction data, built with an Isolation Forest machine learning model and the Kedro framework.

Link to article: https://neptune.ai/blog/data-science-pipelines-with-kedro

Objective

Develop a data science pipeline to detect anomalous (fraudulent) credit card transactions using:

  • Isolation Forest machine learning model - For unsupervised anomaly detection
  • Kedro - An open-source Python framework for creating reproducible, maintainable, and modular data science code. The framework helps accelerate data pipelining, enhance data science prototyping, and promote pipeline reproducibility.
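
To make the unsupervised detection step concrete, here is a minimal sketch using scikit-learn's IsolationForest. It is not the project's actual code: the two synthetic features (amount and hour of day), the `contamination` value, and all variable names are assumptions chosen for illustration only.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Hypothetical stand-in for transaction features (amount, hour of day).
# The real project uses the Worldline / Machine Learning Group dataset.
normal = rng.normal(loc=[50.0, 14.0], scale=[15.0, 3.0], size=(1000, 2))
outliers = rng.normal(loc=[900.0, 3.0], scale=[50.0, 0.5], size=(10, 2))
X = np.vstack([normal, outliers])

# contamination is an assumed share of anomalies, not a value from the project.
model = IsolationForest(n_estimators=100, contamination=0.01, random_state=42)
model.fit(X)

labels = model.predict(X)            # 1 = inlier, -1 = anomaly
scores = model.decision_function(X)  # lower score = more anomalous

flagged = np.where(labels == -1)[0]
print(f"Flagged {len(flagged)} of {len(X)} transactions as anomalous")
```

No fraud labels are passed to `fit`, which is what makes the approach unsupervised: points that random splits isolate quickly receive lower scores.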

Motivation

  • Explore how unsupervised anomaly detection works, and better understand the concept and implementation of Isolation Forest
  • Use the Kedro framework to structure data science pipeline projects cleanly

Data

The credit card transaction data comes from a collaboration between Worldline and the Machine Learning Group. It is a realistic simulation of real-world credit card transactions, designed to include the complications typically found in fraud detection.

General Pipeline Structure

[Figure: general pipeline structure]

Anomaly Detection Pipeline Structure

[Figure: anomaly detection pipeline structure]

Steps

  1. Change to the project directory in the command line - cd C:/Anomaly-Detection-Pipeline-Kedro
  2. Activate the Conda virtual environment (create it first if it does not exist yet) - conda activate env_kedro
  3. Execute a pipeline run - kedro run

Please see the walkthrough article linked above for details.