Hands-on MLOps

Build and Deploy a Real-Time Feature Pipeline

with Python 🐍⚡

What is a real-time feature pipeline?

Machine Learning models are only as good as the input features you feed them at training and inference time.

And for many real-world applications, like financial trading, these features must be generated and served as fast as possible, so the ML system produces the best predictions possible.

Generating and serving features fast is what a real-time feature pipeline does.

Cool, but how can I implement one?

Python was not designed for speed 🐢, which has traditionally made it a poor fit for real-time processing. Because of this, real-time feature pipelines were usually written with JVM-based tools like Apache Spark or Apache Flink.

However, things are changing fast with the emergence of Rust 🦀 and libraries like Bytewax 🐝 that expose a pure Python API on top of a highly efficient language like Rust.

So you get the best of both worlds:

Rust's speed and performance, plus
Python's rich ecosystem of libraries.

So you can develop highly performant and scalable real-time pipelines, leveraging top-notch Python libraries.

🦀 + 🐝 + 🐍 = ⚡

What is this repo about?

In this repository you will learn how to develop and deploy a real-time feature pipeline in 100% Python that:

fetches real-time trade data (aka raw data) from the Coinbase WebSocket API,
transforms trade data into OHLC (open, high, low, close) data (aka features) in real time using Bytewax, and
stores these features in the Hopsworks Feature Store.

You will also build a dashboard using Bokeh and Streamlit to visualize the final features in real time.
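
To make the "trades into OHLC" step concrete, here is a minimal pure-Python sketch of the aggregation itself. It is not the Bytewax code from this repo: the `Trade` and `Candle` classes, the `trades_to_ohlc` function, and the 30-second window are all hypothetical names and values chosen for illustration, and the sketch works on an in-memory list instead of a live stream.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Trade:
    # Hypothetical shape of one raw trade event.
    timestamp: float  # seconds since epoch
    price: float
    size: float


@dataclass
class Candle:
    # One OHLC feature row for a single time window.
    window_start: float
    open: float
    high: float
    low: float
    close: float
    volume: float


def trades_to_ohlc(trades: Iterable[Trade], window_seconds: int = 30) -> List[Candle]:
    """Group trades into fixed, non-overlapping windows and build one candle per window."""
    candles: Dict[float, Candle] = {}
    # Process trades in time order so open and close are the first and last prices.
    for trade in sorted(trades, key=lambda t: t.timestamp):
        # Align the trade to the start of its window.
        start = trade.timestamp - (trade.timestamp % window_seconds)
        candle = candles.get(start)
        if candle is None:
            # First trade in the window sets open, high, low and close.
            candles[start] = Candle(
                start, trade.price, trade.price, trade.price, trade.price, trade.size
            )
        else:
            candle.high = max(candle.high, trade.price)
            candle.low = min(candle.low, trade.price)
            candle.close = trade.price
            candle.volume += trade.size
    return [candles[start] for start in sorted(candles)]


example_trades = [
    Trade(timestamp=0, price=100.0, size=1.0),
    Trade(timestamp=10, price=105.0, size=2.0),
    Trade(timestamp=20, price=95.0, size=0.5),
    Trade(timestamp=35, price=98.0, size=1.0),
]
for candle in trades_to_ohlc(example_trades, window_seconds=30):
    print(candle)
```

In a streaming setting the same update logic runs incrementally as each trade arrives, and a candle is emitted once its window closes, which is the part a stream-processing library like Bytewax takes care of.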

Run the whole thing in 10 minutes

Create a Python virtual environment and install the project dependencies with
```
$ make init
```
Set your Hopsworks API key and project name in set_environment_variables_template.sh, rename the file to set_environment_variables.sh, and run it (sign up for free at hopsworks.ai to get these 2 values)
```
$ . ./set_environment_variables.sh
```
To run the feature pipeline locally
```
$ make run
```
To spin up a Streamlit dashboard to visualize the data in real-time
```
$ make frontend
```
To run the feature pipeline on an AWS EC2 instance, you first need an AWS account and the aws-cli tool installed on your local machine. Then run the following command to deploy your feature pipeline onto an EC2 instance
```
$ make deploy
```
Feature pipeline logs are sent to AWS CloudWatch. Run the following command to grab the URL where you can see the logs.
```
$ make info
```
To shut down the feature pipeline on AWS and free resources, run
```
$ make undeploy
```
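
The pipeline depends on the two Hopsworks values exported in the environment-variable step above. Below is a minimal sketch of a pre-flight check you could run before `make run`. The variable names `HOPSWORKS_PROJECT_NAME` and `HOPSWORKS_API_KEY` are assumptions, not taken from this repo, so replace them with the names used in your set_environment_variables.sh file.

```python
import os
from typing import List, Mapping, Sequence

# Hypothetical variable names: adjust them to match the ones defined in
# your own set_environment_variables.sh file.
REQUIRED_VARS = ("HOPSWORKS_PROJECT_NAME", "HOPSWORKS_API_KEY")


def missing_variables(
    env: Mapping[str, str], required: Sequence[str] = REQUIRED_VARS
) -> List[str]:
    """Return the required variable names that are unset or empty in env."""
    return [name for name in required if not env.get(name)]


# Check the current process environment and report what is missing.
missing = missing_variables(os.environ)
print("Missing variables:", ", ".join(missing) if missing else "none")
```

If the check reports missing variables, source the script again in the same shell session before starting the pipeline.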

Wanna learn more Real-Time ML?

I am preparing a new hands-on tutorial where you will learn to build a complete real-time ML system, from A to Z.

➡️ Subscribe to The Real-World ML Newsletter to be notified when the tutorial is out.

Paulescu/build-and-deploy-real-time-feature-pipeline
