
Paulescu/build-and-deploy-real-time-feature-pipeline


Hands-on MLOps

Build and Deploy a Real-Time Feature Pipeline

with Python 🐍⚡



What is a real-time feature pipeline?

Machine learning models are only as good as the input features you feed them at training and inference time.

And for many real-world applications, like financial trading, these features must be generated and served as fast as possible so the ML system can produce the best possible predictions.

Generating and serving features fast is what a real-time feature pipeline does.

Cool, but how can I implement one?

Python alone is not a language designed for speed 🐒, which makes it a poor fit for real-time processing. Because of this, real-time feature pipelines have traditionally been written with JVM-based tools like Apache Spark or Apache Flink.

However, things are changing fast with the emergence of Rust 🦀 and libraries like Bytewax 🐝 that expose a pure Python API on top of a highly efficient Rust core.

So you get the best of both worlds:

  • Rust's speed and performance, plus
  • Python's rich ecosystem of libraries.

This lets you develop highly performant, scalable real-time pipelines while leveraging top-notch Python libraries.

🦀 + 🐝 + 🐍 = ⚡


What is this repo about?

In this repository you will learn how to develop and deploy a real-time feature pipeline in 100% Python that

  • fetches real-time trade data (aka raw data) from the Coinbase Websocket API
  • transforms trade data into OHLC data (aka features) in real-time using Bytewax, and
  • stores these features in the Hopsworks Feature Store
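To make the "transforms trade data into OHLC data" step concrete, here is a minimal plain-Python sketch of the windowing logic. It is not the repo's Bytewax code. It assumes trade messages follow the Coinbase Exchange `matches` channel format (JSON with `type`, `product_id`, `time`, `price` and `size` fields), and the 10-second window length and the helper names `subscribe_message`, `parse_trade`, `to_ohlc` and `Candle` are hypothetical. Each trade is assigned to a fixed, non-overlapping (tumbling) window by flooring its timestamp, and each window keeps its first, highest, lowest and last price plus the summed trade size.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical window length in seconds; the repo's actual value may differ.
WINDOW_SECONDS = 10


def subscribe_message(product_ids: List[str]) -> str:
    """Build a subscribe request for the Coinbase Exchange 'matches' channel (assumed format)."""
    return json.dumps(
        {"type": "subscribe", "product_ids": product_ids, "channels": ["matches"]}
    )


@dataclass
class Candle:
    product_id: str
    window_start: int  # Unix timestamp (seconds) where the window begins
    open: float
    high: float
    low: float
    close: float
    volume: float


def parse_trade(raw: str) -> Optional[Tuple[str, float, float, float]]:
    """Return (product_id, unix_ts, price, size) for a 'match' message, else None."""
    msg = json.loads(raw)
    if msg.get("type") != "match":
        return None  # skip subscription acks, heartbeats and other message types
    # Coinbase timestamps end in 'Z'; older Pythons need an explicit UTC offset.
    ts = datetime.fromisoformat(msg["time"].replace("Z", "+00:00")).timestamp()
    return msg["product_id"], ts, float(msg["price"]), float(msg["size"])


def to_ohlc(raw_messages: Iterable[str], window_seconds: int = WINDOW_SECONDS) -> List[Candle]:
    """Aggregate trades into one OHLC candle per (product, tumbling window)."""
    candles: Dict[Tuple[str, int], Candle] = {}
    for raw in raw_messages:
        trade = parse_trade(raw)
        if trade is None:
            continue
        product_id, ts, price, size = trade
        # Floor the timestamp to the start of its fixed-size window.
        window_start = int(ts // window_seconds) * window_seconds
        key = (product_id, window_start)
        candle = candles.get(key)
        if candle is None:
            # The first trade in a window sets open, high, low and close.
            candles[key] = Candle(product_id, window_start, price, price, price, price, size)
        else:
            candle.high = max(candle.high, price)
            candle.low = min(candle.low, price)
            candle.close = price  # assumes messages arrive in time order
            candle.volume += size
    return sorted(candles.values(), key=lambda c: (c.product_id, c.window_start))
```

In the actual pipeline, Bytewax performs this kind of aggregation incrementally as messages stream in, rather than over a finished list as this sketch does. Because `close` is taken from the last message processed, the sketch only gives correct candles when trades arrive in time order.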

You will also build a dashboard using Bokeh and Streamlit to visualize the final features in real time.


Run the whole thing in 10 minutes

  1. Create a Python virtual environment and install the project dependencies with

    $ make init
    
  2. Set your Hopsworks API key and project name variables in set_environment_variables_template.sh, rename the file to set_environment_variables.sh, and source it (sign up for free at hopsworks.ai to get these 2 values)

    $ . ./set_environment_variables.sh
    
  3. To run the feature pipeline locally

    $ make run
    
  4. To spin up a Streamlit dashboard to visualize the data in real-time

    $ make frontend
    
  5. To run the feature pipeline on an AWS EC2 instance, you first need an AWS account and the aws-cli tool installed on your local system. Then run the following command to deploy your feature pipeline onto an EC2 instance

    $ make deploy
    
  6. Feature pipeline logs are sent to AWS CloudWatch. Run the following command to get the URL where you can see the logs.

    $ make info
    
  7. To shut down the feature pipeline on AWS and free resources, run

    $ make undeploy
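Step 2 sources the script with `. ./set_environment_variables.sh` instead of executing it, because a script run as a child process cannot export variables back into your current shell. Here is a minimal sketch of how Python code could then read those values and fail fast when they are missing. The variable names `HOPSWORKS_PROJECT_NAME` and `HOPSWORKS_API_KEY` and the `load_config` helper are assumptions for illustration, so check the template file for the names the repo actually uses.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical variable names; the template file may use different ones.
REQUIRED_VARS = ("HOPSWORKS_PROJECT_NAME", "HOPSWORKS_API_KEY")


@dataclass(frozen=True)
class HopsworksConfig:
    project_name: str
    api_key: str


def load_config(env: Optional[Mapping[str, str]] = None) -> HopsworksConfig:
    """Read Hopsworks credentials from the environment, failing fast if any are missing."""
    env = os.environ if env is None else env
    # Treat unset and empty variables the same way.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
            + ". Did you source set_environment_variables.sh?"
        )
    return HopsworksConfig(
        project_name=env["HOPSWORKS_PROJECT_NAME"],
        api_key=env["HOPSWORKS_API_KEY"],
    )
```

Raising at startup with the names of the missing variables surfaces a forgotten `source` step immediately, rather than as an authentication error deep inside the pipeline.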
    

Wanna learn more about real-time ML?

I am preparing a new hands-on tutorial where you will learn to build a complete real-time ML system, from A to Z.

➡️ Subscribe to The Real-World ML Newsletter to be notified when the tutorial is out.

About

Develop and deploy a real-time feature pipeline in Python, using Bytewax 🐝 and Hopsworks Feature Store.
