morshed-sarwer/realtime-data-streaming-pipeline

realtime-data-streaming-pipeline

Introduction

This project is an end-to-end guide to building a data engineering pipeline. It covers every phase: data ingestion, then processing, then storage. The stack consists of Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker to keep deployment simple and to make scaling easier.
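
The three phases named above (ingestion, processing, storage) can be sketched in plain Python. This is only an illustrative sketch, not the repository's code: the `Event` record, its fields, and the `ingest`, `process`, `store`, and `run_pipeline` functions are all hypothetical, an in-memory `queue.Queue` stands in for a Kafka topic, and a `dict` stands in for a Cassandra table. In the real stack, Kafka, Spark, and Cassandra would presumably fill those roles.

```python
import json
import queue
from dataclasses import dataclass, asdict


@dataclass
class Event:
    """Hypothetical record shape; the README does not specify the data."""
    user_id: int
    name: str


def ingest(events, topic):
    """Ingestion: serialize each event to JSON bytes and publish it."""
    for event in events:
        topic.put(json.dumps(asdict(event)).encode("utf-8"))


def process(topic):
    """Processing: drain the topic, decode each message, and clean it up."""
    while not topic.empty():
        record = json.loads(topic.get().decode("utf-8"))
        record["name"] = record["name"].strip().title()
        yield record


def store(records, table):
    """Storage: upsert each record, keyed by its user_id."""
    for record in records:
        table[record["user_id"]] = record


def run_pipeline(events):
    topic = queue.Queue()  # assumption: stands in for a Kafka topic
    table = {}             # assumption: stands in for a Cassandra table
    ingest(events, topic)
    store(process(topic), table)
    return table


if __name__ == "__main__":
    result = run_pipeline([Event(1, "  ada lovelace "), Event(2, "alan turing")])
    for row in result.values():
        print(row)
```

Keeping each phase behind its own function mirrors the separation the stack provides: the producer only knows the topic, the processor only reads from it, and the storage step only sees cleaned records.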

System Architecture

[Figure: Data engineering architecture]

Technologies

  • Apache Airflow
  • Python
  • Apache Kafka
  • Apache Zookeeper
  • Apache Spark
  • Cassandra
  • PostgreSQL
  • Docker
