Introduction
This project is a guide to building an end-to-end data engineering pipeline. It covers each phase in order: data ingestion, processing, and storage. The stack consists of Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, Cassandra, and PostgreSQL. All components are containerized with Docker to simplify deployment and scaling.
System Architecture
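As a rough illustration of the ingestion, processing, and storage phases, the sketch below models the flow in plain Python using only the standard library. It does not use the real services. `InMemoryTopic`, `ingest`, `process`, `store`, and `run_pipeline` are hypothetical names, and the mapping of stages to tools (a queue standing in for Kafka, a transform step standing in for Spark, a dict standing in for a Cassandra table) is an assumption about how the components fit together, not the project's actual code.

```python
import json
from collections import deque


class InMemoryTopic:
    """Hypothetical stand-in for a message topic (assumed Kafka role)."""

    def __init__(self):
        self._messages = deque()

    def publish(self, record):
        # Serialize to bytes, as a producer would before sending.
        self._messages.append(json.dumps(record).encode("utf-8"))

    def consume(self):
        # Yield messages in arrival order until the topic is drained.
        while self._messages:
            yield json.loads(self._messages.popleft().decode("utf-8"))


def ingest(topic, raw_records):
    """Ingestion stage: push raw records onto the topic."""
    for record in raw_records:
        topic.publish(record)


def process(records):
    """Processing stage (assumed Spark role): drop records without an id, tidy names."""
    for record in records:
        if record.get("id") is None:
            continue
        name = (record.get("name") or "").strip().title()
        yield {"id": record["id"], "name": name}


def store(table, records):
    """Storage stage (assumed Cassandra role): upsert each record by primary key."""
    for record in records:
        table[record["id"]] = record
    return table


def run_pipeline(raw_records):
    """Run the three stages in order and return the resulting table."""
    topic = InMemoryTopic()
    ingest(topic, raw_records)
    return store({}, process(topic.consume()))


if __name__ == "__main__":
    sample = [
        {"id": 1, "name": " ada lovelace "},
        {"name": "missing id"},
        {"id": 2, "name": "alan turing"},
    ]
    print(run_pipeline(sample))
```

In the real stack each stage would run in its own container and talk to the others over the network. The sketch only shows the order of the stages and the kind of data handed from one to the next.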
Technologies
- Apache Airflow
- Python
- Apache Kafka
- Apache Zookeeper
- Apache Spark
- Cassandra
- PostgreSQL
- Docker