Data Engineering Nanodegree

Projects from Udacity's Data Engineering Nanodegree.

Projects

Data Modeling

In this course, I’ll learn to create relational and NoSQL data models to fit the diverse needs of data consumers. I’ll understand how data models differ and how to choose the appropriate one for a given situation. I’ll also build fluency in PostgreSQL and Apache Cassandra.

Course Project 1 - Data Modeling with Postgres

In this project, I’ll model user activity data for a music streaming app called Sparkify. I’ll create a relational database and ETL pipeline designed to optimize queries for understanding what songs users are listening to. In PostgreSQL, I’ll also define fact and dimension tables and insert data into them.
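
A minimal sketch of the fact/dimension idea, using Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs without a database server. The table names, column names, and sample rows are illustrative assumptions, not the project's actual schema.

```python
import sqlite3

# Hypothetical star schema: one fact table (songplays) that references
# two dimension tables (users, songs).
SCHEMA = """
CREATE TABLE users (user_id INTEGER PRIMARY KEY, first_name TEXT, level TEXT);
CREATE TABLE songs (song_id TEXT PRIMARY KEY, title TEXT, artist_name TEXT);
CREATE TABLE songplays (
    songplay_id INTEGER PRIMARY KEY,
    start_time  TEXT NOT NULL,
    user_id     INTEGER NOT NULL REFERENCES users (user_id),
    song_id     TEXT REFERENCES songs (song_id)
);
"""


def build_demo_db():
    """Create an in-memory database and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?)",
        [(1, "Ana", "free"), (2, "Rui", "paid")],
    )
    conn.executemany(
        "INSERT INTO songs VALUES (?, ?, ?)",
        [("S1", "Song A", "Artist X"), ("S2", "Song B", "Artist Y")],
    )
    conn.executemany(
        "INSERT INTO songplays (start_time, user_id, song_id) VALUES (?, ?, ?)",
        [
            ("2018-11-01 10:00:00", 1, "S1"),
            ("2018-11-01 10:05:00", 2, "S1"),
            ("2018-11-01 10:09:00", 1, "S2"),
        ],
    )
    conn.commit()
    return conn


def top_songs(conn):
    """Count plays per song by joining the fact table to the songs dimension."""
    return conn.execute(
        """
        SELECT s.title, COUNT(*) AS plays
        FROM songplays sp
        JOIN songs s ON s.song_id = sp.song_id
        GROUP BY s.title
        ORDER BY plays DESC, s.title
        """
    ).fetchall()


if __name__ == "__main__":
    print(top_songs(build_demo_db()))
```

The fact table stores only keys and event data, so a question such as "which songs are played most" becomes a single join and aggregation.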

Course Project 2 - Data Modeling with Apache Cassandra

In this project, I’ll model user activity data for a music streaming app called Sparkify. I'll create a database and ETL pipeline in Apache Cassandra, designed to optimize queries for understanding what songs users are listening to. I'll model the data so I can run specific queries provided by the analytics team at Sparkify.
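
Modeling for Cassandra usually starts from the query: the partition key and clustering columns are chosen so that one query reads one partition. The sketch below only builds the CQL text for such a table in plain Python; it does not connect to a cluster. The helper `create_table_cql`, the table name, and the columns are hypothetical, chosen for illustration.

```python
def create_table_cql(table, columns, partition_key, clustering=()):
    """Build a CQL CREATE TABLE statement for a table designed around one query.

    columns:       mapping of column name to CQL type
    partition_key: columns that decide which partition a row lives in
    clustering:    columns that order rows inside a partition
    """
    cols = ", ".join(f"{name} {cql_type}" for name, cql_type in columns.items())
    key = f"({', '.join(partition_key)})"
    if clustering:
        key += ", " + ", ".join(clustering)
    return f"CREATE TABLE IF NOT EXISTS {table} ({cols}, PRIMARY KEY ({key}))"


# Hypothetical query to support:
# "Which song was played at a given position in a given session?"
cql = create_table_cql(
    "songs_by_session",
    {"session_id": "int", "item_in_session": "int", "artist": "text", "song": "text"},
    partition_key=["session_id"],
    clustering=["item_in_session"],
)

if __name__ == "__main__":
    print(cql)
```

A different query, for example one that filters by user, would typically get its own table with its own primary key rather than a secondary lookup on this one.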

Cloud Data Warehouses

Course Project 3 - Data Modeling with AWS Redshift

In this project, I applied what I've learned about data warehouses and AWS to build an ETL pipeline for a database hosted on Redshift. To complete the project, I loaded data from S3 into staging tables on Redshift and executed SQL statements that create the analytics tables from these staging tables. To manage the AWS resources, including the clusters and access, I used the AWS SDK for Python.
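
The two steps described above (copy from S3 into staging, then insert from staging into analytics tables) can be sketched as SQL text built in Python. This is only an illustration: the helper functions, bucket path, IAM role ARN, region, and table names are placeholders I made up, and the statements are not executed against a cluster here.

```python
def copy_from_s3(table, s3_path, iam_role_arn, region, json_option="auto"):
    """Build a Redshift COPY statement that loads JSON files from S3 into a staging table."""
    return (
        f"COPY {table} FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role_arn}' "
        f"FORMAT AS JSON '{json_option}' "
        f"REGION '{region}';"
    )


def insert_from_staging(target, columns, select_sql):
    """Build an INSERT ... SELECT that fills an analytics table from staging data."""
    return f"INSERT INTO {target} ({', '.join(columns)}) {select_sql};"


# Placeholder values; the real bucket, role, and region live in project configuration.
copy_sql = copy_from_s3(
    table="staging_events",
    s3_path="s3://example-bucket/log_data",
    iam_role_arn="arn:aws:iam::123456789012:role/example-redshift-role",
    region="us-west-2",
)

insert_sql = insert_from_staging(
    target="users",
    columns=["user_id", "first_name", "level"],
    select_sql="SELECT DISTINCT user_id, first_name, level FROM staging_events WHERE user_id IS NOT NULL",
)

if __name__ == "__main__":
    print(copy_sql)
    print(insert_sql)
```

String formatting is acceptable here only because every value comes from trusted configuration; user-supplied values should never be interpolated into SQL this way.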

Data Lakes with Spark

Course Project 4 - Data Lake with Apache Spark and AWS S3

In this project, I applied what I've learned about Spark and data lakes to build an ETL pipeline for a data lake hosted on S3. To complete the project, I loaded data from S3, processed it into analytics tables using Spark, and loaded the tables back into S3. Afterwards, I deployed this Spark process on a cluster using AWS. I used the AWS SDK for Python.

Data Pipelines with Airflow

Course Project 5 - Data Pipelines with Apache Airflow

In this project, I applied what I've learned about Apache Airflow data pipelines. To complete the project, I created my own custom operators to perform tasks such as staging the data, filling the data warehouse, and running checks on the data as the final step. I used the AWS SDK for Python.
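
The final data check can be sketched as a plain function, the kind of logic a custom operator might call when it runs. This example does not import Airflow and uses sqlite3 as a stand-in for the warehouse connection; the function name `check_tables_not_empty` and the tables are assumptions for illustration.

```python
import sqlite3


def check_tables_not_empty(conn, tables):
    """Return row counts per table, raising ValueError if any table is empty."""
    counts = {}
    for table in tables:
        # Table names come from trusted pipeline configuration, not user input.
        (count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
        if count == 0:
            raise ValueError(f"Data quality check failed: {table} is empty")
        counts[table] = count
    return counts


# Tiny in-memory demo: one populated table and one empty table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE songplays (songplay_id INTEGER)")
conn.execute("CREATE TABLE users (user_id INTEGER)")
conn.execute("INSERT INTO songplays VALUES (1)")

if __name__ == "__main__":
    print(check_tables_not_empty(conn, ["songplays"]))
```

Raising an exception is what makes the check useful inside a pipeline: a task that raises is marked as failed, so downstream consumers do not read from an empty table.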

Data Engineering Capstone

Capstone Project - Data Platform for Analytics & Machine Learning - Financial companies complaints analysis

In this project, I applied what I've learned in the Udacity Nanodegrees.

Acknowledgements

Data Engineering Nanodegree Program Syllabus

dacosta-github/udacity-de