A complete data product project used to study data engineering concepts such as modeling, collection, and operations.

limagbz/data-mesh-yelp

Data Mesh Project

GPLv3 License

This project aims to design and implement a data mesh architecture using close-to-real business data provided by the Yelp Dataset, which fits the Data Mesh concept of modeling analytics for business. For details about the data, see the Yelp Dataset Documentation.

Note that this is not a production-ready project. It is rather a lab to deepen my knowledge of Data Engineering, DevOps and, mainly, data meshes, so errors and changes will occur as my knowledge evolves. Feel free to contribute to this project by contacting me with suggestions, tips and ways to improve the code (see Contributing for more details).

  1. Logical Architecture
  2. Platform Architecture
  3. Set up your Local Environment
  4. Contributing
  5. References

Logical Architecture

Note

Please refer to Logical Architecture for details about the diagram. For information about each product (including its canvas and interaction map), refer to its own documentation in the products folder.

Platform Architecture

Note

This architecture (and the diagram) is heavily based on the tech stacks found here; more precisely, it is a mix of Datamesh Architecture: MinIO and Trino and Datamesh Architecture: dbt and Snowflake. Changes should occur as the project evolves. Please refer to the Infra READMEs for more information about the architecture.

Set up your local environment

A Kubernetes cluster is required to deploy the resources. How to deploy a local Kubernetes cluster is out of the scope of this project. This code was tested on a MicroK8s-managed cluster. If that is your choice, the following addons were enabled:

microk8s enable dns
microk8s enable helm
microk8s enable helm3
microk8s enable hostpath-storage
microk8s enable rbac
microk8s enable registry
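
The same addons can also be enabled in one pass with a small loop. This is only a minimal sketch and is not part of the repository: the `enable_addons` function and the `DRY_RUN` variable are hypothetical names introduced here. With `DRY_RUN=1` (the default in this sketch) the commands are printed rather than executed, so the snippet can be inspected without a cluster.

```shell
#!/usr/bin/env bash
# Addons listed in this README, in the same order.
ADDONS="dns helm helm3 hostpath-storage rbac registry"

# enable_addons: hypothetical helper, not part of the repository.
# With DRY_RUN=1 (the default here) it only prints the commands it would run.
enable_addons() {
  for addon in $ADDONS; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "microk8s enable $addon"
    else
      microk8s enable "$addon"
    fi
  done
}

enable_addons
```

Set `DRY_RUN=0` to actually enable the addons. Afterwards, `microk8s status --wait-ready` can be used to confirm that the cluster is up.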

Note

There are many solutions for deploying a local cluster (e.g. Minikube, Kind). You can see some examples in Kubernetes: Install Tools.

You also need to download the Yelp Dataset (photos are not required) and extract it into the data folder. To download it, follow the instructions in Yelp Dataset: Download The Data.
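
After extracting, a quick check can confirm that the JSON files ended up in the data folder. The sketch below rests on assumptions: the helper name `check_yelp_data` is hypothetical and not part of the repository, and the file names are the ones the Yelp Dataset archive is assumed to contain, so adjust the list if your download differs.

```shell
#!/usr/bin/env bash
# Assumed names of the extracted Yelp Dataset files; adjust if your archive differs.
YELP_FILES="business checkin review tip user"

# check_yelp_data: hypothetical helper, not part of the repository.
# Prints each missing file and returns non-zero if any file is absent.
check_yelp_data() {
  dir="${1:-data}"
  missing=0
  for name in $YELP_FILES; do
    file="$dir/yelp_academic_dataset_$name.json"
    if [ ! -f "$file" ]; then
      echo "missing: $file"
      missing=1
    fi
  done
  return "$missing"
}
```

Calling `check_yelp_data data` from the repository root prints nothing and returns zero when all the assumed files are present.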

Set up your development environment

This project embeds a full-featured development container for VSCode users, containing all the tools, extensions and configurations required to develop the code. If you don't know how dev containers work, please read Visual Studio Code: Developing Inside a Container.

For people who do not use VSCode, the Dockerfile contains all the tools used by the project. You can use it as a base for setting up your environment.

Contributing

Since this is a lab project, I am currently the only person developing the code. However, feel free to propose new features or improvements, ask questions and suggest tips on the discussion tab. For bug reports, use the issues tab (with the bug template).

Note

Please read the CONTRIBUTING Guide for more details about the style guides, best practices and conventions followed by the project.

References

Below are the main references used by this project. Feel free to read them for a deeper understanding of the project.