Data Practitioner

Built with ❤︎ by Anantha Raju C and contributors

Explore the docs »

Report Bug · Request Feature


This GitHub project is a data engineering and analytics pipeline that handles the end-to-end process of extracting, transforming, and loading data from a MySQL source into ClickHouse. The pipeline is orchestrated with Dagster, a data orchestrator that provides a unified workflow for managing data pipelines.

The combination of Dagster, ClickHouse, DBT Core, and MySQL ensures a well-structured and maintainable architecture for end-to-end data processing.

Key Components:

Dagster: The core orchestrator that manages the workflow of the entire data pipeline. Dagster allows for the definition, scheduling, and monitoring of data workflows, ensuring reliability and scalability.
ClickHouse: A columnar database used as the data warehouse for efficient storage and retrieval of large volumes of data. ClickHouse is optimized for analytical queries, making it suitable for data analytics and reporting.
DBT Core: The data transformation layer that leverages the popular DBT (Data Build Tool) framework. DBT Core facilitates the transformation of raw data into a structured and meaningful format for analytics and reporting.
MySQL: The source database. Raw data is extracted from MySQL in the initial phase of the pipeline.

Workflow:

Data Extraction: Raw data is extracted from MySQL databases, serving as source systems. This could include data from various operational databases.
Loading: The extracted data is loaded into ClickHouse, the designated data warehouse, where it is stored efficiently for analytical queries and reporting.
Transformation: DBT Core processes and transforms the raw data into a clean, structured format suitable for analytics. Transformations may include aggregations, joins, and other operations to derive insights.
Analytics: Once the data is in ClickHouse, analysts and data scientists can perform analytics and generate insights using SQL queries or other analytical tools.
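The four workflow steps can be sketched in a few lines of Python. This is only an illustration of the step order as listed above, not the project's actual code: it uses in-memory SQLite databases from the standard library as stand-ins for both MySQL and ClickHouse, and plain functions in place of Dagster and DBT Core. The table names (orders, raw_orders, customer_totals), the sample rows, and the function names are all hypothetical. It assumes the transformation runs inside the warehouse after loading, which is how dbt normally operates.

```python
import sqlite3


def extract(source: sqlite3.Connection) -> list:
    """Data Extraction: read raw rows from the source (MySQL in the real pipeline)."""
    return source.execute("SELECT id, customer, amount FROM orders").fetchall()


def load(warehouse: sqlite3.Connection, rows: list) -> None:
    """Loading: write the raw rows into the warehouse (ClickHouse in the real pipeline)."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS raw_orders (id INTEGER, customer TEXT, amount REAL)"
    )
    warehouse.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)
    warehouse.commit()


def transform(warehouse: sqlite3.Connection) -> None:
    """Transformation: derive a clean aggregate table (the role DBT Core plays)."""
    warehouse.execute("DROP TABLE IF EXISTS customer_totals")
    warehouse.execute(
        "CREATE TABLE customer_totals AS "
        "SELECT customer, SUM(amount) AS total FROM raw_orders GROUP BY customer"
    )
    warehouse.commit()


def analyze(warehouse: sqlite3.Connection) -> list:
    """Analytics: query the transformed table with plain SQL."""
    return warehouse.execute(
        "SELECT customer, total FROM customer_totals ORDER BY customer"
    ).fetchall()


def run_pipeline() -> list:
    """Run the steps in order; in the project, Dagster would schedule them."""
    # Hypothetical source data standing in for an operational MySQL database.
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
    source.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "alice", 10.0), (2, "bob", 5.0), (3, "alice", 2.5)],
    )
    warehouse = sqlite3.connect(":memory:")
    load(warehouse, extract(source))
    transform(warehouse)
    return analyze(warehouse)


if __name__ == "__main__":
    print(run_pipeline())
```

Keeping each step a separate function mirrors how an orchestrator treats them: each one can be retried, scheduled, or monitored on its own.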

How to Use:

Detailed documentation and instructions on setting up and configuring the pipeline are available in the project repository. Users can follow the guidelines to adapt the pipeline to their specific data sources and analytics requirements.

Details

Reporting Issues/Suggest Improvements

This project uses GitHub's integrated issue tracking system to record bugs and feature requests. If you want to raise an issue, please follow the recommendations below:

Before you log a bug, please search the issue tracker to see if someone has already reported the problem.
If the issue doesn't already exist, create a new issue.
Please provide as much information as possible with the issue report.
If you need to paste code or include a stack trace, use Markdown ``` escapes before and after your text.

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

Kindly refer to CONTRIBUTING.md for important Pull Request Process details.

1. In the top-right corner of this page, click Fork.
2. Clone a copy of your fork to your local machine, replacing YOUR-USERNAME with your GitHub username:

    git clone https://github.com/YOUR-USERNAME/DataPractitioner.git

3. Create a branch:

    git checkout -b <my-new-feature-or-fix>

4. Make the necessary changes and commit them:

    git add .
    git commit -m "new feature or fix"

5. Push the changes, replacing <add-your-branch-name> with the name of the branch you created in step 3:

    git push origin <add-your-branch-name>

6. Submit your changes for review. Go to your repository on GitHub, where you'll see a Compare & pull request button. Click that button, then submit the pull request.

That's it! Soon I'll be merging your changes into the master branch of this project. You will get a notification email once the changes have been merged. Thank you for your contribution.

Kindly follow Conventional Commits to create an explicit commit history. Prefix each commit message with one of the following types.

build : Changes that affect the build system or external dependencies (example scopes: gulp, broccoli, npm)
ci : Changes to our CI configuration files and scripts (example scopes: Travis, Circle, BrowserStack, SauceLabs)
docs : Documentation only changes
feat : A new feature
fix : A bug fix
perf : A code change that improves performance
refactor : A code change that neither fixes a bug nor adds a feature
style : Changes that do not affect the meaning of the code (white-space, formatting, missing semi-colons, etc)
test : Adding missing tests or correcting existing tests
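As a rough illustration of the prefix rule, the sketch below checks whether the first line of a commit message starts with one of the types listed above, followed by an optional scope in parentheses, an optional "!", and then ": " plus a description. The function name is_conventional and the regular expression are hypothetical helpers written for this example; they are not part of this repository and cover only the header line, not the full Conventional Commits specification.

```python
import re

# Commit types listed in this README.
ALLOWED_TYPES = (
    "build", "ci", "docs", "feat", "fix", "perf", "refactor", "style", "test",
)

# Header shape: type, optional (scope), optional "!", then ": " and a description.
_HEADER = re.compile(
    r"^(?P<type>[a-z]+)(\((?P<scope>[^()\s]+)\))?(?P<breaking>!)?: (?P<description>\S.*)$"
)


def is_conventional(message: str) -> bool:
    """Return True if the first line of the message uses an allowed type prefix."""
    lines = message.splitlines()
    if not lines:
        return False
    match = _HEADER.match(lines[0])
    return bool(match) and match.group("type") in ALLOWED_TYPES


if __name__ == "__main__":
    for msg in ("feat: add ClickHouse loader", "fix(dbt): correct join key", "updated stuff"):
        print(msg, "->", is_conventional(msg))
```

A check like this could run in a local commit-msg hook, although the project does not state that it enforces the convention automatically.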

License

Distributed under the MIT License. See LICENSE.md for more information.

The End

In the end, I hope you enjoy the application and find it useful, as I did while developing it to learn.

If you would like to enhance it, please:

Open PRs,
Give feedback,
Add new suggestions, and
Finally, give it a 🌟.
Happy Coding ... 🙂

Contact

Anantha Raju C - @anantharajuc - arcswdev@gmail.com

Project Link: https://github.com/AnanthaRajuC/DataPractitioner
