RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
Updated May 12, 2024 - Python
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Everything about designing, installing, and implementing data pipelines, including Kafka, ZooKeeper, and Hadoop. If you enjoy my content, please consider supporting what I do. Thank you.
Apache DolphinScheduler is the modern data orchestration platform. Agile at creating high-performance workflows with low code.
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift) in real-time.
An orchestration platform for the development, production, and observation of data assets.
Building data processing pipelines for document processing with NLP using Apache NiFi and related services.
A Realtime Seismic Logging & Alerts Service with Live Monitoring & Email Alerts made using Kafka Data Pipelines, all Dockerized & Deployment Ready!
Lean and mean distributed stream processing system written in Rust and WebAssembly.
The framework for fast development and deployment of RAG systems.
Main repo including core data model, data marts, reference data, terminology, and the clinical concept library
First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
Bruin is a data pipeline tool that is designed to be easy-to-use. It allows building data pipelines using SQL and Python, and has built-in data quality checks.
Dataform is a framework for managing SQL-based data operations in BigQuery.
Move your data with ease.
One framework to develop, deploy and operate data workflows with Python and SQL.
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.