Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
TypeScript client for Graphlit Platform
Python client library for Graphlit Platform
A Repo For Document AI
Integrate AI-powered Document Analysis Pipelines
The City of Portland distributes voter participation info in PDF format. This makes it a CSV.
Improved file parsing for LLMs
Build a RAG preprocessing pipeline
DF Extract Lib
An open-source framework for Retrieval-Augmented Generation (RAG) systems that uses semantic search to retrieve relevant results and generates human-readable conversational responses with the help of an LLM (Large Language Model).
Graphlit Platform
Extract text from your DOCX documents.
🎓 Set of powerful tools designed to streamline the extraction, parsing, and cleanup of data from DOCX and PDF forms. Saves time and eliminates manual data entry by automating the processing of structured data.
The invoice, document, and résumé parser powered by AI.
Python program that uses OpenAI APIs to parse user-specified content from text files.
A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, and PDF files with GROBID and LangChain, or listen to them as podcasts. Customize your own pipelines.
Ihugure Chatbot Streamlit User Interface