RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
Updated May 23, 2024 - Python
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Sample applications and demos for Document AI, the end-to-end document processing platform on Google Cloud
A Repo For Document AI
ReadingBank: A Benchmark Dataset for Reading Order Detection
Algorithms, papers, datasets, and performance comparisons for Document AI. Continuously updated.
Code for the paper "PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks" (ICPR 2020)
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
A Curated List of Awesome Table Structure Recognition (TSR) Research, including models, papers, datasets, and code. Continuously updated.
Run optical character recognition with PyTesseract from the FiftyOne App!
Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.
Datasets and Evaluation Scripts for CompHRDoc
Official release of RFUND introduced in the paper "PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction" (arXiv:2401.03472).
QuickCapture Mobile Scanning SDK from Extrieve, designed specifically for native Android
TAT-DQA: Towards Complex Document Understanding By Discrete Reasoning
Checkbox Detection Model for Scanned Documents
A hands-on CLI tool sample showcasing the integration of Dart with Google Cloud's DocumentAI.
Object Detection Model for Scanned Documents
This project tackles a real-world challenge of automating client document processing, with a focus on enhancing document classification, error detection, data extraction, and validation.