Models trained on 110 million large-scale datasets #372

chaodreaming · 2024-03-11T06:42:54Z

Dear Seniors, I trained a model for 110 million datasets, and although the model didn't converge completely, the results have met the requirements of most scenarios. Unlike Latex-OCR, I referenced the images to yolov8 for processing, scaled them uniformly, and used detection to differentiate multi-line formulas, which is more suitable for real-world scenarios, and provided the onnx model directly, considering that formulas of such a large scale are not trainable by the general public.
For more information about the code Via the code

https://github.com/chaodreaming/Simple-LaTeX-OCR
For some other features like gui etc, I can't take on such a large amount of work alone, so you can consider merging them if you want to
If it's useful for your work, please ⭐ my repo

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Models trained on 110 million large-scale datasets #372

Models trained on 110 million large-scale datasets #372

chaodreaming commented Mar 11, 2024

Models trained on 110 million large-scale datasets #372

Models trained on 110 million large-scale datasets #372

Comments

chaodreaming commented Mar 11, 2024