LlamaParse messes up ordering of two-column PDFs #146

qniksefat · 2024-04-16T09:41:08Z

Hey,

I'm having a hard time parsing PDF files that have two vertical columns of text. It sometimes captures the right reading order, but often does not. I'm parsing into markdown.

For example, it interleaves the columns, taking one sentence from the left column and then one from the right. The switches happen at sentence boundaries, not in the middle of a sentence.

Thanks!

ah3243 · 2024-04-26T10:51:27Z

Yep, me too. Parsing academic documents is really unreliable with any parser currently. If you're working with academic documents as well, many conferences also publish an HTML version, which is straightforward to use as input instead.
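To illustrate the HTML route, here is a minimal sketch that pulls the visible text out of an HTML page in document order, using only Python's standard library. The names `TextExtractor` and `html_to_text` are made up for this example, and the list of block-level tags is an assumption that a real conference page may need adjusted. It is not part of LlamaParse.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from an HTML page in document order."""

    # Tags whose contents are never shown to the reader.
    SKIP = {"script", "style"}
    # Tags treated as line breaks (an assumed, non-exhaustive list).
    BLOCK = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li", "br", "tr"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        # Keep text only when we are outside <script>/<style>.
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return the page text, one non-empty line per block element."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    raw_lines = "".join(parser.parts).splitlines()
    # Collapse runs of whitespace and drop blank lines.
    lines = (" ".join(line.split()) for line in raw_lines)
    return "\n".join(line for line in lines if line)
```

Because HTML already stores the text in reading order, there is no column layout to reconstruct, which is why this route sidesteps the interleaving problem.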

PowerOwner · 2024-04-29T13:10:21Z

10-K 2023, 09.30.2023-2023-11-02-08-16.json

10-K 2023, 09.30.2023-2023-11-02-08-16.pdf

There is a company whose parser can handle multiple columns of text, and it works particularly well on tables. It is also very fast: 100 pages are processed in 5 s or less.
