LlamaParse messes up ordering of two-column PDFs #146

Open
qniksefat opened this issue Apr 16, 2024 · 2 comments

@qniksefat

Hey,

I'm having a hard time parsing PDF files that have two vertical columns of text. LlamaParse sometimes captures the correct reading order, but often does not. I'm parsing into markdown.

For example, it outputs one sentence from the left column followed by one from the right column. It keeps each sentence intact, though; it does not break in the middle of a sentence.
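
To make the expected order concrete, here is a minimal sketch. It assumes a hypothetical list of text blocks with bounding boxes (`(x0, y0, x1, y1, text)`, origin at the top-left of the page); this is not LlamaParse's actual output format or API, and `two_column_order` is a name made up for illustration. Sorting purely top to bottom reproduces the interleaving described above, while grouping by column first gives the order a reader would expect.

```python
from typing import List, Tuple

# Hypothetical block format: (x0, y0, x1, y1, text), origin at the top-left.
Block = Tuple[float, float, float, float, str]


def two_column_order(blocks: List[Block], page_width: float) -> List[str]:
    """Return block texts in left-column-then-right-column reading order."""
    mid = page_width / 2
    # Assign each block to a column by the horizontal center of its box.
    left = [b for b in blocks if (b[0] + b[2]) / 2 < mid]
    right = [b for b in blocks if (b[0] + b[2]) / 2 >= mid]
    # Within a column, read top to bottom.
    left.sort(key=lambda b: b[1])
    right.sort(key=lambda b: b[1])
    return [b[4] for b in left + right]


# Made-up example page: two columns, two blocks each.
blocks = [
    (50, 100, 280, 120, "Left 1"),
    (320, 100, 550, 120, "Right 1"),
    (50, 130, 280, 150, "Left 2"),
    (320, 130, 550, 150, "Right 2"),
]

# Naive top-to-bottom sort: interleaves the two columns.
naive = [b[4] for b in sorted(blocks, key=lambda b: (b[1], b[0]))]

# Column-aware sort: finishes the left column before starting the right.
ordered = two_column_order(blocks, page_width=600)
```

This only covers the simple case of exactly two columns split at the page midpoint; full-width headings, figures, or footnotes would need extra handling.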

Thanks!

@ah3243

ah3243 commented Apr 26, 2024

Yep, me too. Parsing academic documents is really unreliable with any parser currently. If you're working with academic documents as well, many conferences also publish an HTML version, which is straightforward to use as input instead.
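
As a rough illustration of the HTML route, here is a minimal sketch that pulls plain text out of an HTML string using only Python's standard-library `html.parser`. The `TextExtractor` class, the `html_to_text` helper, and the sample markup are all made up for this example; real conference pages will likely need more careful handling (navigation, references, math).

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text in document order, skipping script/style content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.parts.append(text)


def html_to_text(html: str) -> str:
    """Return the text content of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


# Made-up sample document standing in for a paper's HTML version.
sample = (
    "<html><body><h1>Title</h1><style>p{}</style>"
    "<p>First paragraph.</p><p>Second paragraph.</p></body></html>"
)
text = html_to_text(sample)
```

Because HTML is already linear, the text comes out in source order and the column-ordering problem does not arise.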

@PowerOwner

10-K 2023, 09.30.2023-2023-11-02-08-16.json

10-K 2023, 09.30.2023-2023-11-02-08-16.pdf

image

There is a company that can handle multiple vertical columns of text, and it works particularly well on tables. It is also particularly fast: 100 pages are processed in 5 s or less.
