I'm having a hard time parsing PDF files that have two vertical columns of text. It sometimes captures the right reading order, but often does not. I'm parsing into markdown.
For example, it takes one sentence from the left column and the next from the right column, interleaving the two. The sentences themselves stay intact; it does not break in the middle of a sentence.
Thanks!
Yep, me too. Parsing academic documents is really unreliable with any parser at the moment. If you're also working with academic papers, many conferences publish an HTML version as well, which is much more straightforward to use as input.
There is a company whose parser handles multi-column text, and it works particularly well on tables. It is also fast: 100 pages are processed in 5 s or less.
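A possible workaround, if your extractor can give you text blocks with bounding boxes (for instance, PyMuPDF's `page.get_text("blocks")` returns tuples that start with `x0, y0, x1, y1, text`), is to re-order the blocks yourself before writing markdown. The sketch below is not what any particular parser does internally. The function name `order_two_columns`, the 60% width threshold for "full-width" blocks, and the assumption that y grows downward on the page are all mine, so treat it as a starting point only.

```python
from typing import List, Tuple

# Hypothetical block shape: (x0, y0, x1, y1, text), with y increasing downward.
Block = Tuple[float, float, float, float, str]


def order_two_columns(blocks: List[Block], page_width: float) -> List[str]:
    """Return block texts in reading order for a simple two-column page.

    Blocks wider than 60% of the page (an assumed threshold) are treated as
    full-width, e.g. a title. Between two full-width blocks, the whole left
    column is emitted before the right column.
    """
    mid = page_width / 2
    ordered: List[Block] = []
    left: List[Block] = []
    right: List[Block] = []

    # Walk the page top to bottom; sorted() is stable, so ties keep input order.
    for block in sorted(blocks, key=lambda b: b[1]):
        x0, _, x1, _, _ = block
        if (x1 - x0) > 0.6 * page_width:
            # A full-width block ends the current band: flush both columns first.
            ordered += left + right
            left, right = [], []
            ordered.append(block)
        elif (x0 + x1) / 2 < mid:
            left.append(block)   # block centre is in the left half
        else:
            right.append(block)  # block centre is in the right half

    ordered += left + right
    return [b[4].strip() for b in ordered]


if __name__ == "__main__":
    demo = [
        (50, 100, 290, 200, "Left, first paragraph."),
        (310, 100, 550, 200, "Right, first paragraph."),
        (50, 210, 290, 300, "Left, second paragraph."),
        (310, 210, 550, 300, "Right, second paragraph."),
        (50, 20, 550, 80, "Title"),
    ]
    print("\n\n".join(order_two_columns(demo, page_width=600)))
```

This only covers clean two-column layouts. Pages with three columns, sidebars, or figures that span one and a half columns would need something smarter than a midpoint split.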