get_pixmap function removes the table and leaves just the content behind #3448

Open
anirudhagarwal1 opened this issue May 7, 2024 · 7 comments

Comments

@anirudhagarwal1

Description of the bug

I have a single-page PDF file that contains a table. When I load the PDF and call get_pixmap, the rendered image keeps the cell content but drops the table borders around it.

pix = page.get_pixmap(alpha=False, dpi=150)
image = Image.open(io.BytesIO(pix.tobytes()))
image.save("temp.jpeg", format='jpeg')

Unfortunately, I won't be able to share this particular PDF on an open platform. Could you suggest how I can debug it further?

Below are partial screenshots of the PDF and of the converted image.
PDF -
Screenshot 2024-05-08 at 1 41 06 AM

Image from it -
Screenshot 2024-05-08 at 1 42 34 AM

How to reproduce the bug

It seems to break only with this particular kind of PDF. Other PDFs render fine.
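One way to narrow this down without sharing the file is to check whether the table borders are present on the page as vector drawings at all. The sketch below is only an illustration, not a maintainer-recommended procedure: `summarize_drawings` and `inspect_page` are hypothetical helper names, and it assumes that `page.get_drawings()` returns path dictionaries with `type` and `items` keys and that `fitz.VersionBind` is available in the installed PyMuPDF version.

```python
from collections import Counter


def summarize_drawings(drawings):
    """Summarize path dictionaries in the shape returned by page.get_drawings()."""
    by_type = Counter(d.get("type") for d in drawings)
    # Each entry in "items" is a tuple whose first element names the operator:
    # "l" = line, "re" = rectangle, "c" = curve, "qu" = quad.
    operators = Counter(item[0] for d in drawings for item in d.get("items", []))
    return {
        "paths": len(drawings),
        "by_type": dict(by_type),
        "line_items": operators["l"],
        "rect_items": operators["re"],
    }


def inspect_page(pdf_path, page_index=0, dpi=150):
    """Render one page and report its vector drawings (hypothetical helper)."""
    import fitz  # PyMuPDF; imported here so summarize_drawings works without it

    doc = fitz.open(pdf_path)
    page = doc[page_index]
    summary = summarize_drawings(page.get_drawings())
    summary["pymupdf_version"] = fitz.VersionBind
    # Save as PNG to rule out the JPEG conversion as a factor.
    page.get_pixmap(alpha=False, dpi=dpi).save(f"page{page_index}_debug.png")
    doc.close()
    return summary
```

If the summary reports line or rectangle items but the saved PNG shows no borders, that would suggest the paths exist in the page and are being lost during rendering. The summary contains only counts, so it can be posted without exposing the document's content.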

PyMuPDF version

1.24.1

Operating system

macOS

Python version

3.10

@JorjMcKie
Collaborator

Providing the example file (not just the pictures) is mandatory for submitting a bug.

@anirudhagarwal1
Author

Since this document contains some sensitive information, I would not be able to share it on a public forum. I tried to replicate this issue with multiple other PDFs and wasn't able to.

Would you consider letting me email it to you privately?

@JorjMcKie
Collaborator

> Since this document contains some sensitive information, I would not be able to share it on a public forum. I tried to replicate this issue with multiple other PDFs and wasn't able to.
>
> Would you consider letting me email it to you privately?

Yes, certainly! Please do send it that way.

@anirudhagarwal1
Author

I have sent the file to your GitHub email address: jorj.x.mckie@outlook.de

@mjun0812

I have the same issue.
When processing a PDF of this paper, the title and table borders were removed.
https://arxiv.org/abs/2310.19909
This problem does not occur when using v1.23.26.

@JorjMcKie
Collaborator

> I have the same issue.
> When processing a PDF of this paper, the title and table borders were removed.
> https://arxiv.org/abs/2310.19909
> This problem does not occur when using v1.23.26.

Please provide the link to an example PDF / page - I need it to report the bug!

@mjun0812

@JorjMcKie
Sorry, I should have been more explicit. The following URL is the link to the PDF.
https://arxiv.org/pdf/2310.19909
The borders disappear on pages 1, 4, 7, and 8.
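To compare the two versions mentioned in this thread, a small script along these lines could be run once under each installed PyMuPDF version. It is a sketch under stated assumptions: `output_name` and `render_pages` are hypothetical helper names, the PDF is assumed to have been downloaded to a local path first, and `fitz.VersionBind` is assumed to be available in both versions. The `dpi` and `alpha` arguments mirror the call from the original report.

```python
def output_name(version, page_number):
    """Build a file name that records the page and the PyMuPDF version used."""
    return f"page{page_number}_v{version}.png"


def render_pages(pdf_path, page_numbers=(1, 4, 7, 8), dpi=150):
    """Render the given 1-based page numbers to PNG files (hypothetical helper)."""
    import fitz  # PyMuPDF; imported here so output_name works without it

    version = fitz.VersionBind
    written = []
    doc = fitz.open(pdf_path)
    for number in page_numbers:
        page = doc[number - 1]  # PyMuPDF indexes pages from 0
        pix = page.get_pixmap(alpha=False, dpi=dpi)
        name = output_name(version, number)
        pix.save(name)
        written.append(name)
    doc.close()
    return written
```

For example, calling `render_pages("2310.19909.pdf")` (an assumed local file name) under v1.23.26 and again under v1.24.1 would leave two sets of PNG files whose names differ only in the version suffix, so the affected pages can be compared side by side.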
