Memory leak when opening an invalid PDF with no %%EOF in tail #3344

Closed
cmyers009 opened this issue Apr 4, 2024 · 5 comments

Comments

@cmyers009

Description of the bug

If a PDF file is truncated (its bytes are cut off prematurely), fitz opens it with 0 pages but also leaks memory.

How to reproduce the bug

You can reproduce this bug by taking a large PDF file and removing the last 50% of its bytes.

If you repeatedly load files like this, memory leaks even when you call doc.close().
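The reproduction steps above could be scripted roughly as follows. This is a minimal sketch, not the reporter's actual script: `truncate_copy` and `open_repeatedly` are hypothetical helper names, the iteration count is arbitrary, and memory use is assumed to be watched externally (for example in Task Manager). Depending on the PyMuPDF version, opening the truncated file may raise instead of returning a 0-page document, so the loop tolerates that.

```python
import os


def truncate_copy(src_path, dst_path, keep_fraction=0.5):
    """Copy the first keep_fraction of src_path's bytes to dst_path.

    Returns the number of bytes written.
    """
    size = os.path.getsize(src_path)
    keep = int(size * keep_fraction)
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.write(src.read(keep))
    return keep


def open_repeatedly(path, iterations=1000):
    """Open and close the same file many times.

    Watch the process's memory from outside while this runs.
    """
    import fitz  # PyMuPDF; imported here so truncate_copy works without it

    for _ in range(iterations):
        try:
            doc = fitz.open(path)
        except Exception:
            # Assumption: some versions may reject the file outright.
            continue
        _ = doc.page_count  # the report says this is 0 for truncated files
        doc.close()
```

Calling `truncate_copy("large.pdf", "truncated.pdf")` and then `open_repeatedly("truncated.pdf")` would exercise the scenario described in the report; both file names are placeholders.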

You can check whether the file has a %%EOF marker with the code below. If you call it before opening the document, you can return 0 pages without triggering the memory leak.

    import os

    def has_eof_marker(file_path):
        try:
            with open(file_path, 'rb') as file:
                # Seek to the last 1KB of the file (or to the start if the file is smaller)
                file.seek(0, os.SEEK_END)
                file.seek(max(file.tell() - 1024, 0))
                # Read the last 1KB
                tail = file.read()
                # Check if %%EOF is in the last 1KB
                return b'%%EOF' in tail
        except Exception as e:
            print(f"Error reading file: {e}")
            return False
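One way the check might be wired in front of the open call is sketched below. The names `tail_has_eof` and `page_count_or_zero` are hypothetical, and the `opener` parameter is an assumption added only so the guard can be exercised without PyMuPDF installed; by default it falls back to `fitz.open`. The tail check is a heuristic and says nothing about whether the rest of the file is valid.

```python
import os


def tail_has_eof(path, tail_size=1024):
    """Return True if b'%%EOF' appears in the last tail_size bytes of the file."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        # Clamp to 0 so files shorter than tail_size are read in full.
        f.seek(max(f.tell() - tail_size, 0))
        return b"%%EOF" in f.read()


def page_count_or_zero(path, opener=None):
    """Return the page count, or 0 without opening the file if %%EOF is missing."""
    if not tail_has_eof(path):
        return 0  # skip the open call entirely for truncated files
    if opener is None:
        import fitz  # PyMuPDF
        opener = fitz.open
    doc = opener(path)
    try:
        return doc.page_count
    finally:
        doc.close()
```

With this guard, a truncated file never reaches the open call, which is the workaround the report describes.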

    PyMuPDF 1.23.4

PyMuPDF version

1.23.8 or earlier

Operating system

Windows

Python version

3.11

@julian-smith-artifex-com
Collaborator

I cannot reproduce this with the current version, PyMuPDF-1.24.1. What version of PyMuPDF are you using?

@cmyers009
Author

cmyers009 commented Apr 4, 2024 via email

@julian-smith-artifex-com
Collaborator

There have been quite a few improvements to memory handling since 1.23.4 and so it would be worth retrying with the latest version, 1.24.1.

@cmyers009
Author

cmyers009 commented Apr 5, 2024 via email

@julian-smith-artifex-com
Collaborator

Closing this because we have been waiting for information for over a month.
