PDF file is not closed correctly #15

Markkuuss · 2020-09-01T15:58:19Z

When I use the following command, the file is not closed correctly. For example, I cannot delete the file afterwards because the PDF file is still being used by a process.

```python
doc = Document("samples/nonfree/mandarin.pdf")
```

If I write the code as follows instead, the PDF file will be closed correctly.

```python
with open("samples/nonfree/mandarin.pdf", "rb") as fp:
    doc = Document(fp)
```


ashutoshvarma · 2020-09-01T17:24:57Z

Can you describe the steps to reproduce the issue in detail? I was not able to reproduce it.
Are you on the latest pyxpdf version, v0.2.3?
Also, which OS are you using?

Markkuuss · 2020-09-01T18:00:40Z

> Are you on the latest pyxpdf version, v0.2.3?

Yes, since yesterday.

> Also, which OS are you using?

Windows 10 Pro 10.0.18362 Build 18362

You can use the example from the tutorial:
https://pyxpdf.readthedocs.io/en/latest/tutorial/extract_images.html

The following example throws a PermissionError [WinError 32] when deleting the file with `os.remove`.

```python
import os

from pyxpdf import Document
from pyxpdf.xpdf import PDFImageOutput, page_iterator

filename = "test.pdf"

doc = Document(filename)
pdfimages_out = PDFImageOutput(doc)

for images in page_iterator(pdfimages_out):
    print(images)

os.remove(filename)  # raises PermissionError [WinError 32] on Windows
```

When the file is opened with a `with` statement, it is deleted without errors.

```python
import os

from pyxpdf import Document
from pyxpdf.xpdf import PDFImageOutput, page_iterator

filename = "test.pdf"

with open(filename, "rb") as fp:
    doc = Document(fp)

    pdfimages_out = PDFImageOutput(doc)

    for images in page_iterator(pdfimages_out):
        print(images)

os.remove(filename)  # succeeds: the with block has closed the file
```

I couldn't find a way in the documentation to close the file when the Document is created with `doc = Document("samples/nonfree/mandarin.pdf")`.

ashutoshvarma · 2020-09-02T07:55:20Z

Thanks for reporting; it's a Windows-specific issue.
When a Document is created from a file path, opening and closing the file descriptor is handled by libxpdf (the C++ sources), and on Windows the file is opened with mode 'rbN', so the descriptor is not inherited by child processes.
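For context, the `N` in `'rbN'` is a Microsoft C runtime mode flag that only marks the handle as non-inheritable; it does not close the handle early, so the file remains in use (and cannot be deleted) until libxpdf closes it. Python's own `open()` behaves similarly by default (PEP 446, Python 3.4+). A small check, using a throwaway temporary file in place of a real PDF:

```python
import os
import tempfile

# Create an empty throwaway file to stand in for a PDF.
with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
    path = tmp.name

with open(path, "rb") as fp:
    # Since Python 3.4, descriptors created by open() are non-inheritable,
    # the same property the 'N' flag gives a handle opened by fopen on Windows.
    inheritable = os.get_inheritable(fp.fileno())
    print("inheritable:", inheritable)

# The handle was closed by the with block, so deletion works on every OS.
os.remove(path)
```

Non-inheritance and deletability are independent: what matters for `os.remove` on Windows is whether some handle to the file is still open.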

As you found, for now, if you need to do further operations on the PDF file on Windows, create the Document from a file-like object.
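One variant of this workaround, sketched under an assumption: read the whole file into an `io.BytesIO` so that Python closes the OS handle immediately, then hand the in-memory buffer to `Document`. The helper name `load_pdf_bytes` is hypothetical, and whether `Document` accepts a `BytesIO` (rather than only a real file object) is not confirmed in this thread.

```python
import io
import os
import tempfile


def load_pdf_bytes(path):
    """Hypothetical helper: copy a file into memory and close it right away.

    The returned BytesIO is a file-like object; passing it to Document(...)
    assumes Document accepts in-memory file-like objects.
    """
    with open(path, "rb") as fp:
        return io.BytesIO(fp.read())


# Demo with a throwaway file standing in for test.pdf.
with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
    tmp.write(b"%PDF-1.4 placeholder")
    path = tmp.name

buf = load_pdf_bytes(path)
os.remove(path)  # no handle to the file is open any more, even on Windows
print(buf.read(8))
```

With pyxpdf this would read `doc = Document(load_pdf_bytes(filename))`. The trade-off is that the whole PDF is held in memory for the lifetime of the buffer.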

> I couldn't find a way in the documentation to close the file when the Document is created with `doc = Document("samples/nonfree/mandarin.pdf")`.

A Document releases its resources when it is garbage collected:

```python
del pdfimages_out
del doc
```
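In CPython this is immediate only when nothing else still refers to the object: `del` removes one reference, and an object is freed as soon as its reference count reaches zero. A minimal sketch with a hypothetical `Resource` class, using `weakref.finalize` to observe the moment of release:

```python
import weakref


class Resource:
    """Hypothetical stand-in for an object that owns a file handle."""


released = []

res = Resource()
# The callback runs once `res` is deallocated; it holds no reference to it.
weakref.finalize(res, released.append, "closed")

del res  # last reference gone: CPython frees the object right away
print(released)  # ['closed']
```

This relies on CPython's reference counting; other implementations may defer the release until their collector runs.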

Markkuuss · 2020-09-02T08:10:33Z

> A Document releases its resources when it is garbage collected:
>
> ```python
> del pdfimages_out
> del doc
> ```

That was my idea too; I had already tried it as follows. Unfortunately, the same exception is thrown.

```python
import os

from pyxpdf import Document
from pyxpdf.xpdf import PDFImageOutput, page_iterator

filename = "test.pdf"

doc = Document(filename)
pdfimages_out = PDFImageOutput(doc)

for images in page_iterator(pdfimages_out):
    print(images)

del pdfimages_out
del doc

os.remove(filename)  # still raises PermissionError [WinError 32]
```

ashutoshvarma · 2020-09-02T08:56:06Z

Try this:

```python
import gc
import os

from pyxpdf import Document
from pyxpdf.xpdf import PDFImageOutput, page_iterator

filename = "test.pdf"

doc = Document(filename)
pdfimages_out = PDFImageOutput(doc)

for images in page_iterator(pdfimages_out):
    print(images)

del pdfimages_out
del doc

# Force a collection so the Document's reference cycle is freed now
gc.collect()

os.remove(filename)
```

My bad: `del` only decreases the reference count by 1, and Document is part of a reference cycle, so it is not deallocated immediately.
The garbage collector runs at regular intervals; it notices the reference cycle (using the `tp_traverse` slot) and breaks it.
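A minimal sketch of why `gc.collect()` was needed: an object that refers to itself keeps a nonzero reference count after `del`, so it is only freed when the cycle collector runs. The `Node` class is hypothetical; the actual cycle inside pyxpdf's Document is not shown in this thread.

```python
import gc
import weakref


class Node:
    """Hypothetical stand-in for an object caught in a reference cycle."""

    def __init__(self):
        self.self_ref = self  # the object refers to itself: a minimal cycle


released = []
was_enabled = gc.isenabled()
gc.disable()  # keep automatic collections from running mid-demo
try:
    doc = Node()
    weakref.finalize(doc, released.append, "doc released")

    del doc  # drops our reference, but self_ref keeps the count above zero
    before_collect = list(released)
    print(before_collect)  # []

    gc.collect()  # the cycle collector finds the unreachable cycle and frees it
    print(released)  # ['doc released']
finally:
    if was_enabled:
        gc.enable()
```

Disabling automatic collection only makes the demo deterministic; in normal code the collector would eventually free the cycle on its own, which is why the file stays locked for an unpredictable time.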

I think we should call `gc.collect()` inside the Document deallocator so that we don't have to wait for the garbage collector to clear it. I will create a separate issue for this.

Markkuuss · 2020-09-02T10:28:04Z

> I think we should call `gc.collect()` inside the Document deallocator so that we don't have to wait for the garbage collector to clear it. I will create a separate issue for this.

Yeah, you're right. This works.

ashutoshvarma added the bug Something isn't working label Sep 2, 2020

ashutoshvarma self-assigned this Sep 2, 2020
