'ValueError: buffer is not large enough' on PdfImage().extract_to() on some pngs #418

Open
ghost opened this issue Nov 10, 2022 · 0 comments

I'm getting

    tmpFileName = pdfImage.extract_to(fileprefix = "tmp")
  File "C:\ProgramData\Anaconda3\lib\site-packages\pikepdf\models\image.py", line 668, in extract_to
    extension = self._extract_to_stream(stream=bio)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pikepdf\models\image.py", line 611, in _extract_to_stream
    im = self._extract_transcoded()
  File "C:\ProgramData\Anaconda3\lib\site-packages\pikepdf\models\image.py", line 581, in _extract_transcoded
    im = self._extract_transcoded_1248bits()
  File "C:\ProgramData\Anaconda3\lib\site-packages\pikepdf\models\image.py", line 528, in _extract_transcoded_1248bits
    im = _transcoding.image_from_buffer_and_palette(
  File "C:\ProgramData\Anaconda3\lib\site-packages\pikepdf\models\_transcoding.py", line 143, in image_from_buffer_and_palette
    im = image_from_byte_buffer(buffer, size, stride)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pikepdf\models\_transcoding.py", line 107, in image_from_byte_buffer
    return Image.frombuffer('L', size, buffer, "raw", 'L', stride, ystep)
  File "C:\ProgramData\Anaconda3\lib\site-packages\PIL\Image.py", line 2932, in frombuffer
    im = im._new(core.map_buffer(data, size, decoder_name, 0, args))
ValueError: buffer is not large enough

when trying to extract PNGs from some PDFs. Most images are extracted correctly, but some raise this exception. I tried to debug a bit, but apart from noticing that a "wrong" mode is passed to PIL.Image.frombuffer(), I was unable to find the cause. By "wrong" I mean that 'L' is always passed there, even though, at least for the problematic image, self.mode == 'P'. I don't know whether this is relevant, but it is the only thing I was able to notice.
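
For context, Pillow's 'L' and 'P' modes both store one byte per pixel, so the mode name alone should not change how many bytes frombuffer() needs. The error seems to depend on whether the buffer holds at least stride × height bytes. A minimal sketch of that behaviour, using Pillow only (try_frombuffer is a hypothetical helper for illustration, not part of pikepdf or Pillow):

```python
from PIL import Image


def try_frombuffer(size, buffer, stride=0):
    """Return True if Pillow accepts `buffer` for an 8-bit 'L' image of `size`.

    Hypothetical helper for illustration. stride=0 lets Pillow derive the
    row length from the image width.
    """
    try:
        Image.frombuffer('L', size, buffer, 'raw', 'L', stride, 1)
    except ValueError:
        # Pillow rejects buffers that are shorter than stride * height.
        return False
    return True


width, height = 4, 4
full_ok = try_frombuffer((width, height), bytes(width * height))
short_ok = try_frombuffer((width, height), bytes(width * height - 1))
print(full_ok, short_ok)
```

If the short buffer is rejected while the full one is accepted, the failure above would point at the amount of image data rather than at the 'L' argument.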

The code I'm using:

import os
from pathlib import Path
from pikepdf import Name, Pdf, PdfImage

# Collect every PDF file in the current directory.
files = [f for f in os.listdir('.') if os.path.isfile(f) and f.endswith(".pdf")]
for fileName in files:
    # allow_overwriting_input lets save() write back to the file that was opened.
    pdfFile = Pdf.open(fileName, allow_overwriting_input=True)
    for page in pdfFile.pages:
        for j, (name, rawImage) in enumerate(page.images.items()):
            pdfImage = PdfImage(rawImage)
            # This call raises the ValueError for the problematic image.
            tmpFileName = pdfImage.extract_to(fileprefix="tmp")

    # some unrelated work is done here

    pdfFile.save()
    pdfFile.close()

It crashes on this object:

1318 0 obj
<< /BitsPerComponent 8 /ColorSpace 636 0 R /Height 302 /Subtype /Image /Width 205 /Length 58912 >>

from the attached PDF:
Dyko.pdf
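
A quick size check on the dictionary above may be relevant. The sketch below assumes one component per pixel (which would match self.mode == 'P', i.e. an indexed color space, although object 636 0 R is not shown here). Since the dictionary as printed lists no /Filter, it also assumes that /Length is the size of the raw sample data. expected_image_bytes is a hypothetical helper, not a pikepdf function.

```python
def expected_image_bytes(width, height, bits_per_component, components=1):
    """Bytes needed for unfiltered PDF image data (hypothetical helper).

    Each row of samples is padded to a whole number of bytes.
    """
    bits_per_row = width * components * bits_per_component
    stride = (bits_per_row + 7) // 8  # round each row up to a byte boundary
    return stride * height


# Values copied from the image dictionary of object 1318 0.
needed = expected_image_bytes(width=205, height=302, bits_per_component=8)
available = 58912  # /Length, assumed here to be the raw data size
shortfall = needed - available
print(needed, available, shortfall)  # 61910 58912 2998
```

Under those assumptions the stream is 2998 bytes shorter than 205 × 302 samples, which would be consistent with "buffer is not large enough". If the dictionary was abbreviated or the stream is actually filtered, this comparison does not apply.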
