inconsistent behavior of get() methods of PDFOutputDevices #10

Open
ashutoshvarma opened this issue Aug 4, 2020 · 0 comments
Labels: bug (Something isn't working), enhancement (New feature or request)


When get() is called with a page index outside the document's page range, RawImageOutput silently returns the last page's image, whereas TextOutput raises an IndexError.

Steps To Reproduce:-

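# Assumption: `x` is an alias for the pyxpdf.xpdf module named in the traceback below.
import pyxpdf.xpdf as x

d = x.Document("samples/simple1.pdf")
iout = x.RawImageOutput(d)
tout = x.TextOutput(d)

print(len(d))                  # number of pages in the document
print(iout.get(10))            # returns the same image as iout.get(0)
print(tout.get(10))            # raises IndexError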

Output:-

1
<PIL.Image.Image image mode=RGB size=1275x1651 at 0x7F179F573370>
Traceback (most recent call last):
  File "_test.py", line 15, in <module>
    tout.get(10)
  File "src/pyxpdf/textoutput.pxi", line 268, in pyxpdf.xpdf.TextOutput.get
    cpdef object get(self, int page_no):
  File "src/pyxpdf/textoutput.pxi", line 286, in pyxpdf.xpdf.TextOutput.get
    return self._get_bytes(page_no).decode('UTF-8', errors='ignore')
  File "src/pyxpdf/textoutput.pxi", line 209, in pyxpdf.xpdf.TextOutput._get_bytes
    if self._cache_texts[page_no] == None:
IndexError: list index out of range
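
One possible way to make the two devices consistent is to validate the page index in a single shared place before either backend is touched. The sketch below only illustrates that idea: `check_page_index` and `FakeOutputDevice` are hypothetical names, not part of pyxpdf, and it assumes 0-based page indices with negative values rejected.

```python
def check_page_index(page_no: int, page_count: int) -> int:
    """Return page_no if it is a valid 0-based page index, else raise IndexError."""
    if not 0 <= page_no < page_count:
        raise IndexError(
            f"page index {page_no} out of range for a document with {page_count} page(s)"
        )
    return page_no


class FakeOutputDevice:
    """Hypothetical stand-in for an output device; not pyxpdf's actual class."""

    def __init__(self, pages):
        self._pages = list(pages)

    def get(self, page_no: int):
        # Validate first so every device fails the same way on a bad index.
        check_page_index(page_no, len(self._pages))
        return self._pages[page_no]
```

With a check like this at the top of each get(), an out-of-range call such as get(10) on a one-page document would raise IndexError from both devices instead of returning the last page from one of them.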