BUG Inverted characters #164

Open
nicwest opened this issue May 12, 2023 · 2 comments

Comments


nicwest commented May 12, 2023

Describe the bug
When adding text to an existing PDF the characters are inverted.

To Reproduce

  1. download this PDF https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1145516/sa800man_2023.pdf
  2. rename to SA800.pdf
  3. put it in the same directory as the following script
  4. run the script
from borb.pdf import Paragraph
from borb.pdf import PDF
from borb.pdf.canvas.geometry.rectangle import Rectangle
from decimal import Decimal


def main():
    with open("SA800.pdf", "rb") as in_file_handle:
        doc = PDF.loads(in_file_handle)

    page = doc.get_page(0)

    r = Rectangle(Decimal(0), Decimal(0), Decimal(200), Decimal(200))
    Paragraph("Hello World!", font="Courier").paint(page, r)

    with open("output.pdf", "wb") as pdf_file_handle:
        PDF.dumps(pdf_file_handle, doc)


if __name__ == "__main__":
    main()

Expected behaviour
I would expect this to render "Hello World!" somewhere near the top of the page

Screenshots

Screenshot 2023-05-12 at 13 19 42

Desktop (please complete the following information):

  • OS: mac
  • borb version 2.1.12
  • input PDF: see steps to reproduce

Additional context
The coordinates of the rectangle aren't behaving as I would expect either: increasing the width seemingly makes the rectangle taller, and vice versa.


nicwest commented May 12, 2023

I have a correction. Using the above script with a fresh download renders the text correctly. Notably, downloading the PDF from the original source seems to make a difference. I was originally running the script with this PDF, which produced the errors:
SA800.pdf

@jorisschellekens
Owner

The content of a PDF page is located in a so-called content stream.

These streams are essentially compressed pieces of code (written in postfix notation, similar to PostScript) that tell the viewing software how to render content.

In pseudo-code you might find something such as:

  • go to coordinate 80, 120
  • set the font color to black
  • set the font to Helvetica, size 12
  • render the character "H"
  • etc

As you can tell from the pseudo-code, the renderer has a state (coordinates, colors, active font, etc).

There are operators that modify this state even further. For instance, you can apply a matrix transformation to the coordinate system.

Normally, you would wrap content-rendering operations in a q ... Q pair. These operators tell the viewer to save and restore the graphics state, respectively.
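
To make the effect of q and Q concrete, here is a minimal toy model of a renderer's graphics state. It is only a sketch for illustration, not borb's implementation: the `ToyRenderer` class, its method names, and the page height of 842 points are all assumptions made up for this example.

```python
# Toy model of the PDF graphics state (illustration only, not borb code).
# A matrix is the usual 6-tuple (a, b, c, d, e, f) used by the `cm` operator.
IDENTITY = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)


def concat(m, ctm):
    """Return m x ctm, which is how `cm` updates the current matrix."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = ctm
    return (
        a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2,
    )


class ToyRenderer:
    """Hypothetical renderer that tracks only the transformation matrix."""

    def __init__(self):
        self.ctm = IDENTITY
        self.stack = []

    def q(self):
        # Save the current graphics state.
        self.stack.append(self.ctm)

    def Q(self):
        # Restore the most recently saved graphics state.
        self.ctm = self.stack.pop()

    def cm(self, matrix):
        # Apply a matrix transformation to the coordinate system.
        self.ctm = concat(matrix, self.ctm)

    def to_device(self, x, y):
        # Where a point given in user space actually lands.
        a, b, c, d, e, f = self.ctm
        return (a * x + c * y + e, b * x + d * y + f)


# A vertical flip for an assumed page height of 842 points.
FLIP = (1, 0, 0, -1, 0, 842)

balanced = ToyRenderer()
balanced.q()
balanced.cm(FLIP)
balanced.Q()                      # state restored before new content is added

unbalanced = ToyRenderer()
unbalanced.cm(FLIP)               # transform is still active afterwards

print(balanced.to_device(80, 120))    # (80.0, 120.0)
print(unbalanced.to_device(80, 120))  # (80.0, 722.0)
```

In the second case the flip is never undone, so anything drawn afterwards at (80, 120) ends up at (80, 722) and is mirrored vertically.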

Your issue might be something like:

  • the PDF already contains a matrix transform
  • the PDF does not restore the state
  • borb appends content, expecting the state to be the default, leading to the wrong output
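
As a rough way to test this hypothesis, one could scan an already decompressed content stream and check whether a matrix transform is still in effect when the stream ends. The sketch below is an assumption-laden illustration: the helper names are hypothetical, it does not use any borb API, and the whitespace tokenizer is naive (it ignores string literals and inline images, and cannot tell whether a later `cm` happens to cancel an earlier one).

```python
def ends_with_transformed_state(content: bytes) -> bool:
    """Naively report whether a `cm` is still in effect at the end of the stream."""
    dirty = False   # True once a `cm` has been applied in the current state
    saved = []      # one entry per unmatched `q`
    for token in content.split():
        if token == b"q":
            saved.append(dirty)
        elif token == b"Q" and saved:
            dirty = saved.pop()
        elif token == b"cm":
            dirty = True
    return dirty


def wrap_in_save_restore(content: bytes) -> bytes:
    """Isolate existing content in a q ... Q pair before appending new content."""
    return b"q\n" + content + b"\nQ\n"


# Hypothetical stream fragments for illustration.
well_behaved = b"q 1 0 0 -1 0 842 cm BT ET Q"
leaky = b"1 0 0 -1 0 842 cm BT ET"

print(ends_with_transformed_state(well_behaved))
print(ends_with_transformed_state(leaky))
print(ends_with_transformed_state(wrap_in_save_restore(leaky)))
```

Wrapping the existing content this way is one common workaround for a top-level transform that is never undone. It does not help when the existing stream itself contains an unmatched q, because the added Q would then close that inner q instead.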
