Internal Links are not working on merged PDF #1

Open
aarkue opened this issue Jun 2, 2023 · 4 comments

Comments

@aarkue
Owner

aarkue commented Jun 2, 2023

When merging multiple PDFs that contain internal references or links (links that jump to another location in the same PDF), those links no longer work in the merged PDF.

We already collect all named destinations from the source files, so we probably need to re-add them to the merged PDF file.
Moreover, we will need to investigate further how references/links are implemented in PDFs.
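
To make the "re-add them" idea a bit more concrete, here is a minimal sketch in plain TypeScript. It does not use pdf-lib at all: `SourceDoc`, `NamedDestinations`, and `mergeDestinations` are hypothetical names, and the model assumes a named destination can be reduced to a name plus a zero-based page index, which is a simplification of what a real PDF stores. It only illustrates that each document's destinations have to be shifted by the number of pages placed before it, and that names may need a prefix so they do not collide.

```typescript
// Hypothetical, simplified model: a named destination maps a name to a
// zero-based page index inside its own source document.
type NamedDestinations = { [name: string]: number };

interface SourceDoc {
  id: string; // assumed unique per file; used to prefix destination names
  pageCount: number;
  destinations: NamedDestinations;
}

// Rebuild the destination table for the merged file: pages of each source
// document are shifted by the number of pages that come before it.
function mergeDestinations(docs: SourceDoc[]): NamedDestinations {
  const merged: NamedDestinations = {};
  let offset = 0;
  for (const doc of docs) {
    for (const name of Object.keys(doc.destinations)) {
      // Prefixing avoids collisions such as two files both defining "section.1".
      merged[doc.id + ":" + name] = offset + doc.destinations[name];
    }
    offset += doc.pageCount;
  }
  return merged;
}

const mergedDestinations = mergeDestinations([
  { id: "a", pageCount: 3, destinations: { "section.1": 0, "figure.2": 2 } },
  { id: "b", pageCount: 2, destinations: { "section.1": 1 } },
]);
```

If names are prefixed like this, the links that refer to those names would have to be rewritten in the same way, which is part of what still needs investigating.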

@stefnotch

Could this be related to Hopding/pdf-lib#341? I did notice that you're using a better-maintained fork of pdf-lib.

@aarkue
Owner Author

aarkue commented Jan 21, 2024

Yeah, the issue you linked is the relevant upstream issue 👍. A simple API that allows manipulating Named Destinations in PDFs will likely not come soon to https://github.com/Hopding/pdf-lib or https://github.com/cantoo-scribe/pdf-lib.

I think the best way forward would be to implement this functionality ourselves, based on the lower-level APIs of pdf-lib. Initially, I started some work on this but did not get very far.
There are some functions related to the PDF Outlines Entries (like getNamedDestinations) already implemented, which can probably be used for this as well.

@stefnotch

Sounds good then. I'm guessing that that is the only major pdf-lib issue?

From what I remember, there are other issues like Hopding/pdf-lib#169 and Hopding/pdf-lib#140 which do have workarounds.

If I want to help with implementing the named destinations API ourselves, where would I get started? I sadly know very little about the PDF file format.

@aarkue
Owner Author

aarkue commented Jan 25, 2024

Great, yes, I think otherwise we can work pretty well with what pdf-lib provides!

But good that you mention the deletePage issue, I think the implementation in https://github.com/aarkue/pdf-tools-web/blob/main/components/ReorderPages.tsx could be improved to actually remove content correctly from the PDF file.

Regarding this issue, I think the best way to get started would be creating/gathering some sample PDF documents with internal links (probably easy to do with a simple LaTeX document that uses \ref or \cite). Next, we would try to find where pdf-lib exposes this link information (likely named destinations + something else). After that, it would be great to try to create a PDF file using pdf-lib which contains internal links.

So basically:

  1. Gather sample PDF files with internal links
  2. Extract link information using pdf-lib
  3. Create a PDF file containing internal links using pdf-lib

Finally, combining all of this to actually preserve internal links in merged/reordered PDFs should hopefully not be too hard. 😄
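
For the reordering case, the bookkeeping could look roughly like the sketch below. It is plain TypeScript with no pdf-lib calls, and `InternalLink` and `remapLinks` are hypothetical names; it assumes every internal link can be described by the page it sits on and the page it targets, which is a simplification of how PDFs store links. Pages that appear more than once in the new order are not handled.

```typescript
// Hypothetical model of an internal link: it sits on one page and targets another.
interface InternalLink {
  sourcePage: number; // zero-based index in the original document
  targetPage: number; // zero-based index in the original document
}

// newOrder[i] is the original index of the page that ends up at position i.
// Original pages that do not appear in newOrder are treated as deleted.
function remapLinks(links: InternalLink[], newOrder: number[]): InternalLink[] {
  const newIndexOf = new Map<number, number>();
  newOrder.forEach((originalIndex, newIndex) => {
    newIndexOf.set(originalIndex, newIndex);
  });

  const result: InternalLink[] = [];
  for (const link of links) {
    const source = newIndexOf.get(link.sourcePage);
    const target = newIndexOf.get(link.targetPage);
    // Drop links whose own page or target page no longer exists.
    if (source === undefined || target === undefined) continue;
    result.push({ sourcePage: source, targetPage: target });
  }
  return result;
}

const remapped = remapLinks(
  [
    { sourcePage: 0, targetPage: 2 },
    { sourcePage: 1, targetPage: 3 },
  ],
  [2, 0, 1], // original page 3 was deleted, original page 2 moved to the front
);
```

Here the first link survives with updated indices, while the second is dropped because its target page was removed. Whether dropping is the right behavior (as opposed to keeping a dead link) is an open design choice.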
