Read page numbers from outline "action" #310

ojseryd · 2022-02-21T20:09:01Z

I'm trying to figure out how to extract the page number of an OutlineItem when <Action> is returned, since this feature doesn't seem to be implemented yet (?).

Is there a workaround until it's implemented? Any idea of when that might be?

Is it possible to access the "raw" PDF outline somehow to look for the /Page entry?

Take outlines.pdf as an example:

from pikepdf import Pdf
reader = Pdf.open("outlines.pdf")

with reader.open_outline() as outlines:
    for outline in outlines.root:
        print(outline)

returns:

[+] One -> <Action>
[ ] Two -> <Action>
[+] Three -> <Action>


mara004 · 2022-02-22T13:26:11Z

Well, each object that you get when iterating over the outline root is an OutlineItem, and you may directly access its action dictionary (item.action), which usually has a /D key containing a destination (which can be direct or indirect). Assuming it is direct, you'll get an array of a page object, a page location type and between 0 and 4 coordinates. The page index may then be determined using pikepdf.Page(direct_dest[0]).index. (If it's an indirect destination, things become more complicated.)
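A minimal sketch of the parsing step described above, assuming the destination has already been obtained as a direct (explicit) array. It splits the array into page reference, fit type and coordinates, and checks the coordinate count against the fit types listed in the PDF specification. Plain Python lists stand in for pikepdf.Array so the logic runs on its own; `split_explicit_dest` and `FIT_ARITY` are hypothetical names, not pikepdf API.

```python
# Number of coordinate operands each fit type takes (PDF 1.7, section 12.3.2.2)
FIT_ARITY = {
    "/XYZ": 3,    # left, top, zoom
    "/Fit": 0,
    "/FitH": 1,   # top
    "/FitV": 1,   # left
    "/FitR": 4,   # left, bottom, right, top
    "/FitB": 0,
    "/FitBH": 1,  # top
    "/FitBV": 1,  # left
}


def split_explicit_dest(dest):
    """Split an explicit destination [page, /FitType, coords...] (hypothetical helper).

    Returns (page_ref, fit_type, coords). A coordinate may be None (PDF null),
    which means "keep the current value".
    """
    if len(dest) < 2:
        raise ValueError("destination array needs at least a page and a fit type")
    page_ref = dest[0]
    fit_type = str(dest[1])  # str() of a pikepdf.Name gives e.g. "/XYZ"
    coords = list(dest[2:])
    expected = FIT_ARITY.get(fit_type)
    if expected is None:
        raise ValueError(f"unknown fit type {fit_type!r}")
    if len(coords) != expected:
        raise ValueError(f"{fit_type} expects {expected} coordinates, got {len(coords)}")
    return page_ref, fit_type, coords
```

With a real pikepdf destination, the returned page reference is what would be passed to pikepdf.Page(...).index. Real-world files occasionally omit trailing null coordinates, so a more forgiving reader might pad with None instead of raising.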

I'm trying to figure out how to extract the page number of an OutlineItem when <Action> is returned, since this feature doesn't seem to be implemented yet (?).

I believe that libqpdf recently added QPDFOutlineObjectHelper::getDestPage() and some other useful methods related to parsing the PDF table of contents, but (as far as I can see) pikepdf doesn't have bindings for them yet.
Technically, you could of course implement a bookmark page resolver manually using the means pikepdf currently provides, but depending on your needs this may be rather cumbersome.
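As a rough illustration of what a manual resolver involves, the sketch below walks an outline tree as the PDF specification lays it out: each item has /Title, /First (first child), /Next (next sibling), and either a /Dest entry or an /A action. Plain dicts stand in for pikepdf.Dictionary objects; `outline_dest` and `iter_outline` are hypothetical helpers. The sketch only collects the raw destination; turning that into a page number still requires the resolution steps discussed in this thread.

```python
def outline_dest(item):
    """Return the raw destination of one outline item, or None.

    An item carries either /Dest directly or an /A action; only /GoTo
    actions point at a destination inside the same document.
    """
    if "/Dest" in item:
        return item["/Dest"]
    action = item.get("/A")
    if action is not None and str(action.get("/S")) == "/GoTo":
        return action.get("/D")
    return None


def iter_outline(first_item, depth=0):
    """Yield (depth, title, raw_dest) for an item, its siblings and descendants.

    Siblings are followed via /Next in a loop; children via /First recursively.
    """
    item = first_item
    while item is not None:
        yield depth, str(item.get("/Title", "")), outline_dest(item)
        child = item.get("/First")
        if child is not None:
            yield from iter_outline(child, depth + 1)
        item = item.get("/Next")
```

With pikepdf, the starting point would presumably be pdf.Root.Outlines.First. Note that this sketch does not guard against malformed files whose /Next chain loops back on itself.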

(In the meantime, I can also suggest using pymupdf.Document.get_toc() if you don't mind the AGPL3.)

ojseryd · 2022-02-22T23:12:53Z

Thanks for the answer.

In the above example, printing item.action raises NotImplementedError: don't know how to __str__ this object, so it doesn't seem possible to access anything in that case. Is that because it's an indirect destination?

I'm currently using PdfFileReader.getDestinationPageNumber() from PyPDF2 to get this info, but since it's not maintained I felt it was time to try and convert to something else.

I will have a look at qpdf and pymupdf and see whether either of them is an option.

jbarlow83 · 2022-02-23T05:53:43Z

Yes, the outline code unfortunately doesn't handle actions at this time, only outline entries explicitly defined with a page destination. Actions can do many things other than go to a page.
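To illustrate why an action cannot simply be assumed to be a page jump, a caller might branch on the action's /S (subtype) entry before looking for a destination. The subtype names below come from the PDF specification; the function `describe_action` and its return tuples are hypothetical, and plain dicts stand in for pikepdf.Dictionary.

```python
def describe_action(action):
    """Classify an action dictionary by its /S entry (hypothetical helper).

    Returns a (kind, payload) tuple; only "goto" carries an in-document
    destination that could be resolved to a page number.
    """
    subtype = str(action.get("/S", ""))
    if subtype == "/GoTo":
        return "goto", action.get("/D")  # destination in this document
    if subtype == "/GoToR":
        # destination in another PDF file given by /F
        return "remote", (action.get("/F"), action.get("/D"))
    if subtype == "/URI":
        return "uri", action.get("/URI")
    if subtype == "/Named":
        # predefined viewer action such as /NextPage (not a named destination)
        return "named-action", action.get("/N")
    return "other", subtype
```

Note that a /Named action is unrelated to a named destination: the former tells the viewer to perform a built-in operation, the latter is a /D value that must be looked up by name.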

mara004 · 2022-02-23T13:59:29Z

In the above example item.action returns NotImplementedError: don't know how to __str__ this object - so it doesn't seem to be possible to access anything in that case. Is that because it's an indirect destination?

That the object doesn't implement __str__ does not mean it can't be accessed. If you wish to print the action, I think you need to use print(repr(item.action)). That said, it should be possible to work with the action as with any other PDF dictionary. For example, you could do something like this:

import pikepdf

if item.action is not None and '/D' in item.action:
    dest = item.action.D
    # assuming a direct destination: [page, fit type, coordinates...]
    assert isinstance(dest, pikepdf.Array)
    page_obj = dest[0]
    page_index = pikepdf.Page(page_obj).index  # zero-based
    print(page_index)

ojseryd · 2022-02-23T18:42:43Z

That said, it should be possible to work with the action as with any other PDF dictionary.

Thanks, I haven't really understood all of how to work with pikepdf yet, but this is at least one step closer :)

After trying your code snippet, I can conclude that it's not a direct destination but an indirect one. When printing repr(item.action) I get:

pikepdf.Dictionary({
  "/D": "0",
  "/S": "/GoTo"
})

Is it possible to look up the "/D" value somewhere within pikepdf in this case? I'm guessing it can be resolved to a "/Page" entry somewhere.

mara004 · 2022-02-24T11:22:45Z

Is it possible to look up the "/D" value somewhere within pikepdf in this case? I'm guessing it can be resolved to a "/Page" entry somewhere.

Yes, it should be possible to resolve the indirect/named destination to a direct one. I suppose the document has a name tree at pdf.Root.Names.Dests which can basically be used like a dictionary to map from indirect to direct destinations, thanks to the NameTree support model of pikepdf/qpdf:

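import pikepdf

named_dest = item.action.D
# here /D is a string naming the destination, not an array
assert isinstance(named_dest, pikepdf.String)
name_mapping = pikepdf.NameTree(pdf.Root.Names.Dests)
direct_dest = name_mapping[str(named_dest)]
# a name tree value may also be a dictionary wrapping the array in /D
if isinstance(direct_dest, pikepdf.Dictionary):
    direct_dest = direct_dest.D
page = pikepdf.Page(direct_dest[0])
print(page.index)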
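For readers curious what a name tree lookup such as pikepdf.NameTree does internally, here is a rough sketch. It also covers two cases the snippet above does not: PDF 1.1 files keep named destinations in a /Dests dictionary directly in the catalog (keyed by name objects), and a destination value may be a dictionary whose /D entry holds the actual array. Plain dicts and lists stand in for pikepdf objects; `name_tree_get` and `resolve_dest` are hypothetical helpers, and key comparison uses Python string ordering, which matches the byte ordering the specification requires only for ASCII keys.

```python
def name_tree_get(node, key):
    """Look up `key` in a PDF name tree node, or return None.

    Leaf nodes carry /Names: [key1, value1, key2, value2, ...];
    intermediate nodes carry /Kids, each with /Limits [lowest, highest].
    """
    names = node.get("/Names")
    if names is not None:
        for i in range(0, len(names) - 1, 2):
            if str(names[i]) == key:
                return names[i + 1]
    for kid in node.get("/Kids", []):
        limits = kid.get("/Limits")
        # Skip subtrees whose key range cannot contain the key
        if limits is not None and not (str(limits[0]) <= key <= str(limits[1])):
            continue
        found = name_tree_get(kid, key)
        if found is not None:
            return found
    return None


def resolve_dest(dest, catalog):
    """Turn an action's /D value into an explicit destination array, or None.

    A list is already explicit. Anything else is treated as a name and looked
    up first in the /Names -> /Dests name tree (PDF 1.2+), then in the
    PDF 1.1 catalog /Dests dictionary.
    """
    if isinstance(dest, (list, tuple)):
        return dest
    key = str(dest)
    target = None
    names = catalog.get("/Names")
    if names is not None and "/Dests" in names:
        target = name_tree_get(names["/Dests"], key)
    if target is None and "/Dests" in catalog:
        target = catalog["/Dests"].get(key)
    # A named destination may map to a dictionary wrapping the array in /D
    if isinstance(target, dict):
        target = target.get("/D")
    return target
```

With real pikepdf objects, the isinstance checks would need to test for pikepdf.Array and pikepdf.Dictionary instead of list and dict; the rest relies only on .get(), `in`, indexing and str(), which pikepdf objects should also support.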

ojseryd · 2022-02-26T15:02:35Z

Thank you so much for your help! I'll test your code when I get a chance; it seems to support indirect destinations? If it doesn't work, I at least now know where to look :)
