
How to extract the text of each category detected on a page? #83

Answered by JaMe76
tikitong asked this question in Q&A

Hi, thanks for your question.

I have to admit that the consumer API is very confusing, and I am experimenting with a new one that will hopefully be easier to understand. The difficulty is designing an API that still works when a model detects categories other than the ones currently in use.

For now, every layout block other than 'TABLE' is stored in Page.items. That said, you can get the name, reading order position and text of each layout block as follows:

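import deepdoctection as dd

path = "/path/to/dir"
analyzer = dd.get_dd_analyzer()

# analyze() returns a dataflow; reset_state() must be called before iterating over it
df = analyzer.analyze(path=path)
df.reset_state()

for dp in df:  # one dp per page
    for item in dp.items:  # every layout block except tables
        print(f"re…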
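To get the text of each category on a page (the question in the title), the blocks from the loop above can be grouped by category name and ordered by reading position. The following is a minimal, self-contained sketch, not deepdoctection code: `LayoutBlock` is a hypothetical stand-in for an entry of Page.items, and its attribute names `category_name`, `reading_order` and `text` are assumptions that may differ from the real objects depending on the library version. The categories are not hard-coded, so the function also works for models with a different label set.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LayoutBlock:
    """Hypothetical stand-in for one entry of Page.items (not the deepdoctection class)."""
    category_name: str
    reading_order: Optional[int]
    text: str


def text_by_category(items: List[LayoutBlock]) -> Dict[str, List[str]]:
    """Group block texts by category name, each list in reading order.

    Whatever category names appear in `items` become keys; nothing is hard-coded.
    """
    # Sort by reading order; blocks without a position go after the ordered ones.
    ordered = sorted(
        items,
        key=lambda b: (b.reading_order is None, b.reading_order or 0),
    )
    grouped: Dict[str, List[str]] = defaultdict(list)
    for block in ordered:
        grouped[block.category_name].append(block.text)
    return dict(grouped)


if __name__ == "__main__":
    page_items = [
        LayoutBlock("TEXT", 2, "Second paragraph."),
        LayoutBlock("TITLE", 0, "A title"),
        LayoutBlock("TEXT", 1, "First paragraph."),
        LayoutBlock("LIST", None, "An unordered block"),
    ]
    for category, texts in text_by_category(page_items).items():
        print(category, texts)
```

With a real page, the idea would be to call `text_by_category(dp.items)` inside the `for dp in df:` loop, provided the objects in Page.items expose attributes with those names (an assumption to check against the installed version). Tables are not covered, since they are not stored in Page.items.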

Answer selected by tikitong
Category: Q&A
Labels: none
Participants: 3