Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add char/page limit #75

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

add char/page limit #75

wants to merge 2 commits into from

Conversation

xiaoxin01
Copy link
Contributor

Large file will cause memory leak, so add char/page limit function.

@xiaoxin01
Copy link
Contributor Author

@jonathaningram Could you please help to review this PR?

Because some pdf or docx files have many pages and take a lot of time to parse text, I think a limitation is needed.

@jonathaningram
Copy link
Member

@xiaoxin01 thank you for the PR. We'll take a look over it and get back to you as soon as possible. This PR changes the API surface of the package a fair bit so we'll want to look into those changes and we'll discuss here.

t.Fatalf("got error = %v, want nil", err)
}

resp, _, err := ConvertPDF(f)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiaoxin01 I think the best approach for supporting this kind configuration is to introduce new Converters. Something like:

package docconv

type PDFConverter struct {
  pageRange []int
  maxWords int
}

func NewPDFConverter(pageRange []int, maxWords int) *PDFConverter {
  return &PDFConverter{
    pageRange: pageRange,
    maxWords: maxWords,
  }
}

func (c *PDFConverter) Convert(r io.Reader) (string, map[string]string, error) {
  // ...
}

We want to avoid anymore global package state so I think a solution like this is probably going to be better. What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants