Adding splitting PDF documents in specified chunks. #212

Open: wants to merge 1 commit into base: master

frivas-at-navteca (Contributor)

This PR adds the ability to split into chunks to the split method. The chunk size is specified when calling the method. The PDF is split into N chunks of the specified number of pages, with both page boundaries inclusive. Because the page count is not always a multiple of the chunk size, the chunks may not all be the same size, so the last PDF contains the remaining pages.
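To make the chunking rule concrete, here is a minimal sketch of how the inclusive page ranges could be computed for a given chunk size. It is not the code from this PR: the function name chunk_page_ranges and the 1-indexed page numbering are illustrative assumptions, and writing the actual PDF files is left out.

```python
from typing import List, Tuple


def chunk_page_ranges(num_pages: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: return inclusive (first_page, last_page) ranges, 1-indexed.

    Every chunk has chunk_size pages except possibly the last one,
    which holds whatever pages remain.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    ranges = []
    # Step through the document in strides of chunk_size pages.
    for start in range(1, num_pages + 1, chunk_size):
        # Clamp the end so the last chunk does not run past the final page.
        end = min(start + chunk_size - 1, num_pages)
        ranges.append((start, end))
    return ranges


print(chunk_page_ranges(10, 4))  # [(1, 4), (5, 8), (9, 10)]
```

With 10 pages and a chunk size of 4, this yields three chunks, the last one holding the two remaining pages.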

I have also added tests.

The motivation for this PR: I have been working with Deepdoctection and saw that it has a split method, but it splits the PDF into single pages. That is very useful, but I needed the PDF split into chunks and could not find a library that does that, so I thought this might be useful to someone else and could make a good PR.

I am sure the code can be improved, but this should at least be a good first approach.

I also noticed that an internal function (_reset_df_and_get_length) was defined but never used, so I removed it.

I hope this is good! :D Thanks again for creating this amazing library.

JaMe76 (Contributor) commented Aug 21, 2023

Thanks for the PR, and sorry that the review will take some time. I will get back to you as soon as possible.
