Packed Sequence Vision Transformer (aka NaViT) #1952

Open
wants to merge 5 commits into
base: main

Conversation

rwightman
Collaborator

A big WIP, pushing early to resolve masking stability issues with F.sdpa

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@Adenialzz

Is this still in progress?

@b5y

b5y commented Feb 26, 2024

Hello @rwightman!

Any updates regarding this PR? Is there anything I can help with?

@rwightman
Collaborator Author

rwightman commented Mar 21, 2024

@Adenialzz @b5y sorry for the delay, I've been trying to get some other things out the door.

So, where I got to with this: I verified that the modelling aspect works. The masking / handling of the packed patches seems fine. If you've looked, I implemented a very rudimentary packing that is currently just injected into the forward of the model (it takes standard uniform batches of images, splits them, and then repacks). This is obviously not the point, but it was a quick hack to allow me to test.
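For readers unfamiliar with the masking that packing requires, here is a minimal sketch of the general idea, not the code in this PR. When patches from several images share one packed sequence, each token should attend only to tokens from its own image, which gives a block-diagonal mask. The helper names (`sequence_ids`, `packed_attention_mask`) and the `pad_id` convention are hypothetical, and plain Python lists are used so the sketch runs without dependencies; in practice the mask would presumably be a boolean tensor.

```python
def sequence_ids(patch_counts, packed_len, pad_id=-1):
    """Label every token slot in a packed sequence with the index of its image.

    patch_counts: number of patches contributed by each image, in packing order.
    Unused trailing slots are labelled with pad_id (an assumed convention).
    """
    ids = []
    for image_idx, count in enumerate(patch_counts):
        ids.extend([image_idx] * count)
    if len(ids) > packed_len:
        raise ValueError("images do not fit in one packed sequence")
    ids.extend([pad_id] * (packed_len - len(ids)))
    return ids


def packed_attention_mask(seq_ids, pad_id=-1):
    """Return an L x L boolean mask; True means query i may attend to key j."""
    n = len(seq_ids)
    return [
        [seq_ids[i] == seq_ids[j] and seq_ids[i] != pad_id for j in range(n)]
        for i in range(n)
    ]


# Two images with 3 and 2 patches, packed into a sequence of length 6.
ids = sequence_ids([3, 2], packed_len=6)
mask = packed_attention_mask(ids)
```

Note that the padding row in this sketch allows no keys at all. A softmax over a row where every score is masked to negative infinity is undefined and can yield NaNs, so real code would need to handle padding rows deliberately. Whether that is the instability mentioned at the top of this thread is not stated here.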

For this to work efficiently, the packing needs to be integrated into the data pipeline with extra buffering and a better thought-out packing algorithm (essentially online bin packing). The data augmentations need to be tuned wrt the dataset image size range such that you end up with a distribution of image sizes and patch lengths that's optimally packable.
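To make "online bin packing with buffering" concrete, below is a rough sketch using a first-fit heuristic over a small set of open bins. This is one common heuristic chosen for illustration, not the algorithm planned for this PR. Everything here (`Bin`, `pack_stream`, `max_open_bins`, the example lengths) is hypothetical: each item is an image's patch count, each bin is one packed sequence with a fixed token capacity, and the open bins act as the buffer.

```python
from dataclasses import dataclass, field


@dataclass
class Bin:
    """One packed sequence under construction."""
    capacity: int
    items: list = field(default_factory=list)
    used: int = 0

    def fits(self, length):
        return self.used + length <= self.capacity

    def add(self, item_id, length):
        self.items.append(item_id)
        self.used += length


def pack_stream(lengths, capacity, max_open_bins=4):
    """Yield lists of item indices whose total length fits within capacity.

    Items arrive one at a time (online). Each goes into the first open bin
    with room; if none has room and the buffer is full, the fullest open
    bin is emitted to make space for a new one.
    """
    open_bins = []
    for item_id, length in enumerate(lengths):
        if length > capacity:
            raise ValueError(f"item {item_id} has {length} tokens, capacity is {capacity}")
        target = next((b for b in open_bins if b.fits(length)), None)
        if target is None:
            if len(open_bins) >= max_open_bins:
                fullest = max(open_bins, key=lambda b: b.used)
                open_bins.remove(fullest)
                yield fullest.items
            target = Bin(capacity)
            open_bins.append(target)
        target.add(item_id, length)
    # Flush whatever is still buffered at the end of the stream.
    for b in open_bins:
        yield b.items


# Patch counts for six images of varying size, packed into 256-token sequences.
lengths = [196, 64, 144, 256, 100, 49]
packed = list(pack_stream(lengths, capacity=256, max_open_bins=2))
```

A larger `max_open_bins` gives the packer more chances to fill gaps at the cost of more buffered samples, which is the trade-off the buffering remark above points at.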

I hope to get back to this. The feature is definitely more data pipeline & packing work than modelling...

Right now I'm working on a data loading library oriented towards large document (pdf) and image + text datasets and associated augmentations/preprocessing. I was thinking of moving the packing/pipeline code there once I get the initial version of that public & released...


4 participants