Packed Sequence Vision Transformer (aka NaViT) #1952

rwightman · 2023-09-13T22:49:03Z

A big WIP, pushing early to resolve masking stability issues with F.sdpa

HuggingFaceDocBuilderDev · 2023-09-13T22:54:45Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

Adenialzz · 2024-02-20T08:11:41Z

Is this still in progress?

b5y · 2024-02-26T15:25:15Z

Hello @rwightman !

Any updates regarding this PR? Is there anything I can help with?

rwightman · 2024-03-21T20:27:20Z

@Adenialzz @b5y sorry for the delay, I've been trying to get some other things out the door.

Here's where I got with this: I verified that the modelling aspect works. The masking / handling of the packed patches seems fine. If you've looked at the code, I implemented a very rudimentary packing that is currently just injected in the forward of the model (it takes standard uniform batches of images, splits them, and then repacks). This is obviously not the point, but it was a quick hack to allow me to test.
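To illustrate the masking idea for packed patches, here is a minimal pure-Python sketch. It is not the code in this PR; the function name `sequence_isolation_mask` and its arguments are hypothetical. Each token in a packed row gets the index of the image it came from, and a token may attend only to tokens with the same index. Padding is given its own id (`-1`), so every row of the mask keeps at least one allowed key, which avoids fully masked softmax rows. The `True` = "may attend" convention is the one used for a boolean `attn_mask` in `F.scaled_dot_product_attention`.

```python
def sequence_isolation_mask(seq_lens, pad_to=None):
    """Build per-token sequence ids and a block-diagonal attention mask.

    seq_lens: patch counts of the images packed into one row.
    pad_to:   optional total row length; extra slots become padding (id -1).
    Returns (seq_ids, mask) where mask[i][j] is True when token i may
    attend to token j, i.e. both belong to the same image (or both are padding).
    """
    seq_ids = []
    for image_idx, num_patches in enumerate(seq_lens):
        seq_ids.extend([image_idx] * num_patches)

    total = len(seq_ids) if pad_to is None else pad_to
    if total < len(seq_ids):
        raise ValueError("pad_to is shorter than the packed sequence")
    # Padding tokens share one id, so they only attend to each other.
    seq_ids.extend([-1] * (total - len(seq_ids)))

    mask = [[a == b for b in seq_ids] for a in seq_ids]
    return seq_ids, mask


# Two images with 2 and 3 patches packed into a row of length 6.
seq_ids, mask = sequence_isolation_mask([2, 3], pad_to=6)
print(seq_ids)  # [0, 0, 1, 1, 1, -1]
```

With tensors, the same mask is a broadcast equality of the id vector against itself, typically expanded over the batch and head dimensions before being passed to attention.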

For this to work efficiently, the packing needs to be integrated into the data pipeline with extra buffering and a better thought-out packing algorithm (essentially online bin packing). The data augmentations need to be tuned w.r.t. the dataset image size range such that you end up with a distribution of image sizes and patch lengths that's optimally packable.
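As a rough sketch of the bin-packing step, the following pure-Python example packs a small buffer of per-image patch counts into rows of a fixed maximum length using the first-fit-decreasing heuristic. This is one simple, well-known heuristic chosen for illustration, not the algorithm used or planned in this PR; `pack_first_fit` and `max_len` are hypothetical names. In a real pipeline the buffer would be refilled continuously from the data loader.

```python
def pack_first_fit(seq_lens, max_len):
    """Group sequence indices into bins whose total length is <= max_len.

    Uses first-fit decreasing: visit sequences from longest to shortest and
    place each one in the first bin that still has room, opening a new bin
    when none fits. Returns a list of bins, each a list of indices into seq_lens.
    """
    bins = []  # indices assigned to each bin
    free = []  # remaining capacity of each bin
    order = sorted(range(len(seq_lens)), key=lambda i: seq_lens[i], reverse=True)
    for i in order:
        n = seq_lens[i]
        if n > max_len:
            raise ValueError(f"sequence {i} has length {n} > max_len {max_len}")
        for b, space in enumerate(free):
            if n <= space:
                bins[b].append(i)
                free[b] -= n
                break
        else:
            # No existing bin had room, so start a new one.
            bins.append([i])
            free.append(max_len - n)
    return bins


# Patch counts for six images of different sizes, packed into rows of 256 tokens.
lens = [196, 64, 144, 100, 49, 256]
print(pack_first_fit(lens, max_len=256))  # [[5], [0, 4], [2, 3], [1]]
```

The unused capacity in each bin becomes padding, so a patch-length distribution that fills bins tightly wastes less compute, which is why the augmentation tuning mentioned above matters.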

I hope to get back to this. The feature is definitely more data pipeline & packing work than modelling...

Right now I'm working on a data loading library oriented towards large document (pdf) and image + text datasets and associated augmentations/preprocessing. I was thinking of moving the packing/pipeline code there once I get the initial version of that public & released...

rwightman added 2 commits September 13, 2023 15:46

Initial impl of WIP packed vit (navit)

6461405

Remove patch dropout layer as it should be integrated into packing

d81f75b

rwightman added 3 commits September 14, 2023 10:12

Remove padding calc from pack, minor fixes

f93083e

Remove key_padding masking, sequence isolation is enough.

2734bb7

Remove sdpa context mgrs

379780b
