Adding imagebind #30690

EduardoPach · 2024-05-07T09:35:05Z

What does this PR do?

This PR fixes #23240 by adding the ImageBind model.

This is based on #26310, which is currently stale; its author said they would not have time to work on it (though @dg845 is welcome to help).

Taking into consideration the points raised by @dg845 in #26310 (comment), I'll focus on adding the text/image/audio portion and try to contact the authors.

Who can Review

@amyeroberts (?)

EduardoPach · 2024-05-23T16:18:59Z

@amyeroberts I think we're good for a first review!

There are a few points that we should keep in mind:

- I have converted only the text, vision, and audio modalities so far, as those are the only modalities that have their preprocessing steps in the original repo. There are a few issues in the original repo where people describe how they handled the preprocessing of depth and thermal, so we can add these as well if we want to.
- I'm not sure how we should deal with videos, as they fall under the vision modality and thus should be passed as pixel_values as well. Their preprocessing is the same as the image version plus a frame selection step. Should we have a videos arg in ImageBindImageProcessor? (Currently this processor is just a copy of CLIP's.)
- If we add thermal and depth as well, following the issues in the original repo, should these also be passed through ImageBindImageProcessor?
- When pushing ImageBindProcessor to the hub, I noticed that the args for ImageBindImageProcessor and ImageBindFeatureExtractor end up in the same file, preprocessor_config.json, which is not ideal.
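
To make the video question more concrete, here is a minimal sketch of what a "frame selection" step in front of the regular image preprocessing could look like. It is only an illustration under stated assumptions: the function name `select_clip_frame_indices` and the parameters `num_clips` and `frames_per_clip` are hypothetical, and uniform sampling is a stand-in, not necessarily the sampling strategy used in the original repo or in this PR.

```python
import numpy as np


def select_clip_frame_indices(num_frames, num_clips, frames_per_clip):
    """Return frame indices with shape (num_clips, frames_per_clip).

    Hypothetical helper: the video is split into `num_clips` equal-length
    spans and `frames_per_clip` indices are taken evenly inside each span.
    """
    if num_frames < 1 or num_clips < 1 or frames_per_clip < 1:
        raise ValueError("all arguments must be positive")
    # Span boundaries: num_clips + 1 evenly spaced points over [0, num_frames].
    bounds = np.linspace(0, num_frames, num_clips + 1)
    clips = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        # Evenly spaced positions inside the span, excluding its end point.
        positions = np.linspace(start, end, frames_per_clip, endpoint=False)
        # Round down to valid integer frame indices.
        clips.append(np.clip(np.floor(positions).astype(int), 0, num_frames - 1))
    return np.stack(clips)


# Assumed toy input: 30 tiny RGB frames laid out as (frames, height, width, channels).
video = np.zeros((30, 8, 8, 3), dtype=np.uint8)
indices = select_clip_frame_indices(len(video), num_clips=3, frames_per_clip=2)
# Fancy indexing on the frame axis yields (num_clips, frames_per_clip, H, W, C).
clips = video[indices]
```

Each selected frame could then go through the same resize/crop/normalize path as a still image, which is why a `videos` argument on the image processor seems like a natural place for this step.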

dg845 added 30 commits September 20, 2023 23:52

initial commit for ImageBind model

d72c9a3

add initial testing code for ImageBind model

6be5464

Add config classes for remaining modalities (audio, depth, thermal, IMU) and update config classes for text and image modalities.

190e727

Update ImageBindOutput with remaining modalities (audio, depth, thermal, IMU).

3692190

Add embedding classes for image-like modalities (vision, audio, depth, thermal).

4037f6a

Implement IMU embedding class.

970dc5d

Add module to convert still images into video frames.

ffd1460

Add implementation for shared model encoder blocks.

ee74943

Add key and value biases to ImageBindAttention.

93ce319

Add ImageBind heads and postprocessors.

c7968d6

Update ImageBindModel.forward to compare images against any other modality.

0000bbc

Separate normalized embeddings into their own output field.

a1bdbf7

Add initial tester/test classes for remaining modalities (audio, depth, thermal, imu).

69fa517

Create initial audio feature extractor based on ASTFeatureExtractor (ImageBind follows Audio Spectrogram Transformer audio processing).

a8341e4

Add image processing classes for remaining image-like modalities excluding audio (depth, thermal).

ac926ad

Add IMU feature extractor class declaration and add feature extractors/image processors to ImageBind's __init__.py file.

e151140

Update ImageBindAudioFeatureExtractor to use ImageBind-specific audio processing.

789559a

Add final dropout layer to ImageBindImuTransformer.

84851a5

Fix typo

43016df

Change model test parameters to be closer to ImageBind defaults.

93d7749

Update audio feature extractor to output batched and clipped audio.

1b4bb43

Add modeling support for batched and clipped vision and audio inputs.

d9a0a80

Update ImageBind image processor to always output video (batched and clipped images) following VideoMAE.

b5d46cd

Merge branch 'main' into imagebind-model

029d424

Implement ImageBindDepthImageProcessor.

a9d432c

Implement ImageBindImuFeatureExtractor.

90543ce

Fix some modeling code bugs.

8ce499b

Move Image2Video logic into RGBDTPatchEmbedding.

484cd3f

Fix attention kv bias initialization bug.

284ffe5

Implement ImageBind conversion script.

c5d1e3b

EduardoPach added 27 commits May 13, 2024 18:02

Improving import and cos

78ccd1f

Fix copies

6e8407d

ImageBindFeatureExtractor

a83bebe

Merge remote-tracking branch 'upstream/main' into adding-imagebind

0ee0902

fix copies

7421c63

Improving tests

8af30b1

More improvements

99770c5

Fixing tests

8a59421

Tests green

3d3a273

Improving consistency

8fcf36c

Merge remote-tracking branch 'upstream/main' into adding-imagebind

cfe9da6

Removed speech dependency

de7f84d

Merge remote-tracking branch 'upstream/main' into adding-imagebind

1c9b317

Updated conversion script

003ff10

Merge remote-tracking branch 'upstream/main' into adding-imagebind

a0ef219

Improved ImageBindProcessor

df4c0e4

ImageBindProcessor working

8d055f1

Merge remote-tracking branch 'upstream/main' into adding-imagebind

05ac8ba

Update docs and docstrings

c8ad793

Merge remote-tracking branch 'upstream/main' into adding-imagebind

5b39d85

ImageBindFeatureExtractor tests

97c4bd5

Merge remote-tracking branch 'upstream/main' into adding-imagebind

d9c6c84

ImageBindProcessor tests

987f404

Make tests green

709613c

Improve feature extractor

9fdcce4

fix style and copies

d0f788a

fix style new

4d2dd20

EduardoPach changed the title ~~[WIP] Adding imagebind~~ Adding imagebind May 23, 2024

nits

2f2b511
