clip-ViT-B-32-multilingual-v1 support, ps: I can contribute. #70

Open
yaman opened this issue Nov 23, 2023 · 5 comments
Labels: model request (request for supporting new models)


yaman commented Nov 23, 2023

I exported clip-ViT-B-32-multilingual-v1 to ONNX with some modifications (which have no effect on the output embedding).

The Hugging Face Optimum ONNX export can export this model's (0) Transformer and (1) Pooling modules, but it cannot extend the graph with the provided (2) Dense layer. What I did was create a model that combines the layers as follows:

CombinedModel

from sentence_transformers import SentenceTransformer
from sentence_transformers import models
import torch
import torch.nn as nn
import onnx
import numpy as np

class CombinedModel(nn.Module):
    def __init__(self, transformer_model, dense_model):
        super(CombinedModel, self).__init__()
        self.transformer = transformer_model
        self.dense = dense_model

    def forward(self, input_ids, attention_mask):
        # SentenceTransformer modules take a features dict and return one
        outputs = self.transformer({'input_ids': input_ids, 'attention_mask': attention_mask})
        token_embeddings = outputs['token_embeddings']
        dense_output = self.dense({'sentence_embedding': token_embeddings})
        dense_output_tensor = dense_output['sentence_embedding']

        ### This was important for me: it took me a while to figure out that the
        ### original model takes the mean of the dense output
        mean_output = torch.mean(dense_output_tensor, dim=1)
        flattened_output = mean_output.squeeze(0)
        return flattened_output

Combine the dense layer with the original model

transformer_model = SentenceTransformer('clip-ViT-B-32-multilingual-v1', cache_folder='model_pytorch')
tokenizer = transformer_model.tokenizer

### These values come from the dense layer's configuration
dense_model = models.Dense(
    in_features=768,
    out_features=512,
    bias=False,
    activation_function= nn.Identity()
)

### Load the weights from the dense layer's binary
state_dict = torch.load('model_pytorch/sentence-transformers_clip-ViT-B-32-multilingual-v1/2_Dense/pytorch_model.bin')
dense_model.load_state_dict(state_dict)

model = CombinedModel(transformer_model, dense_model)

Export the combined model to ONNX

model.eval()

input_text = "This is a multi-lingual version of the OpenAI CLIP-ViT-B32 model. You can map text (in 50+ languages) and images to a common dense vector space such that images and the matching texts are close."

inputs = tokenizer(input_text, padding='longest', truncation=True, max_length=128, return_tensors='pt')
input_ids = inputs['input_ids']
attention_mask = inputs['attention_mask']

# Export the model
torch.onnx.export(model,               # model being run
                  (input_ids, attention_mask), # model input (or a tuple for multiple inputs)
                  "combined_model.onnx", # where to save the model (can be a file or file-like object)
                  export_params=True,        # store the trained parameter weights inside the model file
                  opset_version=17,          # the ONNX version to export the model to
                  do_constant_folding=True,  # whether to execute constant folding for optimization
                  input_names = ['input_ids', 'attention_mask'],   # the model's input names
                  output_names = ['output'], # the model's output names
                  dynamic_axes={'input_ids': {0 : 'batch_size', 1: 'seq_length'},    # variable length axes
                                'attention_mask': {0 : 'batch_size', 1: 'seq_length'},
                                'output' : {0 : 'batch_size'}})

onnx.checker.check_model("combined_model.onnx")
combined_model = onnx.load("combined_model.onnx")

Compare the outputs of the original model and the ONNX model:

import torch
import numpy as np
import onnxruntime as ort
from sentence_transformers import SentenceTransformer
from transformers import AutoTokenizer

model = SentenceTransformer('sentence-transformers/clip-ViT-B-32-multilingual-v1')
tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/clip-ViT-B-32-multilingual-v1')

# Prepare the input
text = "This is an example sentence."
inputs = tokenizer(text, padding='longest', truncation=True, max_length=128, return_tensors='pt')

# Run the PyTorch model
pytorch_output =  model.encode(text, convert_to_tensor=True, device='cpu')

# Convert the inputs to numpy arrays for the ONNX model
inputs_onnx = {name: tensor.numpy() for name, tensor in inputs.items()}

# Run the ONNX model
sess = ort.InferenceSession("combined_model.onnx")
onnx_output = sess.run(None, inputs_onnx)

# Compare the outputs
print("Are the outputs close?", np.allclose(pytorch_output.detach().numpy(), onnx_output[0], atol=1e-6))

# Calculate the differences between the outputs
differences = pytorch_output.detach().numpy() - onnx_output[0]

# Print the standard deviation of the differences
print("Standard deviation of the differences:", np.std(differences))

print("pytorch_output size:", pytorch_output.size())
print("onnx_output size:", onnx_output[0].shape)

Output:

Are the outputs close? True
Standard deviation of the differences: 1.6167593e-07
pytorch_output size: torch.Size([512])
onnx_output size: (512,)

I would really like to contribute the ONNX model, so novices like me can use the ONNX version easily. I did not find a CONTRIBUTING guide; however, I can contribute the model with your directions.

yaman commented Dec 5, 2023

Anyone in the void?

NirantK (Collaborator) commented Dec 13, 2023

Hey @yaman! Sorry, I was away from the project.

Would love to have this! This is quite a neat workaround!

Can you push the ONNX model weights to the Hugging Face Hub and raise a PR with that? That way, you always retain the attribution for doing the ONNX export.

I can help you get started with both. Here is a calendar link if that's easier:
https://cal.com/nirant-kasliwal-qdrant/30min
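
For reference, a minimal sketch of pushing the exported weights to the Hub with the huggingface_hub client; the repo id and the in-repo file name below are placeholders, not a final decision:

from huggingface_hub import HfApi

api = HfApi()

# Create the target model repo if it does not exist yet (repo id is a placeholder)
api.create_repo("your-username/clip-ViT-B-32-multilingual-v1-ONNX", repo_type="model", exist_ok=True)

# Upload the ONNX file produced by the export script above
api.upload_file(
    path_or_fileobj="combined_model.onnx",
    path_in_repo="model.onnx",
    repo_id="your-username/clip-ViT-B-32-multilingual-v1-ONNX",
)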

yaman commented Dec 19, 2023

Hi @NirantK,

Sorry for the late reply; I caught the flu and it knocked me out.

Let me give the latest updates:

After following up on the issue with folks from the hf-optimum team, my workaround is not necessary anymore; the team fixed the problem with huggingface/optimum#1519 on their main branch (though it might not be released yet).
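
With that fix, the export should boil down to a single Optimum call; a minimal sketch, assuming an Optimum build that already includes huggingface/optimum#1519 (the exact API surface and the output directory name may differ between versions):

from optimum.exporters.onnx import main_export

# Export the full sentence-transformers pipeline to ONNX; the output path is an example
main_export(
    model_name_or_path="sentence-transformers/clip-ViT-B-32-multilingual-v1",
    output="clip-ViT-B-32-multilingual-v1-onnx",
)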

I have already created an HF repo (https://huggingface.co/canavar/clip-ViT-B-32-multilingual-v1-ONNX), but I was waiting for a response from the model owners about pushing to the original model repository (if possible), with no luck. I will upload the ONNX version of the model to my HF repo and let you know.

thanks

yaman commented Dec 19, 2023

Hi @NirantK again,

I pushed the model to https://huggingface.co/canavar/clip-ViT-B-32-multilingual-v1-ONNX. Do you want me to raise a PR to the fastembed repo?
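
For anyone who wants to try it, a minimal sketch of pulling the file from that repo and running it with onnxruntime; the file name model.onnx is an assumption, so check the repo listing for the actual name:

from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer
import onnxruntime as ort

# Download the exported model; the file name is an assumption
onnx_path = hf_hub_download(
    repo_id="canavar/clip-ViT-B-32-multilingual-v1-ONNX",
    filename="model.onnx",
)

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/clip-ViT-B-32-multilingual-v1")
session = ort.InferenceSession(onnx_path)

inputs = tokenizer("This is an example sentence.", return_tensors="np")
embedding = session.run(None, {"input_ids": inputs["input_ids"],
                               "attention_mask": inputs["attention_mask"]})[0]
print(embedding.shape)  # (512,) for a single sentence, per the export above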

NirantK (Collaborator) commented Dec 26, 2023

I'd love it if you could PR it! That'll go much faster!

generall added the model request label on Jan 5, 2024
joein self-assigned this on Apr 1, 2024