clip-ViT-B-32-multilingual-v1 support, ps: I can contribute. #70

Open
yaman opened this issue Nov 23, 2023 · 5 comments
Labels: model request (request for supporting new models)


yaman commented Nov 23, 2023

I exported clip-ViT-B-32-multilingual-v1 to ONNX with some modifications (which have no effect on the output embedding).

The Hugging Face Optimum ONNX export can export this model's (0) Transformer and (1) Pooling modules, but it cannot extend the graph with the provided (2) Dense layer. What I did was create a model that combines the layers as follows:

CombinedModel

from sentence_transformers import SentenceTransformer
from sentence_transformers import models
import torch
import torch.nn as nn
import onnx
import numpy as np

class CombinedModel(nn.Module):
    def __init__(self, transformer_model, dense_model):
        super(CombinedModel, self).__init__()
        self.transformer = transformer_model
        self.dense = dense_model

    def forward(self, input_ids, attention_mask):
        # SentenceTransformer modules take a features dict and return one
        outputs = self.transformer({'input_ids': input_ids, 'attention_mask': attention_mask})
        token_embeddings = outputs['token_embeddings']
        dense_output = self.dense({'sentence_embedding': token_embeddings})
        dense_output_tensor = dense_output['sentence_embedding']

        ### This was important for me: it took me a while to figure out that the
        ### original model takes the mean of the dense output
        mean_output = torch.mean(dense_output_tensor, dim=1)
        flattened_output = mean_output.squeeze(0)
        return flattened_output

Combine the dense layer with the original model

transformer_model = SentenceTransformer('clip-ViT-B-32-multilingual-v1', cache_folder='model_pytorch')
tokenizer = transformer_model.tokenizer

### These values come from the dense layer's configuration
dense_model = models.Dense(
    in_features=768,
    out_features=512,
    bias=False,
    activation_function= nn.Identity()
)

### Load the weights from the dense layer's binary
state_dict = torch.load('model_pytorch/sentence-transformers_clip-ViT-B-32-multilingual-v1/2_Dense/pytorch_model.bin')
dense_model.load_state_dict(state_dict)

model = CombinedModel(transformer_model, dense_model)

Export the combined model to ONNX

model.eval()

input_text = "This is a multi-lingual version of the OpenAI CLIP-ViT-B32 model. You can map text (in 50+ languages) and images to a common dense vector space such that images and the matching texts are close."

inputs = tokenizer(input_text, padding='longest', truncation=True, max_length=128, return_tensors='pt')
input_ids = inputs['input_ids']
attention_mask = inputs['attention_mask']

# Export the model
torch.onnx.export(model,               # model being run
                  (input_ids, attention_mask), # model input (or a tuple for multiple inputs)
                  "combined_model.onnx", # where to save the model (can be a file or file-like object)
                  export_params=True,        # store the trained parameter weights inside the model file
                  opset_version=17,          # the ONNX version to export the model to
                  do_constant_folding=True,  # whether to execute constant folding for optimization
                  input_names = ['input_ids', 'attention_mask'],   # the model's input names
                  output_names = ['output'], # the model's output names
                  dynamic_axes={'input_ids': {0 : 'batch_size', 1: 'seq_length'},    # variable length axes
                                'attention_mask': {0 : 'batch_size', 1: 'seq_length'},
                                'output' : {0 : 'batch_size'}})

onnx.checker.check_model("combined_model.onnx")
combined_model = onnx.load("combined_model.onnx")

Compare the outputs of the original model and the ONNX model:

import torch
import numpy as np
import onnxruntime as ort
from sentence_transformers import SentenceTransformer
from transformers import AutoTokenizer

model = SentenceTransformer('sentence-transformers/clip-ViT-B-32-multilingual-v1')
tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/clip-ViT-B-32-multilingual-v1')

# Prepare the input
text = "This is an example sentence."
inputs = tokenizer(text, padding='longest', truncation=True, max_length=128, return_tensors='pt')

# Run the PyTorch model
pytorch_output =  model.encode(text, convert_to_tensor=True, device='cpu')

# Convert the inputs to numpy arrays for the ONNX model
inputs_onnx = {name: tensor.numpy() for name, tensor in inputs.items()}

# Run the ONNX model
sess = ort.InferenceSession("combined_model.onnx")
onnx_output = sess.run(None, inputs_onnx)

# Compare the outputs
print("Are the outputs close?", np.allclose(pytorch_output.detach().numpy(), onnx_output[0], atol=1e-6))

# Calculate the differences between the outputs
differences = pytorch_output.detach().numpy() - onnx_output[0]

# Print the standard deviation of the differences
print("Standard deviation of the differences:", np.std(differences))

print("pytorch_output size:", pytorch_output.size())
print("onnx_output size:", onnx_output[0].shape)

Output:

Are the outputs close? True
Standard deviation of the differences: 1.6167593e-07
pytorch_output size: torch.Size([512])
onnx_output size: (512,)

I would really like to contribute the ONNX model, so novices like me can use the ONNX version easily. I did not find a CONTRIBUTING guide; however, I can contribute the model with your directions.

yaman commented Dec 5, 2023

Anyone in the void?

NirantK (Collaborator) commented Dec 13, 2023

Hey @yaman! Sorry, I was away from the project.

Would love to have this! This is quite a neat workaround!

Can you push the ONNX model weights to the Hugging Face Hub and raise a PR with that? That way, you always retain the attribution for doing the ONNX export.

I can help you get started with both. Here is a calendar link if that's easier:
https://cal.com/nirant-kasliwal-qdrant/30min
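
For reference, a minimal sketch of pushing the exported weights to the Hub with the huggingface_hub client; the repo id and the in-repo file name below are placeholders, not a final decision:

from huggingface_hub import HfApi

api = HfApi()

# Create the target model repo if it does not exist yet (repo id is a placeholder)
api.create_repo("your-username/clip-ViT-B-32-multilingual-v1-ONNX", repo_type="model", exist_ok=True)

# Upload the ONNX file produced by the export script above
api.upload_file(
    path_or_fileobj="combined_model.onnx",
    path_in_repo="model.onnx",
    repo_id="your-username/clip-ViT-B-32-multilingual-v1-ONNX",
)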

yaman commented Dec 19, 2023

Hi @NirantK,

Sorry for the late reply; I caught the flu and it knocked me out.

Let me give the latest updates:

After following up on the issue with folks from the hf-optimum team, my workaround is not necessary anymore; the team fixed the problem with huggingface/optimum#1519 on their main branch (though it might not be released yet).
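
With that fix, the export should boil down to a single Optimum call; a minimal sketch, assuming an Optimum build that already includes huggingface/optimum#1519 (the exact API surface and the output directory name may differ between versions):

from optimum.exporters.onnx import main_export

# Export the full sentence-transformers pipeline to ONNX; the output path is an example
main_export(
    model_name_or_path="sentence-transformers/clip-ViT-B-32-multilingual-v1",
    output="clip-ViT-B-32-multilingual-v1-onnx",
)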

I have already created an HF repo (https://huggingface.co/canavar/clip-ViT-B-32-multilingual-v1-ONNX), but I was waiting for a response from the model owners about pushing to the original model repository (if possible), with no luck. I will upload the ONNX version of the model to my HF repo and let you know.

thanks

yaman commented Dec 19, 2023

Hi @NirantK again,

I pushed the model to https://huggingface.co/canavar/clip-ViT-B-32-multilingual-v1-ONNX. Do you want me to raise a PR to the fastembed repo?
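
For anyone who wants to try it, a minimal sketch of pulling the file from that repo and running it with onnxruntime; the file name model.onnx is an assumption, so check the repo listing for the actual name:

from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer
import onnxruntime as ort

# Download the exported model; the file name is an assumption
onnx_path = hf_hub_download(
    repo_id="canavar/clip-ViT-B-32-multilingual-v1-ONNX",
    filename="model.onnx",
)

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/clip-ViT-B-32-multilingual-v1")
session = ort.InferenceSession(onnx_path)

inputs = tokenizer("This is an example sentence.", return_tensors="np")
embedding = session.run(None, {"input_ids": inputs["input_ids"],
                               "attention_mask": inputs["attention_mask"]})[0]
print(embedding.shape)  # (512,) for a single sentence, per the export above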

NirantK (Collaborator) commented Dec 26, 2023

I'd love it if you could PR it! That'll go much faster!

generall added the model request label on Jan 5, 2024
joein self-assigned this on Apr 1, 2024