Runtime Optimization #86

Open
aditya-y47 opened this issue Sep 25, 2023 · 1 comment

@aditya-y47

Hey, first of all, thank you for building and open-sourcing such a great piece of work. I have been using INSTRUCTOR for some time now and I absolutely love it.

I'm planning to generate embeddings for a large corpus of texts (on the order of millions), and I intend to schedule the embedding generation job as an async, message-queue (MQ) based execution. My initial estimates put the run time on the higher side, so I was hoping some methods could be used to speed up embedding generation. Some options I have in mind:

  1. Inference on TensorRT
  2. Compile the underlying PyTorch model
    • I see that you use a Sentence-Transformers-like implementation, so I am unsure how torch.compile would work with it
  3. Using kernel fusion, custom kernels, etc.

Are there any generally recommended guidelines that would help me achieve these? Is anyone here working on such optimizations?

@hongjin-su
Collaborator

Yeah, INSTRUCTOR is highly similar to sentence-transformers in terms of model architecture. Therefore, any optimization that applies to sentence-transformers models may also be applicable to INSTRUCTOR models.

Recently, there have been some efforts in model quantization, which you can use as references:
https://www.sbert.net/examples/training/distillation/README.html#quantization
https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/distillation/model_quantization.py

Hope this helps!
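
For readers who want a feel for the quantization approach mentioned above, here is a minimal sketch of PyTorch dynamic quantization. It is not taken from the INSTRUCTOR codebase: `TinyEncoder` is a hypothetical stand-in for the Transformer encoder, chosen so the snippet runs without downloading a model. The assumption is that the same `quantize_dynamic` call can be applied to a loaded sentence-transformers or INSTRUCTOR model, since both are `torch.nn.Module` objects built largely from `nn.Linear` layers. Note that dynamic quantization of this kind targets CPU inference.

```python
import torch
import torch.nn as nn


class TinyEncoder(nn.Module):
    """Hypothetical stand-in for a Transformer encoder (illustration only)."""

    def __init__(self, dim: int = 64, hidden: int = 256):
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.ff(x)


torch.manual_seed(0)
model = TinyEncoder().eval()

# Replace every nn.Linear with an int8 dynamically quantized version.
# Weights are stored as int8; activations are quantized on the fly at inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(8, 64)  # a fake batch of 8 input vectors
with torch.no_grad():
    fp32_out = model(x)
    int8_out = quantized(x)

# The quantized output approximates the float32 output; inspect the gap.
max_abs_diff = (fp32_out - int8_out).abs().max().item()
print(f"max abs difference between fp32 and int8 outputs: {max_abs_diff:.4f}")
```

Before relying on this for a million-scale job, it is worth comparing embedding quality (for example, cosine similarity between float32 and int8 embeddings on a sample of your corpus) as well as throughput, since the speed and accuracy trade-off depends on the model and hardware.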
