How to make quantised models work faster on CPU machines #132

Closed Answered by NirantK
TheRabidWolverine asked this question in Q&A
Hey @TheRabidWolverine, couple of things:

  1. Quantization isn't only for CPU. Models can be quantized for CUDA or Apple runtimes too. We've simply chosen to prefer CPU because we had difficulty setting up tests for the CUDA and Apple runtimes.
  2. What governs the speed gain? Primarily the model size. Quantization does two things: the operations are cheaper, and the model itself is smaller.
  3. FastEmbed does use more CPU processes: for larger datasets, we do data-parallel processing. That can have a RAM impact as well, since we load more data into RAM for the parallel workers.
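
The size reduction in point 2 is easy to see with a minimal sketch (not FastEmbed's actual quantization code): a naive symmetric int8 quantization of a hypothetical float32 weight matrix shrinks it to a quarter of its size.

```python
import numpy as np

# Hypothetical weight matrix standing in for one model layer.
weights_fp32 = np.random.rand(1024, 1024).astype(np.float32)

# Naive symmetric int8 quantization: map the float range onto [-127, 127].
scale = np.abs(weights_fp32).max() / 127.0
weights_int8 = np.round(weights_fp32 / scale).astype(np.int8)

print(weights_fp32.nbytes)  # 4194304 bytes (float32)
print(weights_int8.nbytes)  # 1048576 bytes, i.e. 4x smaller
```

Real quantizers (e.g. ONNX Runtime's) use per-channel scales and calibration, but the storage arithmetic is the same: int8 weights are 4x smaller than float32, which also means less memory traffic per operation.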

I hope this answers your questions. Please feel free …

Answer selected by NirantK
This discussion was converted from issue #131 on February 23, 2024 05:04.