Will using only CPU be faster than llama.cpp? #140

liutt1312 · 2024-02-02T09:50:31Z

Will using only CPU be faster than llama.cpp?

hodlen · 2024-02-05T15:41:09Z

Compared with llama.cpp in CPU-only mode, yes. PowerInfer can cut end-to-end FLOPs by ~50%, depending on the model architecture and sparsity, so a ~2x speedup in CPU decoding would be expected.
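As a rough sanity check on that figure, here is a minimal sketch of the arithmetic. It assumes decode time scales linearly with FLOPs, which is an idealization (real CPU decoding also depends on memory bandwidth), and `expected_speedup` is a hypothetical helper for illustration, not part of PowerInfer or llama.cpp.

```python
def expected_speedup(flop_reduction: float) -> float:
    """Ideal speedup, ASSUMING decode time is proportional to FLOPs.

    flop_reduction is the fraction of FLOPs removed, e.g. 0.5 for ~50%.
    """
    if not 0.0 <= flop_reduction < 1.0:
        raise ValueError("flop_reduction must be in [0, 1)")
    # Remaining work is (1 - flop_reduction) of the baseline,
    # so the ideal speedup is its reciprocal.
    return 1.0 / (1.0 - flop_reduction)


if __name__ == "__main__":
    # A ~50% FLOP reduction gives an ideal 2x speedup.
    print(expected_speedup(0.5))
```

Since the actual reduction depends on the model architecture and sparsity, the measured speedup on a given model may land above or below this estimate.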

liutt1312 added the question label Feb 2, 2024

woheller69 mentioned this issue Mar 28, 2024

[feature request] Add support for PowerInfer nomic-ai/gpt4all#1778

Open
