Use 4.7x fewer synchronization barriers in GGML
This commit and the previous one provide a big win for CPUs with many cores, such as EPYC and Threadripper. I'm now getting 839.57 tok/sec on Mistral 7b v0.2 bf16 prefill, roughly 1.5x the 557 tok/sec mentioned in the blog post. Generation speed has also increased by about 15-20% on my machine compared to our last release.