Use 4.7x fewer synchronization barriers in GGML
This commit and the previous one provide a big win for CPUs with many
cores, such as EPYC and Threadripper. I'm now getting 839.57 tok/sec
on Mistral 7b v0.2 bf16 prefill, roughly 1.5x the 557 tok/sec
mentioned in the blog post. Generation speed has increased ~15-20% on
my machine compared to our last release.
jart committed Apr 29, 2024
1 parent 6162004 commit bd8c0de
Showing 7 changed files with 215 additions and 191 deletions.
