I've seen the pretty amazing results others have been posting using llamafile, even for large models... My system is obviously not even half of what Justine's is, so I'm not expecting to get 80 t/s trying to run Llama 3 70B...
On Linux with an i9-13900K, 128 GB RAM, and a Radeon RX 6900 XT, I've seen between 0.75 and 0.85 tokens per second (depending on whether I set `-t` above nproc/2) with llamafile 0.73 running Llama 3 70B. I'm wondering if this is my ceiling or if there's something else I can tweak to get even a handful of tokens/s from this setup.

This is the command I'm running with:
```shell
nix-shell -p podman fuse-overlayfs --run "podman run --rm -ti --device=/dev/kfd --device=/dev/dri -e DISPLAY=${DISPLAY} -v /tmp/.X11-unix/X0:/tmp/.X11-unix/X0 -v /home:/home -p 8080:8080 docker.io/rocm/pytorch bash ~/Downloads/Meta-Llama-3-70B-Instruct.Q8_0.llamafile -ngl 14 --host 0.0.0.0"
```
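For what it's worth, the `-ngl 14` value lines up with a rough VRAM estimate I did (the layer count, file size, and headroom below are my assumptions, not measured values):

```python
# Rough check (assumptions, not measurements): how many of the transformer
# layers in Llama 3 70B fit in the RX 6900 XT's 16 GB of VRAM at Q8_0.

n_layers = 80                 # Llama 3 70B layer count
weights_gb = 75.0             # approximate size of the Q8_0 weights

gb_per_layer = weights_gb / n_layers      # ~0.94 GB per layer

vram_gb = 16.0                # RX 6900 XT
usable_vram_gb = vram_gb - 2.0  # leave headroom for KV cache/buffers (assumed)

max_layers = int(usable_vram_gb / gb_per_layer)
print(f"~{gb_per_layer:.2f} GB/layer -> roughly {max_layers} layers fit in VRAM")
```

So only a small fraction of the model can be offloaded, and the rest stays on the CPU.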
What kind of performance have others been seeing with mid-range hardware like mine? Anything I should be doing that might help improve throughput?
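For context on whether 0.75-0.85 t/s really is the ceiling here, a back-of-envelope estimate (the bandwidth figure and bytes-per-weight are my assumptions): when most of the weights stay in system RAM, generation is usually memory-bandwidth-bound, since every token has to stream the CPU-resident weights.

```python
# Back-of-envelope bandwidth ceiling (assumed numbers, not measurements).

model_params = 70e9           # Llama 3 70B
bytes_per_param = 1.07        # Q8_0 is ~8.5 bits/weight including scales

weights_gb = model_params * bytes_per_param / 1e9   # ~75 GB of weights

# Rough dual-channel DDR5 bandwidth for an i9-13900K (assumed; varies with RAM)
mem_bandwidth_gbs = 80.0

ceiling_tps = mem_bandwidth_gbs / weights_gb
print(f"~{weights_gb:.0f} GB of weights -> bandwidth ceiling ~{ceiling_tps:.1f} tokens/s")
```

If that estimate is in the right ballpark, 0.75-0.85 t/s is already close to what this memory subsystem can do with a Q8_0 70B, and a smaller quantization would be the main lever.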