I've seen the pretty amazing results others have been posting using llamafile, even for large models... My system is obviously not even half of what Justine's is, so I'm not expecting to get 80 t/s trying to run Llama 3 70B...
On Linux with an i9-13900K, 128 GB RAM, and a Radeon RX 6900 XT, I've seen between 0.75 and 0.85 tokens per second (depending on whether I set `-t` above nproc/2) with llamafile 0.73 running Llama 3 70B. I'm wondering if this is my ceiling or if there's something else I can tweak to get even a handful of tokens/s from this setup.

This is the command I'm running with:
```shell
nix-shell -p podman fuse-overlayfs --run "podman run --rm -ti --device=/dev/kfd --device=/dev/dri -e DISPLAY=${DISPLAY} -v /tmp/.X11-unix/X0:/tmp/.X11-unix/X0 -v /home:/home -p 8080:8080 docker.io/rocm/pytorch bash ~/Downloads/Meta-Llama-3-70B-Instruct.Q8_0.llamafile -ngl 14 --host 0.0.0.0"
```
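For what it's worth, the `-ngl 14` value lines up with a rough VRAM estimate I did (the layer count, file size, and headroom below are my assumptions, not measured values):

```python
# Rough check (assumptions, not measurements): how many of the transformer
# layers in Llama 3 70B fit in the RX 6900 XT's 16 GB of VRAM at Q8_0.

n_layers = 80                 # Llama 3 70B layer count
weights_gb = 75.0             # approximate size of the Q8_0 weights

gb_per_layer = weights_gb / n_layers      # ~0.94 GB per layer

vram_gb = 16.0                # RX 6900 XT
usable_vram_gb = vram_gb - 2.0  # leave headroom for KV cache/buffers (assumed)

max_layers = int(usable_vram_gb / gb_per_layer)
print(f"~{gb_per_layer:.2f} GB/layer -> roughly {max_layers} layers fit in VRAM")
```

So only a small fraction of the model can be offloaded, and the rest stays on the CPU.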
What kind of performance have others been seeing with mid-range hardware like mine? Anything I should be doing that might help improve throughput?
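For context on whether 0.75-0.85 t/s really is the ceiling here, a back-of-envelope estimate (the bandwidth figure and bytes-per-weight are my assumptions): when most of the weights stay in system RAM, generation is usually memory-bandwidth-bound, since every token has to stream the CPU-resident weights.

```python
# Back-of-envelope bandwidth ceiling (assumed numbers, not measurements).

model_params = 70e9           # Llama 3 70B
bytes_per_param = 1.07        # Q8_0 is ~8.5 bits/weight including scales

weights_gb = model_params * bytes_per_param / 1e9   # ~75 GB of weights

# Rough dual-channel DDR5 bandwidth for an i9-13900K (assumed; varies with RAM)
mem_bandwidth_gbs = 80.0

ceiling_tps = mem_bandwidth_gbs / weights_gb
print(f"~{weights_gb:.0f} GB of weights -> bandwidth ceiling ~{ceiling_tps:.1f} tokens/s")
```

If that estimate is in the right ballpark, 0.75-0.85 t/s is already close to what this memory subsystem can do with a Q8_0 70B, and a smaller quantization would be the main lever.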