This repository has been archived by the owner on Aug 19, 2023. It is now read-only.

Why fp16 MPS performance is worse than CPU? #15

Open
FdyCN opened this issue Apr 23, 2023 · 2 comments

Comments

@FdyCN

FdyCN commented Apr 23, 2023

In your conclusion, MPS performance is worse than llama.cpp's CPU performance at the same fp16 precision. Why? Are there any kernels that MPS doesn't support and that fall back to the CPU (which would hurt performance)?

You said this:
[screenshot of the quoted conclusion]

I figure you mean that MPS shaders are compiled just-in-time, so performance is worse than the ahead-of-time compiled CPU code? Am I wrong?

@jankais3r
Owner

Hi, I wish I could give you a definitive answer, but unfortunately I am not familiar enough with PyTorch's MPS implementation to be able to confirm or deny your theory...

@jankais3r jankais3r reopened this May 10, 2023
@philipturner

It's bandwidth. The model is bottlenecked by how quickly the processor can fetch weights from RAM. FP16 uses 4x as many bits per weight as Int4, and is thus 4x slower.
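This bandwidth argument can be sketched with a back-of-the-envelope estimate: if each generated token must stream every weight from memory once, per-token latency is roughly total weight bytes divided by memory bandwidth. The model size and bandwidth figures below are illustrative assumptions, not measurements from this repository.

```python
# Back-of-the-envelope estimate of bandwidth-bound token latency.
# Assumption: decoding is memory-bound, so each token requires
# streaming all weights from RAM once.

def token_latency_s(n_params: float, bits_per_weight: int, bandwidth_gbps: float) -> float:
    """Latency per token ~= total weight bytes / memory bandwidth."""
    weight_bytes = n_params * bits_per_weight / 8
    return weight_bytes / (bandwidth_gbps * 1e9)

n_params = 7e9     # assumed 7B-parameter model (hypothetical)
bandwidth = 100.0  # assumed ~100 GB/s unified memory bandwidth (hypothetical)

fp16 = token_latency_s(n_params, 16, bandwidth)
int4 = token_latency_s(n_params, 4, bandwidth)
print(f"fp16: {fp16:.3f} s/token, int4: {int4:.3f} s/token, ratio: {fp16/int4:.1f}x")
# → fp16: 0.140 s/token, int4: 0.035 s/token, ratio: 4.0x
```

Under these assumptions the 4x ratio follows directly from the bit widths, independent of the actual bandwidth number.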


3 participants