
On M1 Pro - "Distributed package doesn't have NCCL built in" #37

Open
clearsitedesigns opened this issue Apr 18, 2024 · 9 comments · May be fixed by #44

Comments

@clearsitedesigns

Must be something related to the torch package...

This is when trying to run the command

torchrun --nproc_per_node 1 example_chat_completion.py \
    --ckpt_dir Meta-Llama-3-8B-Instruct/ \
    --tokenizer_path Meta-Llama-3-8B-Instruct/tokenizer.model \
    --max_seq_len 512 --max_batch_size 6

@kalun85

kalun85 commented Apr 18, 2024

Got the same error. I tried setting PYTORCH_ENABLE_MPS_FALLBACK=1, but no luck. I am running an M3 MacBook Air.

@clearsitedesigns
Author

Attempting a few more things to see what might happen. Hopefully someone from the team can respond on whether this is only supposed to run on Nvidia / Windows.

@xxxAleksandrxxx

+1
The same error on MacBook Pro M1

@lananelson

+1 on m2

@shbfy

shbfy commented Apr 22, 2024

+1

@iTheSailor

iTheSailor commented Apr 22, 2024

> Attempting a few more things to see what might happen. Hopefully someone from the team can respond on whether this is only supposed to run on Nvidia / Windows.

That's a negative; I actually run into the same issue when I try to run it on Windows. Windows Subsystem for Linux (WSL) works fine, though. One thing to check would be whether you have the proper CPU-only installation of torch. I don't think GPU support is there for Mac.
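For anyone debugging this, a quick generic diagnostic (not from this repo) is to ask torch itself which distributed backends the installed build supports. On macOS wheels, NCCL is typically absent and Gloo is the only option:

```python
import torch
import torch.distributed as dist

# Report which distributed backends this torch build supports.
# macOS builds normally ship without NCCL; Gloo is the CPU fallback.
print("NCCL available:", dist.is_nccl_available())
print("Gloo available:", dist.is_gloo_available())
print("CUDA available:", torch.cuda.is_available())
print("MPS (Apple GPU) available:", torch.backends.mps.is_available())
```

If `NCCL available` prints `False`, anything that hard-codes the `nccl` backend will fail with exactly this error.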

@ccozad

ccozad commented Apr 24, 2024

I identified the code that is forcing NCCL in my issue #132. One of the first things Llama.build() does is initialize torch distributed with a hard-coded NCCL backend: https://pytorch.org/docs/stable/distributed.html
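A minimal sketch of the kind of workaround this suggests, assuming you are willing to patch the repo locally: choose the backend based on what the build supports instead of hard-coding `"nccl"`. The helper name `init_distributed` is hypothetical, and the environment-variable defaults only cover the single-process case that `torchrun --nproc_per_node 1` sets up anyway:

```python
import os
import torch
import torch.distributed as dist

def init_distributed() -> str:
    """Pick a supported distributed backend instead of hard-coding "nccl".

    Hypothetical replacement for the hard-coded init in Llama.build():
    on machines without NCCL (e.g. Apple Silicon), fall back to Gloo (CPU).
    """
    if torch.cuda.is_available() and dist.is_nccl_available():
        backend = "nccl"
    else:
        backend = "gloo"

    # torchrun normally sets these; default them for a single local process.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    os.environ.setdefault("RANK", "0")
    os.environ.setdefault("WORLD_SIZE", "1")

    if not dist.is_initialized():
        dist.init_process_group(backend=backend)
    return backend
```

Note this only gets past the initialization error; it does not give the model GPU acceleration on a Mac, and CPU inference of an 8B model will be very slow.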

@Jiayu-Tian

+1 on M3 Pro

@davemw15

davemw15 commented May 9, 2024

Feel free to correct me if I'm wrong... but I'm pretty sure there is no official support for ARM Macs (M-series chips).


9 participants