
On M1 Pro - "Distributed package doesn't have NCCL built in" #37

Open
clearsitedesigns opened this issue Apr 18, 2024 · 9 comments · May be fixed by #44

Comments

@clearsitedesigns

Must be something related to the torch package...

This is when trying to run the command

torchrun --nproc_per_node 1 example_chat_completion.py \
    --ckpt_dir Meta-Llama-3-8B-Instruct/ \
    --tokenizer_path Meta-Llama-3-8B-Instruct/tokenizer.model \
    --max_seq_len 512 --max_batch_size 6

@kalun85

kalun85 commented Apr 18, 2024

Got the same error. I tried setting PYTORCH_ENABLE_MPS_FALLBACK=1, but no luck. I am running an M3 MacBook Air.

@clearsitedesigns
Author

Attempting a few more things to see what might happen. Hopefully someone from the team can respond on whether this is only supposed to run on Nvidia / Windows.

@xxxAleksandrxxx

+1
The same error on MacBook Pro M1

@lananelson

+1 on m2

@shbfy

shbfy commented Apr 22, 2024

+1

@iTheSailor

iTheSailor commented Apr 22, 2024

> Attempting a few more things to see what might happen. Hopefully someone from the team can respond on whether this is only supposed to run on Nvidia / Windows.

That's a negative; I actually run into the same issue when I try to run it on Windows. Windows Subsystem for Linux (WSL) works fine, though. One thing to check would be whether you have the proper CPU-only installation of torch. I don't think GPU support is there for Mac.
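For anyone debugging this, a quick generic diagnostic (not from this repo) is to ask torch itself which distributed backends the installed build supports. On macOS wheels, NCCL is typically absent and Gloo is the only option:

```python
import torch
import torch.distributed as dist

# Report which distributed backends this torch build supports.
# macOS builds normally ship without NCCL; Gloo is the CPU fallback.
print("NCCL available:", dist.is_nccl_available())
print("Gloo available:", dist.is_gloo_available())
print("CUDA available:", torch.cuda.is_available())
print("MPS (Apple GPU) available:", torch.backends.mps.is_available())
```

If `NCCL available` prints `False`, anything that hard-codes the `nccl` backend will fail with exactly this error.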

@ccozad

ccozad commented Apr 24, 2024

I identified the code that is forcing NCCL in my issue #132. One of the first things Llama.build() does is initialize torch distributed with a hard-coded NCCL backend: https://pytorch.org/docs/stable/distributed.html
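A minimal sketch of the kind of workaround this suggests, assuming you are willing to patch the repo locally: choose the backend based on what the build supports instead of hard-coding `"nccl"`. The helper name `init_distributed` is hypothetical, and the environment-variable defaults only cover the single-process case that `torchrun --nproc_per_node 1` sets up anyway:

```python
import os
import torch
import torch.distributed as dist

def init_distributed() -> str:
    """Pick a supported distributed backend instead of hard-coding "nccl".

    Hypothetical replacement for the hard-coded init in Llama.build():
    on machines without NCCL (e.g. Apple Silicon), fall back to Gloo (CPU).
    """
    if torch.cuda.is_available() and dist.is_nccl_available():
        backend = "nccl"
    else:
        backend = "gloo"

    # torchrun normally sets these; default them for a single local process.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    os.environ.setdefault("RANK", "0")
    os.environ.setdefault("WORLD_SIZE", "1")

    if not dist.is_initialized():
        dist.init_process_group(backend=backend)
    return backend
```

Note this only gets past the initialization error; it does not give the model GPU acceleration on a Mac, and CPU inference of an 8B model will be very slow.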

@Jiayu-Tian

+1 on M3 Pro

@davemw15

davemw15 commented May 9, 2024

Feel free to correct me if I'm wrong... but I'm pretty sure there is no official support for ARM Macs (M-series chips).


9 participants