
Trial runs at Llama 7B (success) and 65B (fail) #10

Open · kechan opened this issue Mar 23, 2023 · 2 comments


kechan commented Mar 23, 2023

I noticed you are using venv and pip. From your powermetrics output, I assume your torch build is able to take full advantage of the GPU? Apple silicon is new to me, and I thought you had to get the package from conda-forge. I just received an M2 Max with 96 GB, so I will try this out and see how much of an improvement it is over the M1.
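
For what it's worth, a quick way to confirm that a pip-installed torch can use the Apple GPU is to query the MPS backend directly (a minimal sketch; the tensor size is arbitrary):

```python
import torch

# Was this torch build compiled with Metal Performance Shaders (MPS) support,
# and is an MPS device actually usable on this machine?
print("MPS built:", torch.backends.mps.is_built())
print("MPS available:", torch.backends.mps.is_available())

if torch.backends.mps.is_available():
    x = torch.randn(2048, 2048, device="mps")
    y = x @ x  # a matmul on "mps" should show up as GPU activity in powermetrics
    print("Result lives on:", y.device)
```

If both checks print True, the plain pip wheel can drive the GPU and conda-forge should not be needed.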


kechan commented Mar 30, 2023

Llama 7B

I am able to run the 7B model using the exact same prompt:

Checkpoint converted - feel free to delete the original '.pth' file (while keeping the 'arrow' folder)
Seed: 44332
Loading checkpoint
Loaded in 4.84 seconds
Running the raw 'llama' model in an auto-complete mode.
Enter your LLaMA prompt: Facebook is bad, because
Thinking...
it’s owned by Mark Zuckerberg. He has a history of making anti-American comments.
Sorry to break it to you, but the United States isn’t in charge of Facebook anymore. The social network was sold to an investment firm called DST Global back in 2012, and since then, Facebook has been spun off into its own public company.
Facebook, now controlled by shareholders who are mostly Americans, will be subject to U.S. laws and regulations from now on. And so will its new WhatsApp subsidiary, which Facebook bought earlier this year.
Zuckerberg might have some valid concerns about how Facebook handles private information. But he should at least take a look at his own privacy policy before calling for more restrictions on others.

Inferred in 30.87 seconds

It is quite fast. (Although it gave a pretty negative and completely false completion; maybe that is because it is only 7B?)


kechan commented Mar 30, 2023

Llama 65B

Resharding failed (the process was killed, likely due to OOM); it was using over 180 GB at peak before dying at layer 38.

Is there a way to convert this without running out of memory?
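
If the converter holds all shards in memory before writing, a shard-at-a-time pass should keep the peak near a single shard's size. A minimal sketch, not this repo's actual converter; the checkpoint glob and the per-tensor output layout are assumptions for illustration:

```python
import gc
import glob
import os

import torch

out_dir = "65B-resharded"  # assumed output location
os.makedirs(out_dir, exist_ok=True)

# Process the 65B checkpoint one shard at a time instead of loading all of
# them at once; peak memory then stays near the size of a single shard.
for shard_path in sorted(glob.glob("65B/consolidated.*.pth")):
    shard = torch.load(shard_path, map_location="cpu")  # keep tensors on CPU
    for name, tensor in shard.items():
        # Write each tensor to its own file so nothing accumulates in RAM.
        torch.save(tensor.clone(), os.path.join(out_dir, f"{name}.pt"))
    del shard
    gc.collect()  # release the shard before loading the next one
```

For 65B the per-shard slices would still need to be merged per tensor afterwards (each consolidated.*.pth holds a slice of every weight), but the same idea applies: stream one slice at a time rather than materializing the whole model.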

kechan changed the title from "Q about torch on apple silicon" to "Trial runs at Llama 7B (success) and 65B (fail)" on Mar 30, 2023