"error" in training - AttributeError: 'CastOutputToFloat' object has no attribute 'weight', RuntimeError: Only Tensors of floating point and complex dtype can require gradients #29

Open
GreenTeaBD opened this issue Mar 28, 2023 · 5 comments
Labels: bug

GreenTeaBD commented Mar 28, 2023

WSL2 Ubuntu, new install. I get the following error after it downloads the weights and tries to train.
Sorry I can't give more details; I'm really not sure what's going on.

Number of samples: 534
Traceback (most recent call last):
  File "/home/ckg/.local/lib/python3.10/site-packages/gradio/routes.py", line 394, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/ckg/.local/lib/python3.10/site-packages/gradio/blocks.py", line 1075, in process_api
    result = await self.call_function(
  File "/home/ckg/.local/lib/python3.10/site-packages/gradio/blocks.py", line 884, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/ckg/.local/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/ckg/.local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/home/ckg/.local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/home/ckg/.local/lib/python3.10/site-packages/gradio/helpers.py", line 587, in tracked_fn
    response = fn(*args)
  File "/home/ckg/github/simple-llama-finetuner/main.py", line 164, in tokenize_and_train
    model = peft.prepare_model_for_int8_training(model)
  File "/home/ckg/.local/lib/python3.10/site-packages/peft/utils/other.py", line 72, in prepare_model_for_int8_training
  File "/home/ckg/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'CastOutputToFloat' object has no attribute 'weight'

lxe added the bug label Mar 28, 2023

lxe (Owner) commented Mar 28, 2023

What hardware are you running on? Any other console traceback?

GreenTeaBD (Author) commented Mar 29, 2023

I just figured out an important thing while I was typing this comment. I'll include what I was writing before at the end, but it appears to finetune correctly if I kill it and run main.py again. If I start finetuning once and then abort, I get that error on every attempt after that. I don't know yet whether it will actually finetune successfully because it's currently running, but that seems to be the problem: it errors out on a second attempt after an abort.

I did try deleting the leftover directory after the abort, in case that was getting in the way, but that didn't seem to do it. It's much less of a problem now, since killing the script and restarting isn't a big hassle, but it's still probably not running as expected.
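
If it helps, here's my guess at the mechanism, as a minimal sketch rather than the actual peft code. I'm assuming prepare_model_for_int8_training replaces the model's lm_head with a float-casting wrapper on the first run, so a second run finds the wrapper instead of a plain Linear:

```python
import torch.nn as nn

# Assumption: something roughly like this wrapper gets swapped in for lm_head
# on the first prepare_model_for_int8_training call.
class CastOutputToFloat(nn.Sequential):
    def forward(self, x):
        return super().forward(x).float()

lm_head = nn.Linear(8, 8)
print(lm_head.weight.shape)          # fine: nn.Linear exposes .weight

wrapped = CastOutputToFloat(lm_head)
print(wrapped.weight.shape)          # AttributeError: 'CastOutputToFloat' object has no attribute 'weight'
```

If that's right, re-running tokenize_and_train on the same in-memory model hits the wrapper when it looks for .weight, and restarting main.py works because the base model gets loaded fresh.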

Anyway, the original comment:
Windows 11 (but in WSL2), an otherwise working WSL install used for diffusers and GPT-Neo finetuning (though in their own environments).
A 5900X, 32 GB RAM, 100 GB of swap for WSL2 (I needed a lot for the GPT-Neo stuff), and a 4090 for the GPU.

Nothing else to the traceback; everything else in the console is, as far as I can tell, just normal startup output before it:
(llama-finetuner) ckg@Ryzen9:~/github/simple-llama-finetuner$ python main.py

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

CUDA SETUP: CUDA runtime path found: /home/ckg/anaconda3/envs/llama-finetuner/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.9
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/ckg/.local/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
Loading base model...
Loading base model...
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.

In case it helps, the output of nvcc -V is
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0

And going through my bash history, it looks like I installed everything with:
git clone https://github.com/lxe/simple-llama-finetuner
conda create -n llama-finetuner python=3.10
conda activate llama-finetuner
conda install -y cuda -c nvidia/label/cuda-11.7.0
conda install -y pytorch=1.13.1 pytorch-cuda=11.7 -c pytorch
cd simple-llama-finetuner/
pip install -r requirements.txt
python main.py
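
(Not from my history, but as a quick sanity check in that environment, using only standard PyTorch calls and nothing specific to this repo, to confirm the env actually sees CUDA and the 4090:)

```python
import torch

# Standard PyTorch introspection, independent of simple-llama-finetuner.
print(torch.__version__)              # expecting 1.13.1 per the conda install above
print(torch.version.cuda)             # expecting 11.7
print(torch.cuda.is_available())      # should be True if WSL2 exposes the GPU
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```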

GreenTeaBD (Author) commented:

Also, a quick question that doesn't need its own issue: what's the significance of the two empty lines between each entry in the training data?

I've finetuned GPT-Neo a bunch, and I'm trying to wrap my head around the differences. I haven't been able to find out whether finetuning LLaMA uses <|endoftext|> tokens or not, or whether there's another way to do it. Is that what the two empty lines are adding?

lxe (Owner) commented Mar 31, 2023

> Also, a quick question that doesn't need its own issue: what's the significance of the two empty lines between each entry in the training data?

It helps if you want to have newlines or empty lines within each of your samples. It's just the easiest way to format samples.
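
Roughly speaking (a hypothetical splitter for illustration, not necessarily the exact code in main.py): if samples are separated by two consecutive blank lines, then a single blank line can still appear inside a sample without splitting it.

```python
import re

raw = """First sample, line one.
First sample, line two.


Second sample, which keeps

a single blank line inside it.


Third sample."""

# Hypothetical: two consecutive blank lines end a sample,
# so a lone blank line stays inside the sample it belongs to.
samples = [s.strip() for s in re.split(r"\n\s*\n\s*\n", raw) if s.strip()]
print(len(samples))  # 3
```

If that's the case, the blank lines are just a delimiter in the textbox, not a special token, so this doesn't by itself answer the <|endoftext|> question.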

lxe (Owner) commented Apr 6, 2023

I just rewrote the whole thing. Still seeing the issue?
