AssertionError while finetuning RWKVv5 #216
Data has 200499 tokens, therefore set my_exit_tokens = 200499, and note: set magic_prime = 389.
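For reference, here is a minimal sketch (not from this thread) of how that value can be derived, assuming the usual RWKV-LM convention of "the largest 3n+2 prime smaller than datalen/ctx_len - 1". ctx_len = 512 is an assumption here, chosen because it reproduces magic_prime = 389 for 200499 tokens:

```python
def is_prime(n: int) -> bool:
    """Simple trial-division primality test (fine for small limits)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return False
        f += 2
    return True

def magic_prime(data_tokens: int, ctx_len: int) -> int:
    # Largest prime p with p % 3 == 2 and p <= data_tokens / ctx_len - 1
    # (assumed RWKV-LM rule; the ctx_len value is an assumption, not from the thread).
    limit = data_tokens // ctx_len - 1
    for p in range(limit, 1, -1):
        if p % 3 == 2 and is_prime(p):
            return p
    raise ValueError("dataset too small for this ctx_len")

print(magic_prime(200499, 512))  # -> 389
```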
Thanks for answering. But some errors still occur:
For 0.4B finetuning, set:
Set --devices 4 to use 4 GPUs (with CUDA_VISIBLE_DEVICES=0,1,2,3).
Thanks again @BlinkDL. I have another question: currently I'm using a context length (ctx_len) of 1024 for full fine-tuning of a model with only 0.4B parameters (RWKV-5), but it almost maxes out the memory on all four of my A10 GPUs. However, Llama-2-7B can be fully fine-tuned on four A10 cards with a context length of 4096. Is there a way to run full training of my v5 model with a context length of 4096 using model parallelism across the four GPUs?
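As an aside, the "model parallelism" part of this question is usually handled in Lightning-based training setups with DeepSpeed ZeRO sharding rather than true tensor parallelism. Below is a generic, hedged sketch, not the RWKV-LM train.py interface; ToyLM, the dimensions, and the chosen strategy string are illustrative assumptions (PyTorch Lightning 1.9-style API with deepspeed installed):

```python
import torch
import torch.nn as nn
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset

class ToyLM(pl.LightningModule):
    """A toy language-model stand-in; NOT the RWKV architecture."""
    def __init__(self, vocab: int = 1000, dim: int = 256):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.head = nn.Linear(dim, vocab)

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self.head(self.emb(x))
        return nn.functional.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=1e-4)

if __name__ == "__main__":
    ctx_len = 4096
    data = TensorDataset(torch.randint(0, 1000, (64, ctx_len)),
                         torch.randint(0, 1000, (64, ctx_len)))
    trainer = pl.Trainer(
        accelerator="gpu",
        devices=4,                             # one process per A10
        precision="bf16",
        strategy="deepspeed_stage_2_offload",  # shard optimizer state, offload to CPU
        max_epochs=1,
    )
    trainer.fit(ToyLM(), DataLoader(data, batch_size=1))
```

Note that ZeRO sharding mainly reduces parameter and optimizer-state memory; the activation memory that grows with ctx_len is what gradient checkpointing (the GRAD_CP flag discussed below) addresses.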
Check your "gradient checkpointing" flag; disabling it gives a speed boost at the cost of much more VRAM usage (Llama fine-tuning setups typically have it enabled).
@Ethan-Chen-plus set GRAD_CP=1.
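To illustrate what GRAD_CP=1 toggles conceptually, here is a minimal, generic PyTorch sketch of gradient checkpointing; Block and TinyModel are hypothetical and this is not RWKV's actual code. The idea is to trade extra recomputation in the backward pass for much lower activation memory:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class Block(nn.Module):
    """A hypothetical residual feed-forward block."""
    def __init__(self, dim: int):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        return x + self.ff(x)

class TinyModel(nn.Module):
    def __init__(self, dim: int = 1024, n_layer: int = 8, grad_cp: bool = True):
        super().__init__()
        self.grad_cp = grad_cp
        self.blocks = nn.ModuleList([Block(dim) for _ in range(n_layer)])

    def forward(self, x):
        for block in self.blocks:
            if self.grad_cp and self.training:
                # Don't keep this block's intermediate activations; recompute them
                # during backward. Slower, but activation VRAM drops sharply.
                x = checkpoint(block, x, use_reentrant=False)
            else:
                x = block(x)
        return x

# Quick check: forward and backward both work with checkpointing enabled.
m = TinyModel()
loss = m(torch.randn(2, 16, 1024, requires_grad=True)).mean()
loss.backward()
```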
While finetuning RWKV, I use this script (using the demo dataset built by `make_data.py`, with `demo.bin` and `demo.idx` placed in `./data`). I caught this error: