Added code for Full fine tune #645
base: main
Conversation
Do you perform your full fine-tune in float32?
For compatibility with 'tuner/trainer.py', I fixed the save and load functions in the code. I tested it fully. =) But there may be a memory leak issue in the memory handling of mlx-examples or mlx, maybe?
Fixed the load/save functions. Fully tested with the Phi-2 2.8B model.
No. I just copied the code from mlx-lm/lora.py and modified it to run as a full fine-tune, keeping it compatible with the original tuner/* code.
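As an illustration of the kind of change described here (a rough sketch, not the exact code from this PR), saving a full fine-tune means flattening and writing all model parameters rather than only the adapter weights that the LoRA path saves; with mlx this can look like:

```python
import mlx.core as mx
from mlx.utils import tree_flatten

def save_full_model(model, path="full_weights.safetensors"):
    # For a full fine-tune, flatten and save every parameter of the model,
    # not just the LoRA adapter weights. `save_full_model` and the file name
    # are illustrative, not names taken from this PR.
    weights = dict(tree_flatten(model.parameters()))
    mx.save_safetensors(path, weights)
```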
Test example.
Tried training Qwen-1.8B. NaN loss immediately. Will try Phi-2.
When I tried Gemma-2B, same NaN loss. Maybe it's a foundation code issue, perhaps in models/*? I didn't check.
Think it's the float16.
Just checked - NaN w/ phi.
I was also receiving NaNs using Qwen 14B against my dataset but couldn't reproduce it with the test data in lora/data. Tried again with updates on main for both mlx/mlx_lm this morning and have reached 4K iterations so far without NaNs. In the past it had been a float16 issue for me. I don't remember if I quantized this one at 32 or 16, but the config.json of the locally converted model has:
I've opened an older issue (#620) regarding NaN values during training.
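One workaround suggested by this thread (a hedged sketch, assuming the standard mlx.nn module API; not code taken from this PR) is to upcast the model's parameters to float32 before a full fine-tune, since float16 activations and gradients can overflow and produce NaN losses:

```python
import mlx.core as mx
from mlx.utils import tree_map

def cast_to_float32(model):
    # Hypothetical workaround: upcast every parameter to float32 so the
    # forward/backward pass does not overflow in float16 and produce NaNs.
    fp32_params = tree_map(lambda p: p.astype(mx.float32), model.parameters())
    model.update(fp32_params)
```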
This is cool, and I think it would be nice to support. We might be able to do it with a far smaller diff however. Something like:
Everything else should be the same. Wdyt?
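The snippet referenced above is not reproduced in this excerpt. Purely as a hypothetical illustration of the kind of "far smaller diff" meant (not the maintainer's actual proposal), the LoRA script could branch on a training-mode flag instead of always freezing the model and injecting LoRA layers:

```python
def prepare_model(model, args):
    # Hypothetical sketch only: `args.train_type`, `args.lora_layers`, and
    # `linear_to_lora_layers` are assumed names, not necessarily what was
    # proposed in this thread.
    if args.train_type == "lora":
        # LoRA path: freeze the base weights and inject adapter layers.
        model.freeze()
        linear_to_lora_layers(model, args.lora_layers)
    # Full fine-tune path: leave every parameter trainable so the existing
    # training loop in tuner/trainer.py optimizes all of them.
    return model
```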
The code presented here is derived from the original lora.py file, with minimal modifications. The primary addition is full fine-tuning functionality, while the core structure of the original code is preserved. This revised version offers a potential starting point for testing the training process on more powerful Mac devices.

Efforts were made to avoid altering any code within the tuner/* directory, ensuring that this update does not introduce conflicts with the existing codebase.

The code has been successfully tested on a Mac M2 Studio with 192GB of memory, demonstrating its compatibility with high-memory hardware configurations.