
Fine-tune other models #8

Open
Gincioks opened this issue Oct 23, 2023 · 13 comments

@Gincioks

Hello,

Can we apply this method to fine-tune models other than Llama and CodeLlama, such as Mistral 7B?

Many thanks in advance!

@okuvshynov
Owner

That should be possible in principle, but some of the code might be model-specific right now. Could you point me to:

  • the weights for the model you are interested in
  • some other implementation of that model (maybe on Hugging Face)?

I could look into that.

@Gincioks
Author

Gincioks commented Oct 23, 2023

I'm relatively new to AI development, but I'm interested in a fine-tuned version of Mistral Orca. It's available here: Mistral 7B OpenOrca on Hugging Face. However, it seems this model is in a Hugging Face format, which may not be directly compatible with the code, correct?

You can find the original weights for the Mistral 7B model here: Original Weights for Mistral 7B.

@Gincioks
Author

I tried to find a method for converting HF weights to the original PyTorch format, but nothing came up.
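One thing worth noting: HF checkpoint shards are already PyTorch state dicts, so "conversion" is mostly a matter of renaming keys from the HF scheme to the reference (Meta-style) scheme. Below is a minimal sketch of such a key remapper; the rename rules shown are assumptions drawn from common Llama-family naming and would need to be verified against the actual state dicts on both sides.

```python
# Hedged sketch: rename HF-style state-dict keys to Meta-style keys.
# The RULES table is illustrative, not exhaustive -- check it against the
# real checkpoints before relying on it.
import re

# (pattern, replacement) pairs; \1 carries the layer index through.
RULES = [
    (r"^model\.embed_tokens\.", "tok_embeddings."),
    (r"^model\.layers\.(\d+)\.self_attn\.q_proj\.", r"layers.\1.attention.wq."),
    (r"^model\.layers\.(\d+)\.mlp\.gate_proj\.", r"layers.\1.feed_forward.w1."),
    (r"^model\.norm\.", "norm."),
    (r"^lm_head\.", "output."),
]

def rename_key(hf_key: str) -> str:
    """Map one HF weight name to its Meta-style equivalent."""
    for pattern, repl in RULES:
        new_key, n = re.subn(pattern, repl, hf_key)
        if n:
            return new_key
    return hf_key  # leave unrecognized keys untouched
```

With a table like this, the whole state dict can be rewritten in one dict comprehension before being fed to the loader.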

@Gincioks
Author

@okuvshynov
Owner

Looking at https://huggingface.co/mistralai/Mistral-7B-v0.1/blob/main/pytorch_model.bin.index.json, it should be possible to modify the loading to make it work. It needs some updates to the loader code, though.
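For context, that index file maps each weight name to the shard file containing it, so a loader can open each shard once and pull only the tensors it needs. A minimal sketch of that grouping step (the tiny inline index is illustrative; the real file lists every tensor in the model under its `weight_map` key):

```python
# Hedged sketch: group weight names by shard file using the
# pytorch_model.bin.index.json format (a JSON object with a "weight_map"
# mapping tensor name -> shard filename).
import json
from collections import defaultdict

def shards_to_weights(index_json: str) -> dict:
    """Return {shard_file: [weight names stored in that shard]}."""
    index = json.loads(index_json)
    groups = defaultdict(list)
    for weight_name, shard_file in index["weight_map"].items():
        groups[shard_file].append(weight_name)
    return dict(groups)

# Tiny illustrative index; the real one covers the full model.
example = json.dumps({
    "weight_map": {
        "model.embed_tokens.weight": "pytorch_model-00001-of-00002.bin",
        "lm_head.weight": "pytorch_model-00002-of-00002.bin",
    }
})
```

Iterating shard-by-shard like this keeps peak memory bounded by one shard rather than the whole model, which matches slowllama's low-memory goal.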

@Gincioks
Author

Do you have any suggestions for getting started? I want to put this into action, even though there will be a lot to learn :D

@okuvshynov
Owner

@Gincioks - I'm not entirely sure about the best way, but probably here's how I'd do it:

  1. Download the Mistral model.
  2. Download their reference implementation.
  3. Try to load it and continue some prompt to check that it works (without slowllama, just their reference).
  4. If it works, we can try importing some of it into slowllama.
  5. The first step is loading - it will definitely require changes to the loader, and maybe to the model as well. It should be ok to break things at this point - just make it work with the new model, and we can decide later how to generalize it (what needs to be configurable, etc.).
  6. Then we need to make sure the forward pass works. Compare the output we get here with the one we get from the reference implementation.
  7. After that, the backward pass should be straightforward.

Thank you for looking into this!
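The forward-pass comparison in step 6 boils down to running both implementations on the same prompt with the same weights and checking that the logits agree within floating-point tolerance. A minimal sketch (the logit values below are placeholders, not real model output):

```python
# Hedged sketch of step 6: compare logits from the reference implementation
# against the slowllama port. In practice both vectors would come from a
# forward pass over the same prompt with identical weights.
def max_abs_diff(a, b):
    """Largest elementwise absolute difference between two logit vectors."""
    assert len(a) == len(b), "logit vectors must have the same length"
    return max(abs(x - y) for x, y in zip(a, b))

# Placeholder values standing in for real logits.
reference_logits = [0.12, -1.5, 3.4]
ported_logits = [0.12, -1.5, 3.4]
assert max_abs_diff(reference_logits, ported_logits) < 1e-4
```

A loose tolerance (e.g. 1e-4) is usually sensible here, since fp16/bf16 arithmetic and different op orderings make bit-exact agreement unlikely.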

@Gincioks
Author

Gincioks commented Nov 2, 2023

Currently I'm facing this error when trying to prepare the model:

    File "slowllama/models_manager.py", line 76, in prepare_model
        prepare_mistal_model(
    File "slowllama/mistral/mistral_loader.py", line 114, in prepare_mistal_model
        apply_subset(submodule, weight_subset, ci, title)
    File "slowllama/mistral/mistral_loader.py", line 53, in apply_subset
        module.weight[idx_subset] = weight_subset

    RuntimeError: The expanded size of the tensor (11008) must match the existing size (14336) at non-singleton dimension 0. Target sizes: [11008, 4096]. Tensor sizes: [14336, 4096]

Any thoughts?

Update:
I was able to prepare the model and launch inference through your code. I needed to change the FeedForward class. But now I have a problem: the model gives random tokens. It could still be a problem with the forward pass.
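The 14336-vs-11008 mismatch above is consistent with Mistral 7B using an FFN hidden dimension of 14336 where Llama-7B uses 11008, so the model was likely built with Llama's config while the checkpoint held Mistral shapes. A cheap way to catch this class of bug early is to compare checkpoint tensor shapes against the model's expected shapes before copying anything. A sketch, using plain dicts of name → shape (the weight name below is illustrative):

```python
# Hedged sketch: diff expected parameter shapes against checkpoint shapes
# up front, so a config mismatch fails with a readable report instead of a
# RuntimeError halfway through the weight copy.
def check_shapes(model_shapes: dict, ckpt_shapes: dict) -> list:
    """Return a list of (name, expected_shape, found_shape) mismatches."""
    mismatches = []
    for name, expected in model_shapes.items():
        found = ckpt_shapes.get(name)
        if found is not None and found != expected:
            mismatches.append((name, expected, found))
    return mismatches

# Illustrative shapes reproducing the error from this thread:
# model built with Llama-7B's FFN dim (11008), checkpoint holds Mistral's (14336).
model = {"layers.0.feed_forward.w1.weight": (11008, 4096)}
ckpt = {"layers.0.feed_forward.w1.weight": (14336, 4096)}
```

With real modules, `model_shapes` could be derived from `tuple(p.shape) for each named parameter` and `ckpt_shapes` from the loaded state dict.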

@okuvshynov
Owner

Could you share your code somewhere? Maybe a branch in your forked repo?

@Gincioks
Author

Gincioks commented Nov 2, 2023

Yes, I will share the code. I made too many changes, so I will start a new repo. Also, I was able to get generation working perfectly. Now I will do the same with fine-tuning.

@okuvshynov
Owner

Yeah, I think doing that in the forked version might be a good option. Thank you for looking into this!

@Gincioks
Author

Gincioks commented Nov 5, 2023

Hey, here is the new repository: https://github.com/Gincioks/PicoTuner. I intend to use this as a package in another project, so I created a small CLI for easier use.
