
You are using a model of type mini_gemini_mixtral to instantiate a model of type mini_gemini. This is not supported for all configurations of models and can yield errors. #63

Open
lightingvector opened this issue Apr 16, 2024 · 1 comment

lightingvector (Contributor) commented Apr 16, 2024

I managed to finetune the mini-gemini mixtral model, but after finetuning I am unable to run inference with it. I tried to launch a model worker as described in the repo: python -m minigemini.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40001 --worker http://localhost:40001 --model-path Mini-Gemini-mixtral/

After a long wait I then get:

You are using a model of type mini_gemini_mixtral to instantiate a model of type mini_gemini. This is not supported for all configurations of models and can yield errors.
Loading checkpoint shards:   0%|                                                                    | 0/36 [00:00<?, ?it/s]
Loading checkpoint shards:   0%|                                                                    | 0/36 [00:00<?, ?it/s]
2024-04-16 23:00:00 | ERROR | stderr | 
2024-04-16 23:00:00 | ERROR | stderr | Traceback (most recent call last):
2024-04-16 23:00:00 | ERROR | stderr |   File "<frozen runpy>", line 198, in _run_module_as_main
2024-04-16 23:00:00 | ERROR | stderr |   File "<frozen runpy>", line 88, in _run_code
2024-04-16 23:00:00 | ERROR | stderr |   File "/home/paperspace/MiniGemini/minigemini/serve/model_worker.py", line 389, in <module>
2024-04-16 23:00:00 | ERROR | stderr |     worker = ModelWorker(args.controller_address,
2024-04-16 23:00:00 | ERROR | stderr |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-04-16 23:00:00 | ERROR | stderr |   File "/home/paperspace/MiniGemini/minigemini/serve/model_worker.py", line 76, in __init__
2024-04-16 23:00:00 | ERROR | stderr |     self.tokenizer, self.model, self.image_processor, self.context_len = load_pretrained_model(
2024-04-16 23:00:00 | ERROR | stderr |                                                                          ^^^^^^^^^^^^^^^^^^^^^^
2024-04-16 23:00:00 | ERROR | stderr |   File "/home/paperspace/MiniGemini/minigemini/model/builder.py", line 76, in load_pretrained_model
2024-04-16 23:00:00 | ERROR | stderr |     model = MiniGeminiLlamaForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, **kwargs)
2024-04-16 23:00:00 | ERROR | stderr |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-04-16 23:00:00 | ERROR | stderr |   File "/home/paperspace/MiniGemini/venv/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3706, in from_pretrained
2024-04-16 23:00:00 | ERROR | stderr |     ) = cls._load_pretrained_model(
2024-04-16 23:00:00 | ERROR | stderr |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-04-16 23:00:00 | ERROR | stderr |   File "/home/paperspace/MiniGemini/venv/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4091, in _load_pretrained_model
2024-04-16 23:00:00 | ERROR | stderr |     state_dict = load_state_dict(shard_file)
2024-04-16 23:00:00 | ERROR | stderr |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-04-16 23:00:00 | ERROR | stderr |   File "/home/paperspace/MiniGemini/venv/lib/python3.11/site-packages/transformers/modeling_utils.py", line 505, in load_state_dict
2024-04-16 23:00:00 | ERROR | stderr |     if metadata.get("format") not in ["pt", "tf", "flax"]:
2024-04-16 23:00:00 | ERROR | stderr |        ^^^^^^^^^^^^
2024-04-16 23:00:00 | ERROR | stderr | AttributeError: 'NoneType' object has no attribute 'get'

Could this be due to the ZeRO-to-fp32 conversion after training?
I did run the zero_to_fp32 conversion, but saved the result as sharded safetensors instead of a single pytorch_model.bin.
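
In case it is related, here is a minimal (hypothetical) check and workaround I am considering, assuming the error comes from shards that were saved without the "format" entry in their safetensors metadata, and that the shard files match model-*.safetensors:

```python
# Hypothetical sketch: check the safetensors shard metadata and, if the
# "format" key is missing, re-save each shard with metadata={"format": "pt"}
# so transformers' load_state_dict() no longer calls .get() on None.
# Assumes the shards match model-*.safetensors and each fits in CPU RAM.
import glob
from safetensors import safe_open
from safetensors.torch import load_file, save_file

for shard in sorted(glob.glob("Mini-Gemini-mixtral/model-*.safetensors")):
    with safe_open(shard, framework="pt") as f:
        meta = f.metadata()  # None or a dict without "format" would trigger the error above
    if not meta or "format" not in meta:
        tensors = load_file(shard)
        save_file(tensors, shard, metadata={"format": "pt"})
        print(f"rewrote {shard} with format metadata")
```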

yanwei-li (Member) commented
Hi, please rename your finetuned model directory so that it contains the string "8x7b", which is what L68 of model/builder.py checks for when deciding to load the Mixtral model. Alternatively, you can modify that loading rule in L68 of model/builder.py directly.
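
For illustration, a hypothetical rename along these lines should make the loader take the Mixtral branch instead of the Llama one (the target directory name here is just an example); this also explains why your traceback shows MiniGeminiLlamaForCausalLM being used:

```python
# Hypothetical example: rename the finetuned checkpoint directory so that
# the loader's name check ("8x7b" in the path) routes it to the Mixtral
# branch instead of the Llama one.
import shutil
shutil.move("Mini-Gemini-mixtral", "Mini-Gemini-8x7b-finetune")
```

Then relaunch the model worker with --model-path pointing at the renamed directory.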
