Some weights of OtterForConditionalGeneration were not initialized from the model #270
Comments
May I know your task type and which version of the Otter model you are using for initialization?
I am doing a classification task with multiple images and a single prompt as input, in the SD dataset format; the pre-trained weights are "OTTER-Image-MPT7B".
export PYTHONPATH=.
accelerate launch --config_file=./pipeline/accelerate_configs/accelerate_config_fsdp.yaml
Does the missing-weights log appear when you load the model directly? You could set a breakpoint right after the loading process finishes.
When I load the pre-trained weights you posted, or the baseline weights I trained, there is no missing-weights log; it only appears when I load the newly trained model weights.
By the way, due to network problems I cannot download tokenizer_config.json from MPT on Hugging Face, so I downloaded it offline from "https://huggingface.co/mosaicml/mpt-7b-instruct" (everything except the bin files), and in modeling_otter.py I changed the code to text_tokenizer = AutoTokenizer.from_pretrained("/mnt/train_pipeline-master/Otter/mpt-7b-instruct").
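The offline workaround above amounts to pointing `from_pretrained` at a local directory instead of a hub id. A minimal sketch of that fallback logic (the helper name and paths are illustrative, not part of the Otter codebase):

```python
import os
import tempfile

def resolve_tokenizer_source(local_dir, hub_id="mosaicml/mpt-7b-instruct"):
    """Prefer a locally downloaded tokenizer directory; otherwise fall back
    to the Hugging Face hub id. Illustrative helper only."""
    if os.path.isdir(local_dir) and os.path.isfile(
        os.path.join(local_dir, "tokenizer_config.json")
    ):
        return local_dir
    return hub_id

# Demo: a throwaway directory standing in for the offline download.
local_dir = tempfile.mkdtemp()
open(os.path.join(local_dir, "tokenizer_config.json"), "w").close()
source = resolve_tokenizer_source(local_dir)          # -> local directory
fallback = resolve_tokenizer_source("/no/such/path")  # -> hub id
```

The resolved value would then be passed to `AutoTokenizer.from_pretrained(...)` as in the edited modeling_otter.py.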
Could you download the model's config from this path? https://openxlab.org.cn/models/detail/YuanhanZhang/OTTER-Image-MPT7B
And also, make sure you use the
The missing of
And now you can try
And then
That will automatically handle the loading of
I compared the config.json. Except for "_name_or_path" and "transformers_version", the rest is consistent with what you posted, so this should not be the problem.
I checked the save_pretrained part as you said. I am using the version from about a month ago; the save code is as follows:
I would suggest you use the
You can also load it using
So this process would be safer and won't cause the missing-weights problem.
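The missing-weights warning boils down to comparing the checkpoint's keys against the keys the model expects. A small sketch of that check (the function and key names are illustrative, not transformers internals):

```python
def diff_state_dicts(expected_keys, loaded_keys):
    """Return (missing, unexpected) keys, mirroring the comparison behind the
    "Some weights ... were not initialized" warning. Illustrative only."""
    expected, loaded = set(expected_keys), set(loaded_keys)
    missing = sorted(expected - loaded)     # in the model, absent from checkpoint
    unexpected = sorted(loaded - expected)  # in the checkpoint, unused by model
    return missing, unexpected

# Demo with two hypothetical keys from an Otter-like state dict.
missing, unexpected = diff_state_dicts(
    ["lang_encoder.wte.weight",
     "vision_encoder.vision_model.embeddings.position_ids"],
    ["lang_encoder.wte.weight"],
)
```

Any key reported in `missing` is freshly initialized, which is exactly what the log in this issue is flagging.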
Got it, I'll take your advice and try it.
Hey, I used otter/converting_otter_pt_to_hf.py to convert the trained weights final_weights.pt, and then loaded the weights with from_pretrained. Please tell me if this is correct. I found that when converting weights, using the config.json you posted and the config.json generated by training seem to give the same result. Is there a difference?
It could be right. If you confirm that the
Got it, the generated config.json differs from the one you posted only in "_name_or_path" and "transformers_version".
Thank you very much for your careful answer; I have solved this bug. The cause was that "_name_or_path" in the generated config.json is derived from the "pretrained_model_name_or_path" parameter used during training, but during inference "_name_or_path" seems to require a "flamingo" field, so using the config.json you posted instead of the generated one works.
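The fix described above amounts to overwriting the generated config's "_name_or_path" with the value from the published config. A hedged sketch of that patch (the helper and the "luodian/OTTER-Image-MPT7B" value are placeholders; use whatever the posted config.json actually contains):

```python
import json
import os
import tempfile

def patch_name_or_path(config_path, reference_name_or_path):
    """Overwrite "_name_or_path" in a generated config.json with the value
    from the published config. Illustrative helper, not part of the Otter repo."""
    with open(config_path) as f:
        cfg = json.load(f)
    cfg["_name_or_path"] = reference_name_or_path
    with open(config_path, "w") as f:
        json.dump(cfg, f, indent=2)
    return cfg

# Demo on a throwaway config file; both values below are placeholders.
tmp = os.path.join(tempfile.mkdtemp(), "config.json")
with open(tmp, "w") as f:
    json.dump({"_name_or_path": "/mnt/my/training/run", "model_type": "otter"}, f)
patched = patch_name_or_path(tmp, "luodian/OTTER-Image-MPT7B")
```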
Hey, I would like to ask: I am doing a binary classification task with a single prompt and multiple images as input, but the results do not seem very good. Do you have any ideas for possible improvements? Currently I plan to try unfreezing the visual encoder. I hope you can share your suggestions.
If you are working with multiple images as input, you could first try to arrange them into the
For this model training, I suggest you add a
You can still init from the Image model, if with the
This is like treating your input images as video sequences.
Also, another way is to put the images in the
If doing so, you won't need to add the above-mentioned
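Flamingo-style models such as Otter take vision input shaped roughly (B, T_img, F, C, H, W), so "treating images as a video" means stacking them along the frame axis F. A sketch with dummy arrays (the exact layout of Otter's vision_x is an assumption here):

```python
import numpy as np

# Five dummy RGB images, each (C, H, W) = (3, 224, 224).
images = [np.zeros((3, 224, 224), dtype=np.float32) for _ in range(5)]

# Stack along a new frame axis F, then add batch (B) and media (T_img) axes:
# (F, C, H, W) -> (1, 1, F, C, H, W)
vision_x = np.stack(images)[np.newaxis, np.newaxis]
```

With this layout, one prompt is paired with a single "media" item that happens to contain five frames.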
Or you could use
Inside this customized config, you can choose whether to give the
Thanks for sharing. I am now training Otter to treat multiple pictures as a video. Since the number of pictures varies, I currently use batch_size=1. Later I will try setting a maximum number of frames so I can increase the batch size, and see whether that improves results.
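Batching variable-length image sequences usually means padding each one to a fixed maximum frame count and tracking which frames are real. A sketch of that padding step (max_frames, the zero padding, and the boolean mask convention are assumptions, not Otter code):

```python
import numpy as np

def pad_frames(frames, max_frames):
    """Pad a (F, C, H, W) array with zeros to (max_frames, C, H, W),
    truncating if too long, and return a boolean validity mask."""
    f = frames.shape[0]
    if f > max_frames:
        frames, f = frames[:max_frames], max_frames
    pad = np.zeros((max_frames - f, *frames.shape[1:]), dtype=frames.dtype)
    mask = np.arange(max_frames) < f  # True for real frames, False for padding
    return np.concatenate([frames, pad], axis=0), mask

# Demo: a 3-frame clip padded to 5 frames.
padded, mask = pad_frames(np.ones((3, 3, 8, 8), dtype=np.float32), max_frames=5)
```

Once every sample has the same frame count, the padded arrays can be stacked into a batch larger than 1, which is what the comment above is aiming for.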
Any results from your multi-image input experiments? I'm planning to do similar things and am wondering whether you have any insights into which approach is better.
Hello, I encountered the following output when testing the trained weights. I spent a long time trying to find the reason, but unfortunately I haven't found the cause yet. Can you help me?
I once used the official weights to train a baseline on my own data for classification. The results were not very good, but the message "Some weights of OtterForConditionalGeneration were not initialized, and are newly initialized" did not appear. This situation only occurred after testing a newly trained version of the model.
Loading checkpoint shards: 100%|██████████████████| 4/4 [00:30<00:00, 7.62s/it]
Some weights of OtterForConditionalGeneration were not initialized from the model checkpoint at /mnt/large_model/weights/BC4-partScale-negAug3 and are newly initialized: ['vision_encoder.vision_model.embeddings.position_ids']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
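For what it's worth, the single key reported here, vision_encoder.vision_model.embeddings.position_ids, is a non-learned positional-index buffer (newer transformers versions register it as non-persistent and simply rebuild it), so this particular warning is usually harmless. A small sketch for separating such buffer keys from genuinely missing weights (the suffix list is an assumption, not transformers code):

```python
BENIGN_SUFFIXES = ("position_ids",)  # non-learned buffers, safe to re-create

def split_missing_keys(missing_keys):
    """Split missing state-dict keys into (benign buffers, real missing weights)."""
    benign = [k for k in missing_keys if k.endswith(BENIGN_SUFFIXES)]
    serious = [k for k in missing_keys if not k.endswith(BENIGN_SUFFIXES)]
    return benign, serious

# Demo with the exact key from the log above.
benign, serious = split_missing_keys(
    ["vision_encoder.vision_model.embeddings.position_ids"]
)
```

If `serious` were non-empty, that would point to a real checkpoint/config mismatch like the one resolved earlier in this thread.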