
Questions about using CoCa to generate captions #797

Open · ykj467422034 opened this issue Jan 16, 2024 · 17 comments
@ykj467422034

I'm fine-tuning OpenCLIP (coca_ViT-L-14) on my own CSV dataset. I save the checkpoint file and then use the official code to generate captions, but every image produces the same caption. Can anyone help me solve this problem?
Fine-tuning:

```sh
python -m training.main \
    --dataset-type "csv" \
    --train-data "my-csv/coca_train.csv" \
    --warmup 1000 \
    --batch-size 32 \
    --lr 1e-5 \
    --wd 0.1 \
    --epochs 1 \
    --workers 3 \
    --model "coca_ViT-L-14" \
    --report-to "wandb" \
    --coca-contrastive-loss-weight 0 \
    --coca-caption-loss-weight 1 \
    --log-every-n-steps 100
```
Test:

```python
import torch
from PIL import Image

import open_clip

model, _, transform = open_clip.create_model_and_transforms(
    model_name="coca_ViT-L-14",
    pretrained="logs/check_point.pth",  # fine-tuned checkpoint from the run above
)

im = Image.open("cat.jpg").convert("RGB")
im = transform(im).unsqueeze(0)

with torch.no_grad(), torch.cuda.amp.autocast():
    generated = model.generate(im)

# Strip the special tokens from the decoded caption
print(open_clip.decode(generated[0]).split("<end_of_text>")[0].replace("<start_of_text>", ""))
```
Result:

[screenshot: generated captions; every image gets the same caption]
As you can see, the captions generated for different pictures are all the same.
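
One thing worth ruling out is the decoding strategy itself: `model.generate` defaults to beam search, and in recent open_clip releases it also accepts sampling arguments. A minimal sketch, reusing `model` and `im` from the test script above (the `generation_type`, `top_p`, and `temperature` parameters are assumptions to verify against your installed version):

```python
# Sketch: switch from the default beam search to nucleus sampling, so that
# beam search collapsing onto one caption can be ruled out.
with torch.no_grad(), torch.cuda.amp.autocast():
    sampled = model.generate(
        im,
        generation_type="top_p",  # nucleus sampling instead of "beam_search"
        top_p=0.9,
        temperature=1.0,
    )

print(open_clip.decode(sampled[0]).split("<end_of_text>")[0].replace("<start_of_text>", ""))
```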

@gpucce (Contributor) commented Jan 16, 2024

Hi @ykj467422034, can you share a snippet of the code you are actually using? From what I can see, the one you shared is exactly the one in the README, and I think it should only generate a single caption.

@ykj467422034 (Author)

> Hi @ykj467422034, can you share a snippet of the code you are actually using? From what I can see, the one you shared is exactly the one in the README, and I think it should only generate a single caption.

This is what I'm actually using. I do want to generate a single caption per image; the problem is that the captions are identical across images.

@gpucce (Contributor) commented Jan 16, 2024

> This is what I'm actually using. I do want to generate a single caption per image; the problem is that the captions are identical across images.

So it generates the captions you are showing for the "cat.jpg" file?

@ykj467422034 (Author)

> So it generates the captions you are showing for the "cat.jpg" file?

No, I see what you mean.

[screenshot: generated captions, identical across images] There are 100 images, and I generate a caption for each picture one by one.

@gpucce (Contributor) commented Jan 16, 2024

@ykj467422034 sorry, I didn't see your reply. So does it repeat the same caption for different images, or generate several captions for one image?

Also, did you try to generate a caption for a random tensor?

@ykj467422034 (Author)

> @ykj467422034 sorry, I didn't see your reply. So does it repeat the same caption for different images, or generate several captions for one image?
>
> Also, did you try to generate a caption for a random tensor?

The former: the same caption is repeated across different images.

@gpucce (Contributor) commented Jan 16, 2024

Mmmh, not sure. I asked about the random tensor to see whether the model generates the same caption in that case too; if it does, maybe the fine-tuning didn't go well. Do you get similar behaviour with the pretrained model?
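
A minimal sketch of that random-tensor check, assuming the `model` and `open_clip` objects from the test script above (224×224 is the standard input size for coca_ViT-L-14):

```python
import torch

# If pure noise yields the same caption as real images, the decoder has
# likely collapsed during fine-tuning, rather than the inputs being at fault.
noise = torch.randn(1, 3, 224, 224)  # one fake "image"

with torch.no_grad(), torch.cuda.amp.autocast():
    generated = model.generate(noise)

print(open_clip.decode(generated[0]).split("<end_of_text>")[0].replace("<start_of_text>", ""))
```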

@ykj467422034 (Author)

> Mmmh, not sure. I asked about the random tensor to see whether the model generates the same caption in that case too; if it does, maybe the fine-tuning didn't go well. Do you get similar behaviour with the pretrained model?

I haven't; I can try that next. Thank you very much!

@Thomas2419 commented Jan 16, 2024

Hello @ykj467422034, I haven't checked whether the most recent update has fixed this issue, so this suggestion might not work, and in fact it might screw everything up, so consider this my warning to you. But assuming it hasn't, I'll refer you to issue #751. The problem there was that, after CoCa fine-tuning, all of the model's predictions were repetitions of the same word.

For example in the issue it was "turnpike turnpike turnpike turnpike parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway".

The solution I found to work for me, as described in issue #751, was to git pull the open_clip repository, edit my local file open_clip/src/open_clip/coca_model.py exactly as specified line by line in pull request #710 by gpucce, and then run `pip install -e .` in the repository's main directory to install it with the edits. This completely fixed my problem and made training behave as desired.
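
A sketch of those steps, assuming a source checkout of open_clip (the actual line edits are the ones listed in PR #710 and are not reproduced here):

```sh
# Get the open_clip sources (or `git pull` if you already have a checkout)
git clone https://github.com/mlfoundations/open_clip.git
cd open_clip

# Manually apply the line-by-line changes from PR #710 to
#   src/open_clip/coca_model.py
# before reinstalling.

# Install in editable mode so the patched sources are picked up
pip install -e .
```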

@ykj467422034 (Author)

> Hello @ykj467422034, I haven't checked whether the most recent update has fixed this issue, so this suggestion might not work, and in fact it might screw everything up, so consider this my warning to you. But assuming it hasn't, I'll refer you to issue #751. The problem there was that, after CoCa fine-tuning, all of the model's predictions were repetitions of the same word.
>
> For example, in that issue it was "turnpike turnpike turnpike parkway ..."
>
> The solution I found to work for me, as described in issue #751, was to git pull the open_clip repository, edit my local file open_clip/src/open_clip/coca_model.py exactly as specified line by line in pull request #710 by gpucce, and then run `pip install -e .` in the repository's main directory to install it with the edits. This completely fixed my problem and made training behave as desired.

I edited it as you described, but the repeated captions still appear.

@Thomas2419

Are you using the newest branch? I was not, so perhaps that is affecting whether the edit works.

@ykj467422034 (Author)

> Are you using the newest branch? I was not, so perhaps that is affecting whether the edit works.

Do you mean the open_clip repository or the modified src files?

@Thomas2419

Apologies for my lack of clarity: I mean the open_clip repository. I was using the most up-to-date version at the time, but it looks like there have been multiple new commits since then. I'm currently unable to access my setup to check which commit I'm using, though.

@ykj467422034 (Author)

> Apologies for my lack of clarity: I mean the open_clip repository. I was using the most up-to-date version at the time, but it looks like there have been multiple new commits since then. I'm currently unable to access my setup to check which commit I'm using, though.

Fine. Maybe I can try the latest version once more. Thanks

@gpucce (Contributor) commented Jan 16, 2024

@ykj467422034 I think that with those changes you would still need to rerun the fine-tuning.

@ykj467422034 (Author)

> @ykj467422034 I think that with those changes you would still need to rerun the fine-tuning.

Sure, I will. Thank you!
