
Error-prone behavior of supporting the "openai" checkpoint with non-QuickGELU models #771

Open
bryant1410 opened this issue Dec 23, 2023 · 0 comments

@bryant1410
Contributor

Hey, I believe that supporting running the "openai" pre-trained checkpoint with non-QuickGELU models (e.g., RN50 and ViT-B-32) leads to bugs. The pattern is the following:

  1. Fine-tune an OpenAI-pre-trained CLIP model (e.g., with --model ViT-B-32 --pretrained openai), inadvertently using a non-QuickGELU model config. This works because the "openai" checkpoint is hardcoded to use QuickGELU anyway.
  2. Run the evaluation with the same command, only changing openai to the path of the fine-tuned checkpoint.

The evaluation will then use the native GELU instead of QuickGELU (the activation the model was trained with), and wrong results will be obtained.
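For reference, QuickGELU (x * sigmoid(1.702 * x)) is only an approximation of the native GELU, so the two activations produce measurably different outputs. A minimal sketch of the divergence (assuming only PyTorch, not open_clip internals):

```python
import torch
import torch.nn.functional as F

def quick_gelu(x: torch.Tensor) -> torch.Tensor:
    # QuickGELU, the activation the OpenAI CLIP weights were trained with
    return x * torch.sigmoid(1.702 * x)

x = torch.linspace(-4.0, 4.0, steps=9)
print((F.gelu(x) - quick_gelu(x)).abs().max())
# Non-zero difference: a checkpoint fine-tuned with QuickGELU but evaluated with
# native GELU computes different activations at every layer, so metrics degrade.
```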

This happened to me, as well as to others (though confirmation from them is still pending).

Would it be possible to fix/avoid this error-prone pattern? I see some ways:

  • Disallow running non-QuickGELU models with the "openai" pre-trained checkpoint. Maybe this case can be detected and raise a special error message (pointing to this issue) inviting the user to use the correct model variant.
  • Warn the user when "openai" is used without a QuickGELU model (or without the --force-quick-gelu flag); see the sketch after this list.
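A hedged sketch of what such a warning check could look like. This is not the actual open_clip factory code; the function name and arguments are hypothetical, and only the "-quickgelu" model-name suffix and --force-quick-gelu flag are taken from existing open_clip conventions:

```python
import warnings

def check_activation_mismatch(model_name: str, pretrained: str, force_quick_gelu: bool) -> None:
    # Hypothetical helper: warn when the "openai" checkpoint is paired with a
    # model config that does not use QuickGELU.
    uses_quick_gelu = force_quick_gelu or model_name.endswith("-quickgelu")
    if pretrained.lower() == "openai" and not uses_quick_gelu:
        warnings.warn(
            f"Model '{model_name}' does not use QuickGELU, but the 'openai' checkpoint "
            "was trained with QuickGELU. Weights fine-tuned from this run will give wrong "
            "results if later loaded into a native-GELU model. Use the '-quickgelu' model "
            "variant or pass --force-quick-gelu."
        )
```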