
Error-prone behavior of supporting the "openai" checkpoint with non-QuickGELU models #771

Open
bryant1410 opened this issue Dec 23, 2023 · 0 comments

@bryant1410
Contributor

Hey, I believe that supporting running the "openai" pre-trained checkpoint with non-QuickGELU models (e.g., RN50 and ViT-B-32) leads to bugs. The pattern is the following:

  1. Fine-tune an OpenAI-pre-trained CLIP model (e.g., with --model ViT-B-32 --pretrained openai), inadvertently using a non-QuickGELU model config. This works because the "openai" checkpoint is hardcoded to use QuickGELU anyway.
  2. Run the evaluation with the same command, only changing openai to the path of the fine-tuned checkpoint.

The evaluation will then use the native GELU instead of QuickGELU (the activation the model was trained with), and wrong results will be obtained.
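For reference, QuickGELU (x * sigmoid(1.702 * x)) is only an approximation of the native GELU, so the two activations produce measurably different outputs. A minimal sketch of the divergence (assuming only PyTorch, not open_clip internals):

```python
import torch
import torch.nn.functional as F

def quick_gelu(x: torch.Tensor) -> torch.Tensor:
    # QuickGELU, the activation the OpenAI CLIP weights were trained with
    return x * torch.sigmoid(1.702 * x)

x = torch.linspace(-4.0, 4.0, steps=9)
print((F.gelu(x) - quick_gelu(x)).abs().max())
# Non-zero difference: a checkpoint fine-tuned with QuickGELU but evaluated with
# native GELU computes different activations at every layer, so metrics degrade.
```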

This happened to me, as well as to others (though confirmation from them is still pending).

Would it be possible to fix/avoid this error-prone pattern? I see some ways:

  • Disallow running non-QuickGELU models with the "openai" pre-trained checkpoint. Maybe this case can be detected and raise a special error message (pointing to this issue) inviting the user to use the correct model variant.
  • Warn the user when "openai" is used without a QuickGELU model (or without the --force-quick-gelu flag); see the sketch after this list.
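A hedged sketch of what such a warning check could look like. This is not the actual open_clip factory code; the function name and arguments are hypothetical, and only the "-quickgelu" model-name suffix and --force-quick-gelu flag are taken from existing open_clip conventions:

```python
import warnings

def check_activation_mismatch(model_name: str, pretrained: str, force_quick_gelu: bool) -> None:
    # Hypothetical helper: warn when the "openai" checkpoint is paired with a
    # model config that does not use QuickGELU.
    uses_quick_gelu = force_quick_gelu or model_name.endswith("-quickgelu")
    if pretrained.lower() == "openai" and not uses_quick_gelu:
        warnings.warn(
            f"Model '{model_name}' does not use QuickGELU, but the 'openai' checkpoint "
            "was trained with QuickGELU. Weights fine-tuned from this run will give wrong "
            "results if later loaded into a native-GELU model. Use the '-quickgelu' model "
            "variant or pass --force-quick-gelu."
        )
```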