CoCa RoBERTa Attention Map Size Issue #864

Open
sandeepmukh opened this issue Apr 18, 2024 · 1 comment

Comments

@sandeepmukh

Hi! I'm trying to train CoCa using the pretrained RoBERTa weights (has the causal masking issue #445 been addressed?), but I am running into an error with the attention map sizes. Any help would be greatly appreciated :).

Below is the command I'm running:

torchrun --nproc_per_node 4 -m training.main \
         --train-data="$COYO_PATH/train" \
         --train-num-samples 3000000 \
         --val-data="$COYO_PATH/val" \
         --val-num-samples 10000 \
         --dataset-type webdataset \
         --batch-size 128 \
         --warmup 2000 \
         --epochs 100 \
         --lr 5e-4 \
         --precision amp \
         --workers 6 \
         --model "coca_roberta-ViT-B-32" \
         --name "coca_coyo" \
         --report-to "wandb" \
         --wandb-project-name "open-clip-baseline" \
         --imagenet-val "$IMAGENET_HOME/validation" \
         --gather-with-grad \
         --local-loss

However, this fails with the following error:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "src/training/main.py", line 508, in <module>
    main(sys.argv[1:])
  File "src/training/main.py", line 436, in main
    train_one_epoch(model, data, loss, epoch, optimizer, scaler, scheduler, dist_model, args, tb_writer=writer)
  File "src/training/train.py", line 101, in train_one_epoch
    model_out = model(images, texts)
 ... (omitted for brevity)
  File ".venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File ".venv/lib/python3.10/site-packages/torch/nn/modules/activation.py", line 1241, in forward
    attn_output, attn_output_weights = F.multi_head_attention_forward(
  File ".venv/lib/python3.10/site-packages/torch/nn/functional.py", line 5354, in multi_head_attention_forward
    raise RuntimeError(f"The shape of the 2D attn_mask is {attn_mask.shape}, but should be {correct_2d_size}.")
RuntimeError: The shape of the 2D attn_mask is torch.Size([76, 76]), but should be (77, 77).
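
For what it's worth, the shape check itself is easy to reproduce in isolation with plain PyTorch; the sizes below just mirror my error, and the layer dimensions are arbitrary:

    import torch
    import torch.nn as nn

    # toy attention layer; embed_dim/num_heads are arbitrary
    mha = nn.MultiheadAttention(embed_dim=8, num_heads=2, batch_first=True)

    # a batch with sequence length 77 (what a context_length-77 tokenizer produces)
    x = torch.randn(1, 77, 8)

    # causal mask built for a context length of 76 instead of 77
    mask = torch.full((76, 76), float("-inf")).triu(1)

    # raises: "The shape of the 2D attn_mask is torch.Size([76, 76]), but should be (77, 77)."
    out, _ = mha(x, x, x, attn_mask=mask)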

Inspecting the error, I tried to change the multi-modal context length to 77, which yields the following error:

../aten/src/ATen/native/cuda/NLLLoss2d.cu:104: nll_loss2d_forward_kernel: block: [38,0,0], thread: [13,0,0] Assertion `t >= 0 && t < n_classes` failed.
@rwightman
Collaborator

@sandeepmukh
I think a few things are going wrong here. First, update to the main branch.

Then, I think this is needed in CocaModel to replace the current vocab_size logic between the text and multimodal text towers:

        if getattr(text_cfg, "hf_model_name", None) is not None:
            vocab_size = getattr(self.text, "vocab_size", text_cfg.vocab_size)
        else:
            vocab_size = text_cfg.vocab_size
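
If my read is right, that would also explain the nll_loss2d assert above: roberta-base's vocabulary is 50265 while the default CLIP vocab_size is 49408, so RoBERTa token ids can fall outside the caption head's class range. A tiny standalone illustration of that failure mode (the sizes are just for illustration, and it needs a CUDA device):

    import torch
    import torch.nn.functional as F

    # logits sized for the default CLIP BPE vocab of 49408 classes ...
    logits = torch.randn(4, 49408, device="cuda")
    # ... but the targets contain a RoBERTa token id >= 49408
    targets = torch.tensor([50000, 5, 7, 9], device="cuda")

    # fails with the device-side assert `t >= 0 && t < n_classes`
    # (set CUDA_LAUNCH_BLOCKING=1 to see it raised at the offending call)
    loss = F.cross_entropy(logits, targets)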

Also, the context_len used by the tokenizer is sourced from text_cfg by default, so text_cfg and multimodal_cfg should have the same context_len values in the config (I think) to work best, but I'm not 100% sure there.
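
Something roughly like this in the model config, with both context_length values matching (only the relevant keys shown; the exact field placement and values here are a sketch, not a tested config, with 76 used purely as an example):

    {
      "text_cfg": {
        "hf_model_name": "roberta-base",
        "hf_tokenizer_name": "roberta-base",
        "context_length": 76
      },
      "multimodal_cfg": {
        "context_length": 76
      }
    }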
