Trouble loading ViT - Dino structure for channels>3? #291

Open
AgentM-GEG opened this issue Dec 22, 2023 · 0 comments

AgentM-GEG commented Dec 22, 2023

Hi,

I am trying to run the ViT + Dino example illustrated in the repository, with slightly changed parameters (channels = 4 and image_size = 224). The example works as expected when channels = 3, but with channels = 4 it fails with: RuntimeError: Given normalized_shape=[4096], expected input with shape [*, 4096], but got input of size [2, 49, 3072]. The numbers suggest that a 3-channel tensor is reaching a patch embedding built for 4 channels: with patch_size = 32, the LayerNorm expects 4 * 32 * 32 = 4096 values per flattened patch but is receiving 3 * 32 * 32 = 3072. I feel like something has been hardcoded to 3 channels inside dino.py. Please suggest any changes or recommendations; I may well be missing something obvious here.

EDIT: I think it may be because of this line: https://github.com/lucidrains/vit-pytorch/blob/main/vit_pytorch/dino.py#L249
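
For what it's worth, the shapes in the error are reproducible outside of Dino by feeding a 4-channel ViT a 3-channel batch, which is what makes me suspect a tensor with 3 channels hard-coded somewhere in dino.py. This is just a quick check with random data, using the same image and patch size as my setup but a tiny model:

import torch
from vit_pytorch import ViT

# minimal 4-channel ViT with the same image/patch size as my setup below
vit = ViT(image_size = 224, patch_size = 32, num_classes = 10,
          dim = 64, depth = 1, heads = 1, mlp_dim = 128, channels = 4)

# the patch-embedding LayerNorm is built for 4 * 32 * 32 = 4096 values per
# flattened patch; a 3-channel 224x224 batch instead gives 49 patches of
# 3 * 32 * 32 = 3072 values, which is exactly the shape in the error above
vit(torch.randn(2, 3, 224, 224))

My full configuration is below: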

my_model = ViT(
    image_size = 224,
    patch_size = 32,
    num_classes = 1000,
    dim = 1024,
    depth = 6,
    heads = 8,
    mlp_dim = 2048,
    channels = 4
)

learner_model = Dino(
    my_model,
    image_size = 224,
    hidden_layer = 'to_latent',        # hidden layer name or index, from which to extract the embedding
    projection_hidden_size = 256,      # projector network hidden dimension
    projection_layers = 4,             # number of layers in projection network
    num_classes_K = 50176,             # output logits dimensions (referenced as K in paper)
    student_temp = 0.9,                # student temperature
    teacher_temp = 0.04,               # teacher temperature, needs to be annealed from 0.04 to 0.07 over 30 epochs
    local_upper_crop_scale = 0.4,      # upper bound for local crop - 0.4 was recommended in the paper 
    global_lower_crop_scale = 0.5,     # lower bound for global crop - 0.5 was recommended in the paper
    moving_average_decay = 0.9,        # moving average of encoder - paper showed anywhere from 0.9 to 0.999 was ok
    center_moving_average_decay = 0.9, # moving average of teacher centers - paper showed anywhere from 0.9 to 0.999 was ok
)
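
And just to rule out the base model: my_model on its own accepts 4-channel input without complaint, so the failure seems to be confined to the Dino wrapper (again just random data):

import torch

images = torch.randn(2, 4, 224, 224)   # random 4-channel batch
preds = my_model(images)               # runs fine with channels = 4
print(preds.shape)                     # torch.Size([2, 1000])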
