
Igor lig 4447 w mse benchmark #1474

Open · wants to merge 13 commits into base: master

Conversation

IgorSusmelj (Contributor)

Changes

  • Adds WMSE ImageNet benchmark
  • Adds missing projection head


codecov bot commented Jan 11, 2024

Codecov Report

Attention: 6 lines in your changes are missing coverage. Please review.

Comparison is base (2b215aa) 85.50% compared to head (7ad8205) 85.45%.
Report is 1 commit behind head on master.

Files                              Patch %   Lines
lightly/loss/wmse_loss.py          64.28%    5 Missing ⚠️
lightly/models/modules/heads.py    66.66%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1474      +/-   ##
==========================================
- Coverage   85.50%   85.45%   -0.05%     
==========================================
  Files         135      135              
  Lines        5657     5672      +15     
==========================================
+ Hits         4837     4847      +10     
- Misses        820      825       +5     


IgorSusmelj (Contributor, Author):

[screenshot of the training run]
Seems to be training.

IgorSusmelj marked this pull request as ready for review on January 12, 2024, 07:52.

# we use a projection head with output dimension 64
# and w_size of 128 to support a batch size of 256
self.projection_head = WMSEProjectionHead(output_dim=64)
Contributor:

I think the output dimension is wrong here. From the paper:

Finally, we use an embedding size of 64 for CIFAR-10 and CIFAR-100, and an embedding of size 128 for STL-10 and Tiny ImageNet. For ImageNet-100 we use a configuration similar to the Tiny ImageNet experiments, and 240 epochs of training. Finally, in the ImageNet experiments (Tab. 3), we use the implementation and the hyperparameter configuration of (Chen et al., 2020b) (same number of layers in the projection head, etc.) based on their open-source implementation, the only difference being the learning rate and the loss function (respectively, 0.075 and the contrastive loss in (Chen et al., 2020b) vs. 0.1 and Eq. 6 in W-MSE 4).

So they're using a SimCLR2 projection head.

Contributor:

And most likely the embedding dim is the same as the one for SimCLR2.
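
For illustration, a minimal sketch of what this suggestion could look like in the benchmark module, assuming a ResNet-50 backbone (2048-d features) and the 128-d embedding implied by the SimCLR v2 configuration; the exact constructor arguments are assumptions, not the merged code:

from lightly.models.modules.heads import SimCLRProjectionHead

# Hedged sketch: reuse a SimCLR-style projection head for the ImageNet benchmark.
projection_head = SimCLRProjectionHead(
    input_dim=2048,   # ResNet-50 feature dimension (assumed)
    hidden_dim=2048,  # same width as the SimCLR head (assumed)
    output_dim=128,   # embedding size implied by the SimCLR v2 configuration
)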

self.projection_head = WMSEProjectionHead(output_dim=64)

self.criterion_WMSE4loss = WMSELoss(
w_size=128, embedding_dim=64, num_samples=4, gather_distributed=True
Contributor:

For ImageNet they probably use w_size=256:

For CIFAR-10 and CIFAR-100, the slicing sub-batch size is 128, for Tiny ImageNet and STL-10, it is 256.
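
A hedged sketch of how the criterion could be configured with that sub-batch size; the keyword names mirror the ones already used in this PR, and the concrete values are assumptions drawn from the paper quotes above, not the merged configuration:

from lightly.loss.wmse_loss import WMSELoss

criterion = WMSELoss(
    w_size=256,               # slicing sub-batch size the paper uses at this scale
    embedding_dim=128,        # matches a 128-d projection head output (assumed)
    num_samples=4,            # W-MSE 4 draws four positive views per image
    gather_distributed=True,  # gather embeddings across devices, as in this PR
)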

"weight_decay": 0.0,
},
],
lr=0.1 * math.sqrt(self.batch_size_per_device * self.trainer.world_size),
Contributor:

The denominator is missing here.
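
If the intent is the normalized square-root scaling rule, one possible reading is to divide by a reference batch size under the square root; both the rule and the reference value of 256 are assumptions here, not something stated in this thread:

import math

base_lr = 0.1
batch_size_per_device, world_size = 256, 1  # illustrative values
reference_batch_size = 256                  # assumed reference batch size

lr = base_lr * math.sqrt(batch_size_per_device * world_size / reference_batch_size)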

@@ -59,10 +61,18 @@ def forward(self, x: torch.Tensor) -> torch.Tensor:

f_cov_shrinked = (1 - self.eps) * f_cov + self.eps * eye

# get the dtype of f_cov_shrinked and temporarily convert to full precision
# to support Cholesky decomposition
f_cov_shrinked_type = f_cov_shrinked.dtype
Contributor:

This looks super duper hacky. Why is it necessary?

Contributor (Author):

Yes, as written in the comment. The original code is not using half precision.
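
A minimal sketch of the workaround under discussion, assuming the dtype round-trip wraps the Cholesky call; the variable name comes from the diff excerpt above, while the helper function and the use of torch.linalg.cholesky are assumptions:

import torch

def cholesky_full_precision(f_cov_shrinked: torch.Tensor) -> torch.Tensor:
    # Cholesky decomposition is not reliable in half precision, so temporarily
    # upcast to float32 and cast the factor back to the original dtype.
    original_dtype = f_cov_shrinked.dtype
    factor = torch.linalg.cholesky(f_cov_shrinked.to(torch.float32))
    return factor.to(original_dtype)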

if self.gather_distributed and dist.is_initialized():
world_size = dist.get_world_size()
if world_size > 1:
input = torch.cat(gather(input), dim=0)
Contributor:

Are you sure this is correct? Intuitively I think there could be problems because now every device computes the exact same loss, right?

Contributor (Author):

I removed it but I will add it again. That seems like the simplest and most proper way to support multi-GPU training. I'll make sure we divide the loss by the number of devices to make runs more comparable between different multi-GPU setups.
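
A hedged sketch of that approach, combining the gather from the diff above with the loss scaling described in this reply; the helper names and the lightly.utils.dist import path are assumptions:

import torch
import torch.distributed as dist

from lightly.utils.dist import gather

def gather_if_distributed(input: torch.Tensor, gather_distributed: bool) -> torch.Tensor:
    # Collect embeddings from all devices so every rank whitens the full batch.
    if gather_distributed and dist.is_initialized() and dist.get_world_size() > 1:
        input = torch.cat(gather(input), dim=0)
    return input

def scale_loss_by_world_size(loss: torch.Tensor) -> torch.Tensor:
    # Divide by the number of devices so multi-GPU runs stay comparable.
    world_size = dist.get_world_size() if dist.is_initialized() else 1
    return loss / world_size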

@@ -699,6 +699,27 @@ def __init__(
)


class WMSEProjectionHead(SimCLRProjectionHead):
Contributor:

I don't think we need this. We should be able to use the SimCLR projection head instead.

Contributor (Author):

@guarin, we should make sure things are consistent. I'm not sure what we agreed on. AFAIK, the same goes for the transforms.

Contributor:

Aren't the default values different?

In any case, I prefer if all components of the WMSE model are called WMSESomething. Mixing components from different models is always confusing and it makes the components harder to discover in the code. If two models have the same head then we can just subclass from the first model and update the docstring.

num_layers: int = 2,
batch_norm: bool = True,
):
super(WMSEProjectionHead, self).__init__(
Contributor:

Suggested change:
- super(WMSEProjectionHead, self).__init__(
+ super().__init__(

In general the class should not be passed to the super method.
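
For reference, a sketch of the subclass-and-update-the-docstring pattern described above, using the plain super().__init__() call from the suggestion; the default values are placeholders for illustration, not the merged ones:

from lightly.models.modules.heads import SimCLRProjectionHead

class WMSEProjectionHead(SimCLRProjectionHead):
    """Projection head for W-MSE; architecturally the same as the SimCLR head."""

    def __init__(
        self,
        input_dim: int = 2048,   # placeholder defaults, not the merged values
        hidden_dim: int = 1024,
        output_dim: int = 64,
        num_layers: int = 2,
        batch_norm: bool = True,
    ):
        super().__init__(
            input_dim=input_dim,
            hidden_dim=hidden_dim,
            output_dim=output_dim,
            num_layers=num_layers,
            batch_norm=batch_norm,
        )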
