
Add augmentation to USAVars Dataset from paper code base #1434

Draft · wants to merge 1 commit into main
Conversation

nilsleh
Collaborator

@nilsleh nilsleh commented Jun 20, 2023

@github-actions github-actions bot added the datamodules PyTorch Lightning datamodules label Jun 20, 2023
@adamjstewart
Collaborator

Poking around the code, I also see:

Not sure which of these are actually run, or whether the code just exists without being used.

@calebrob6 why did we call this dataset USAVars instead of MOSAIKS?

@adamjstewart adamjstewart added this to the 0.4.2 milestone Jun 20, 2023
@calebrob6
Member

MOSAIKS is the name of a method (Multi-task Observation using Satellite Imagery & Kitchen Sinks) that can be applied generally. USAVars is a better name for a dataset.

@nilsleh
Collaborator Author

nilsleh commented Jun 21, 2023

> Poking around the code, I also see:

I want to use this dataset for a project and am trying to reproduce their reported results with a Lightning setup instead of their large custom code base; I will report which augmentations are needed to reproduce their scores.

@nilsleh nilsleh changed the title Add augmentation from paper code base Add augmentation to USAVars Dataset from paper code base Jun 21, 2023
@nilsleh nilsleh marked this pull request as draft June 21, 2023 09:06
@nilsleh
Collaborator Author

nilsleh commented Jun 21, 2023

Computed image statistics on the torchgeo train dataset split:

min: array([0., 0., 0., 0.], dtype=float32)
max: array([1., 1., 1., 1.], dtype=float32)
mean: array([0.4101762, 0.4342503, 0.3484594, 0.5473533], dtype=float32)
std: array([0.17361328, 0.14048962, 0.12148701, 0.16887303], dtype=float32)

quite different from the ImageNet stats they use: mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
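For reference, per-channel statistics like the ones above can be computed with a small torch helper (a sketch; the batch below is random stand-in data, not the actual USAVars split):

```python
import torch

def channel_stats(images: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Per-channel mean/std over a stack of images shaped (N, C, H, W)."""
    return images.mean(dim=(0, 2, 3)), images.std(dim=(0, 2, 3))

# Stand-in for a stack of [0, 1]-scaled 4-band (R, G, B, NIR) patches
batch = torch.rand(8, 4, 64, 64)
mean, std = channel_stats(batch)
```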

@nilsleh
Collaborator Author

nilsleh commented Jun 21, 2023

I think those normalizations are specific to the MOSAIKS model they use, but these are the augmentations for the CNN-based approach.

@adamjstewart
Collaborator

In that case should we add RandomHorizontalFlip and ImageNet normalization?
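If we go that route, a minimal torch-only sketch of such a pipeline (a hypothetical `augment` helper, not torchgeo's actual API) could look like:

```python
import torch

def augment(img: torch.Tensor, mean, std, p_flip: float = 0.5) -> torch.Tensor:
    """Randomly flip a (C, H, W) image horizontally, then normalize per channel."""
    if torch.rand(()) < p_flip:
        img = torch.flip(img, dims=[-1])  # flip along the width axis
    mean = torch.as_tensor(mean, dtype=img.dtype).view(-1, 1, 1)
    std = torch.as_tensor(std, dtype=img.dtype).view(-1, 1, 1)
    return (img - mean) / std
```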

@nilsleh
Collaborator Author

nilsleh commented Jun 22, 2023

Yeah, I want to try to reproduce the results first and will then update this PR.

@nilsleh
Collaborator Author

nilsleh commented Jun 23, 2023

@calebrob6 do the train/val/test splits that come with the torchgeo dataset version correspond to any of the checkerboard-style splits shown in Figure 3 of the MOSAIKS paper, or are they random splits?

Additionally, target variable normalization is relevant for regression tasks; this is done here in their code. Should we add target variable normalization as well, or at least document the mean/std values somewhere so people don't have to compute them themselves?
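For illustration, target standardization for regression amounts to fitting mean/std on the training targets only and inverting at prediction time (the helper names here are hypothetical, not from their code base):

```python
import statistics

def fit_target_scaler(targets: list[float]) -> tuple[float, float]:
    """Fit mean/std on training targets only, to avoid test-set leakage."""
    return statistics.mean(targets), statistics.stdev(targets)

def normalize_target(y: float, mu: float, sigma: float) -> float:
    return (y - mu) / sigma

def denormalize_target(z: float, mu: float, sigma: float) -> float:
    """Map a model prediction back to the original target scale."""
    return z * sigma + mu
```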

@calebrob6
Member

I'm pretty sure they are random splits.

Also, it looks like the download isn't working (the storage account permissions were automatically switched from anonymous access to private), so I need to move this to huggingface.

Also, this isn't an exact replication of their dataset: they used Google Earth imagery (I think), while this is NAIP imagery.

@nilsleh
Collaborator Author

nilsleh commented Jun 29, 2023

With a ResNet-18 baseline I get an R² score of 0.95 for treecover (paper: 0.91) when doing proper normalization. Since we cannot replicate their results exactly anyway, as Caleb pointed out, I would suggest just using the normalization statistics computed on this dataset, and I think adding support for target value normalization would be good as well.
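For context, the R-squared score reported above is one minus the ratio of residual to total sum of squares; a plain-Python sketch:

```python
def r_squared(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```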

@estherrolf
Contributor

Hi, just saw this. Chiming in on a few things and please let me know if I can be helpful with anything else @nilsleh!

  1. Yes, we do target variable normalization, as is standard for regression. Note also that some of the target variables are transformed as y_transformed = log(1+y) (and performance is then reported with respect to the logged variables).
  2. As Caleb pointed out, the USAVars data here is based on NAIP imagery whereas the analysis in our paper is based on Google imagery, so unfortunately don't expect the results to match up exactly with the numbers in the paper.
  3. In light of ^, whether or not you choose to resize the imagery during preprocessing, the optimal patch size for the NAIP imagery is likely different from that for the imagery we use in the paper.
  4. It's possible that a different preprocessing of the images would be helpful for the CNN baseline or for MOSAIKS -- especially in light of these results: https://arxiv.org/abs/2305.13456. At the time of doing the experiments, we did what made the most sense for a solid and reasonable baseline: ZCA whitening for RCF (implemented here, following the explanation in footnote 14 here) and standard augmentation strategies for the ResNet-18 model, as you've noted above.
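The log transform in point 1 can be sketched (and inverted for reporting on the original scale) as:

```python
import math

def transform_target(y: float) -> float:
    """y_transformed = log(1 + y); log1p is numerically stable near zero."""
    return math.log1p(y)

def inverse_transform(y_t: float) -> float:
    """Inverse of log(1 + y): exp(y_t) - 1."""
    return math.expm1(y_t)
```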

@adamjstewart adamjstewart removed this from the 0.4.2 milestone Sep 28, 2023
@adamjstewart
Collaborator

@nilsleh should we try to sneak this into v0.5.2?
