Add augmentation to USAVars Dataset from paper code base #1434
base: main
Poking around the code, I also see:
Not sure which of these are actually run and which just exist in the code. @calebrob6 why did we call this dataset USAVars instead of MOSAIKS?
MOSAIKS (Multi-task Observation using Satellite Imagery & Kitchen Sinks) is the name of a method that can be applied generally. USAVars is a better name for a dataset.
I want to use this dataset for a project and am trying to reproduce their reported results with a Lightning setup instead of their large custom code base. I will report which augmentations are needed to reproduce their scores.
Computed image statistics on the torchgeo train split:
These are quite different from the ImageNet stats they use:
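The post doesn't say how the statistics were computed. A minimal sketch, assuming the train-split images have been converted to NumPy arrays of shape (B, C, H, W) and are fed in batches so the whole split never has to fit in memory; `channel_stats` is a hypothetical helper name, not a torchgeo API:

```python
import numpy as np


def channel_stats(batches):
    """Accumulate per-channel mean and std over batches of shape (B, C, H, W)."""
    count = 0
    total = None
    total_sq = None
    for batch in batches:
        batch = np.asarray(batch, dtype=np.float64)
        if total is None:
            # One running sum per channel.
            total = np.zeros(batch.shape[1])
            total_sq = np.zeros(batch.shape[1])
        # Sum over batch, height and width; keep the channel axis.
        total += batch.sum(axis=(0, 2, 3))
        total_sq += (batch ** 2).sum(axis=(0, 2, 3))
        # Number of pixels contributed per channel by this batch.
        count += batch.shape[0] * batch.shape[2] * batch.shape[3]
    mean = total / count
    # Population variance via E[x^2] - E[x]^2, clipped at 0 against rounding.
    var = np.maximum(total_sq / count - mean ** 2, 0.0)
    return mean, np.sqrt(var)
```

Streaming the sums keeps memory flat; for pixel values in the uint8 range, float64 accumulation is precise enough for the E[x^2] - E[x]^2 shortcut.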
I think those normalizations are specific to the MOSAIKS model they use, but these are the augmentations for the CNN-based approach.
In that case, should we add RandomHorizontalFlip and ImageNet normalization?
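For reference, a NumPy sketch of what these two transforms do to a single (C, H, W) image scaled to [0, 1]. In an actual PyTorch pipeline you would use torchvision's `RandomHorizontalFlip` and `Normalize` instead; `augment` is a hypothetical name. The ImageNet mean/std only cover three RGB channels, so if the imagery includes a NIR band (as NAIP typically does), that band would need its own values, for example from the computed dataset statistics; that handling is an assumption left out here.

```python
import numpy as np

# Standard ImageNet RGB statistics (for images scaled to [0, 1]).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])


def augment(image, rng, p_flip=0.5, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Random horizontal flip followed by per-channel normalization.

    `image` is a (C, H, W) float array; `mean` and `std` have length C.
    """
    if rng.random() < p_flip:
        # Flip along the width axis, like RandomHorizontalFlip.
        image = image[:, :, ::-1]
    # Broadcast the per-channel stats over height and width.
    return (image - mean[:, None, None]) / std[:, None, None]
```

Flip before normalize or after gives the same result here, since normalization is per-channel and does not depend on pixel position.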
Yeah, I want to try to reproduce the results first and will then update the PR here.
@calebrob6 do the train/val/test splits that come with the torchgeo version of the dataset correspond to any of the checkerboard-style splits shown in Figure 3 of the MOSAIKS paper, or are they random splits? Additionally, target variable normalization is also relevant for regression tasks; this is done here in their code. Should we add this target normalization as well, or at least document the mean/std values somewhere so people don't have to compute them themselves?
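A minimal sketch of target standardization, assuming plain z-scoring with statistics fit on the train split only (the exact scheme in their code may differ). `TargetScaler` is a hypothetical helper, not an existing torchgeo class; predictions are mapped back to original units before computing metrics:

```python
from statistics import fmean, pstdev


class TargetScaler:
    """Standardize regression targets with train-split statistics."""

    def fit(self, targets):
        self.mean = fmean(targets)
        # Fall back to 1.0 if all targets are identical (std of 0).
        self.std = pstdev(targets) or 1.0
        return self

    def transform(self, targets):
        # Map targets to zero mean and unit variance (train-split stats).
        return [(y - self.mean) / self.std for y in targets]

    def inverse_transform(self, scaled):
        # Undo the standardization, e.g. on model predictions.
        return [z * self.std + self.mean for z in scaled]
```

Fitting only on the train split avoids leaking val/test information; the fitted `mean` and `std` are also the values that could be documented per target.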
I'm pretty sure they are random splits. Also, it looks like the download isn't working (the storage account permissions were automatically switched from anonymous access to private), so I need to move this to Hugging Face. Also, this isn't an exact replication of their dataset, as they used Google Earth imagery (I think) while this is NAIP imagery.
With a resnet18 baseline I get an R² score of 0.95 for treecover (paper: 0.91) when doing proper normalization. Since we cannot replicate their results directly anyway, as Caleb pointed out, I would suggest just using the normalization statistics computed on this dataset, and I think adding support for target value normalization would be good as well.
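When comparing against the paper's number, R² is assumed here to be the coefficient of determination, 1 - SS_res/SS_tot, which is what `sklearn.metrics.r2_score` and `torchmetrics.R2Score` compute. Some papers report a squared correlation instead, so it's worth checking which definition they use. A stdlib sketch:

```python
from statistics import fmean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Undefined (ZeroDivisionError) when all y_true values are equal.
    """
    mean = fmean(y_true)
    # Residual sum of squares of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares around the target mean.
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Because standardizing targets and predictions with the same mean and std scales SS_res and SS_tot by the same factor, this score is identical whether it is computed in normalized or original units.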
Hi, just saw this. Chiming in on a few things and please let me know if I can be helpful with anything else @nilsleh!
@nilsleh should we try to sneak this into v0.5.2?
This PR adds the Resize Augmentation from the Paper code base found here: https://github.com/Global-Policy-Lab/mosaiks-paper/blob/master/code/analysis/1_feature_extraction/2_featurize_models_deep_pretrained.py
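A sketch of a resize step for (C, H, W) arrays. The target size used in their featurization script isn't stated here, so `size` is a free parameter. Nearest-neighbor interpolation is swapped in to keep the example dependency-light, whereas torchvision's `Resize` defaults to bilinear; `resize_nearest` is a hypothetical name:

```python
import numpy as np


def resize_nearest(image, size):
    """Nearest-neighbor resize of a (C, H, W) array to (C, size[0], size[1])."""
    _, h, w = image.shape
    out_h, out_w = size
    # Map each output pixel center back to the source row/column it falls in.
    rows = ((np.arange(out_h) + 0.5) * h / out_h).astype(int)
    cols = ((np.arange(out_w) + 0.5) * w / out_w).astype(int)
    # Advanced indexing on the spatial axes keeps the channel axis first.
    return image[:, rows[:, None], cols[None, :]]
```

In a PyTorch pipeline the same step would typically be `torchvision.transforms.Resize` or `torch.nn.functional.interpolate`, with the interpolation mode matched to the paper code.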