
Regarding Experimental Results #8

Open
yangtle opened this issue Dec 12, 2023 · 13 comments

yangtle commented Dec 12, 2023

Dear author,

Hello. I ran multiple experiments on the SHA dataset following the provided code and parameter settings. Each experiment was trained for 1500 epochs with val=5. However, I observed fluctuations in the results, with the MAE around 52 and the MSE around 85. May I ask whether setting val to 1 is necessary to achieve the results reported in the paper, or whether additional settings are required?

Best regards.
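For context on the val setting, here is a minimal, hypothetical sketch of how a validation interval typically interacts with best-epoch tracking. With val=5 the model is only evaluated every fifth epoch, so the single best epoch can be missed, whereas val=1 evaluates every epoch. The function and variable names below are illustrative, not this repository's code:

```python
import torch
import torch.nn as nn

def evaluate(model: nn.Module) -> float:
    # Placeholder for the real validation pass; returns a dummy MAE.
    return float(torch.rand(1)) * 100

def run_training(model: nn.Module, num_epochs: int = 1500, val: int = 5):
    best_mae, best_epoch = float("inf"), -1
    for epoch in range(1, num_epochs + 1):
        # ... one training epoch would run here ...
        if epoch % val == 0:  # val=5: only every 5th epoch is evaluated
            mae = evaluate(model)
            if mae < best_mae:
                best_mae, best_epoch = mae, epoch
    return best_mae, best_epoch
```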

cxliu0 (Owner) commented Dec 13, 2023

Hi, we have noticed that the performance of the trained model may fluctuate when the torch version is > 1.7.

  1. To improve reproducibility, you may add torch.backends.cudnn.deterministic = True in main.py #L98 (a minimal sketch of such a setup is shown below).
  2. For the SHA dataset, we can reproduce the reported results with this repository without changing any parameters.
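For reference, here is a minimal sketch of the determinism settings commonly used in PyTorch training scripts. The function name and seed value are illustrative; the exact placement in main.py comes from the repository, not from this snippet:

```python
import random

import numpy as np
import torch

def set_deterministic(seed: int = 42):
    # Seed every RNG that can influence training
    # (data shuffling, augmentation, weight initialization).
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Force cuDNN to select deterministic kernels; this can be slower,
    # but repeated runs then produce identical results.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```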

Could you provide your environment? We will check the results and get back to you.

yangtle (Author) commented Dec 14, 2023

Hello, thank you for your reply! My environment is python=3.8.13, pytorch=1.12.1, CUDA=11.7, and the GPU is an NVIDIA A40. I have already set torch.backends.cudnn.deterministic = True and am currently rerunning the experiments.

In addition, I would like to ask about the fluctuation in the results, which shows up in two ways: ① the MAE fluctuates by roughly 2-3 points, which may be the most critical issue; ② the epoch at which the best result is achieved is not consistent, sometimes around epoch 200 (with a generally lower validation MAE) and other times around epoch 1000.

To be honest, I observe a similar situation with my own model. What is your perspective on this? Is this normal in the crowd counting domain, given that we focus solely on the MAE metric? Thank you.

cxliu0 (Owner) commented Dec 14, 2023

Many factors can lead to performance fluctuation, e.g., the torch version and image processing. Even the same model may produce different outputs under different torch versions.

Performance fluctuation also occurs in previous works, and the optimal epoch is not necessarily the same under different settings. That is why I suggested adding torch.backends.cudnn.deterministic = True before training: it ensures that the optimal epoch is the same every time you train the model.

Regarding the evaluation metric, MAE is more sensitive than the Average Precision (AP) metric used in object detection. Judging from previous works, some fluctuation appears to be common in crowd counting.
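To make that sensitivity concrete: crowd counting benchmarks report MAE and, by convention, the root mean squared error under the name "MSE", both computed over per-image count errors. Since SHA has only 182 test images, a few images with large count errors can shift the MAE by 2-3 points. A minimal sketch, independent of this repository's code:

```python
import numpy as np

def counting_metrics(pred_counts, gt_counts):
    """MAE and 'MSE' (actually RMSE, by crowd-counting convention)
    over per-image predicted vs. ground-truth counts."""
    err = np.asarray(pred_counts, dtype=float) - np.asarray(gt_counts, dtype=float)
    mae = np.abs(err).mean()
    mse = np.sqrt((err ** 2).mean())
    return mae, mse
```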

cxliu0 (Owner) commented Dec 15, 2023

[Update] We have set up the environment (python=3.8, pytorch=1.12) and trained the model. The performance seems fine and the optimal epoch is 765. By the way, we have added torch.backends.cudnn.deterministic = True before training, and the results are reproducible.

Here is a snippet of the training log:

[ep 761][lr 0.0001000]{"train_lr": 9.999999999999994e-05, "train_loss": 0.2394831186613521, "train_loss_ce_sp": 0.14828089060815605, "train_loss_points_sp": 0.001931241263534773, "train_loss_ce_ds": 0.08730715946168513, "train_loss_points_ds": 0.0015310150873218034, "train_loss_split": 0.000432813169485033, "train_loss_ce_sp_unscaled": 0.14828089060815605, "train_loss_points_sp_unscaled": 0.0003862482530215906, "train_loss_ce_ds_unscaled": 0.08730715946168513, "train_loss_points_ds_unscaled": 0.00030620301510459066, "train_loss_split_unscaled": 0.004328131675720215, "epoch": 761, "n_parameters": 20909385}

[ep 762][lr 0.0001000]{"train_lr": 9.999999999999994e-05, "train_loss": 0.254863301040353, "train_loss_ce_sp": 0.15025504396573916, "train_loss_points_sp": 0.001883206473713791, "train_loss_ce_ds": 0.1012990406236133, "train_loss_points_ds": 0.0014172781107796204, "train_loss_split": 8.732402739311154e-06, "train_loss_ce_sp_unscaled": 0.15025504396573916, "train_loss_points_sp_unscaled": 0.0003766412941134862, "train_loss_ce_ds_unscaled": 0.1012990406236133, "train_loss_points_ds_unscaled": 0.0002834556232571501, "train_loss_split_unscaled": 8.732402646863782e-05, "epoch": 762, "n_parameters": 20909385}

[ep 763][lr 0.0001000]{"train_lr": 9.999999999999994e-05, "train_loss": 0.24386153108364828, "train_loss_ce_sp": 0.14579840346768097, "train_loss_points_sp": 0.0017963021874075403, "train_loss_ce_ds": 0.09482591055534981, "train_loss_points_ds": 0.0014338326621548953, "train_loss_split": 7.0847370220060325e-06, "train_loss_ce_sp_unscaled": 0.14579840346768097, "train_loss_points_sp_unscaled": 0.00035926043700955406, "train_loss_ce_ds_unscaled": 0.09482591055534981, "train_loss_points_ds_unscaled": 0.000286766530464504, "train_loss_split_unscaled": 7.084736952910552e-05, "epoch": 763, "n_parameters": 20909385}

[ep 764][lr 0.0001000]{"train_lr": 9.999999999999994e-05, "train_loss": 0.24069207403305415, "train_loss_ce_sp": 0.1397619226292984, "train_loss_points_sp": 0.0018185121667999272, "train_loss_ce_ds": 0.09767922215365074, "train_loss_points_ds": 0.0014291233162873902, "train_loss_split": 3.2890487788692454e-06, "train_loss_ce_sp_unscaled": 0.1397619226292984, "train_loss_points_sp_unscaled": 0.0003637024319441234, "train_loss_ce_ds_unscaled": 0.09767922215365074, "train_loss_points_ds_unscaled": 0.00028582466404406806, "train_loss_split_unscaled": 3.289048736159866e-05, "epoch": 764, "n_parameters": 20909385}

[ep 765][lr 0.0001000]{"train_lr": 9.999999999999994e-05, "train_loss": 0.25245368943826574, "train_loss_ce_sp": 0.15302851373279416, "train_loss_points_sp": 0.0018379672814311612, "train_loss_ce_ds": 0.09610540419816971, "train_loss_points_ds": 0.0014743787264828948, "train_loss_split": 7.426094503818751e-06, "train_loss_ce_sp_unscaled": 0.15302851373279416, "train_loss_points_sp_unscaled": 0.00036759345833136623, "train_loss_ce_ds_unscaled": 0.09610540419816971, "train_loss_points_ds_unscaled": 0.00029487574686975896, "train_loss_split_unscaled": 7.42609436447556e-05, "epoch": 765, "n_parameters": 20909385}

epoch:765, mae:49.07692307692308, mse:76.87309365098122, time:12.311256408691406, 

best mae:49.07692307692308, best epoch: 765

SherlockHolmes221 commented:

> Hi, we have noticed that the performance of the trained model may fluctuate when the torch version is > 1.7.
>
> 1. To improve reproducibility, you may add torch.backends.cudnn.deterministic = True in main.py #L98.
> 2. For the SHA dataset, we can reproduce the reported results with this repository without changing any parameters.
>
> Could you provide your environment? We will check the results and get back to you.

So which pytorch version is recommended?

cxliu0 (Owner) commented Dec 28, 2023

We have tested pytorch 1.7 and pytorch 1.12. Both versions can achieve the reported results (or slightly better than the reported results). You may also try to use other pytorch versions if you like.

little-seasalt commented:

> We have tested pytorch 1.7 and pytorch 1.12. Both versions can achieve the reported results (or slightly better than the reported results). You may also try to use other pytorch versions if you like.

I retrained on SHA in the pytorch=1.12.1 environment, but the final MAE was 53.36 and the MSE was 88.26, which are quite different from the results in the paper. What factors do you think could cause this? I have already added torch.backends.cudnn.deterministic = True in main.py.

yangtle (Author) commented Dec 28, 2023

I feel that there is indeed quite significant fluctuation in crowd counting models. Personally, I believe results on the SHA dataset should reach the level reported in the paper. Currently, my best result is around 50, but there have also been runs that ended at 52 or 53.

In addition, I would like to ask: for datasets like QNRF, SHB, and JHU, are the training parameter settings used directly, or have any parameters been changed?

little-seasalt commented:

> I feel that there is indeed quite significant fluctuation in crowd counting models. Personally, I believe results on the SHA dataset should reach the level reported in the paper. Currently, my best result is around 50, but there have also been runs that ended at 52 or 53.
>
> In addition, I would like to ask: for datasets like QNRF, SHB, and JHU, are the training parameter settings used directly, or have any parameters been changed?

Hello, I would like to know at which epoch the MAE reaches around 50. Also, have you modified any of the relevant parameters?

yangtle (Author) commented Dec 28, 2023

>> I feel that there is indeed quite significant fluctuation in crowd counting models. Personally, I believe results on the SHA dataset should reach the level reported in the paper. Currently, my best result is around 50, but there have also been runs that ended at 52 or 53.
>>
>> In addition, I would like to ask: for datasets like QNRF, SHB, and JHU, are the training parameter settings used directly, or have any parameters been changed?
>
> Hello, I would like to know at which epoch the MAE reaches around 50. Also, have you modified any of the relevant parameters?

[ep 1151][lr 0.0001000][34.14s]{"train_lr": 9.999999999999994e-05, "train_loss": 0.222655377476602, "train_loss_ce_sp": 0.1416090032136118, "train_loss_points_sp": 0.0019127310322899673, "train_loss_ce_ds": 0.07753163435169168, "train_loss_points_ds": 0.001594329293741769, "train_loss_split": 7.679978100103088e-06, "train_loss_ce_sp_unscaled": 0.1416090032136118, "train_loss_points_sp_unscaled": 0.00038254620567140346, "train_loss_ce_ds_unscaled": 0.07753163435169168, "train_loss_points_ds_unscaled": 0.0003188658581190818, "train_loss_split_unscaled": 7.679977932491818e-05, "epoch": 1151, "n_parameters": 20909385}

[ep 1152][lr 0.0001000][33.61s]{"train_lr": 9.999999999999994e-05, "train_loss": 0.20860815420746803, "train_loss_ce_sp": 0.12717940456963875, "train_loss_points_sp": 0.0018302943821795084, "train_loss_ce_ds": 0.07816224245706925, "train_loss_points_ds": 0.0014164968412621198, "train_loss_split": 1.971737731348394e-05, "train_loss_ce_sp_unscaled": 0.12717940456963875, "train_loss_points_sp_unscaled": 0.00036605887627858365, "train_loss_ce_ds_unscaled": 0.07816224245706925, "train_loss_points_ds_unscaled": 0.00028329936833108296, "train_loss_split_unscaled": 0.00019717377585333747, "epoch": 1152, "n_parameters": 20909385}

[ep 1153][lr 0.0001000][34.25s]{"train_lr": 9.999999999999994e-05, "train_loss": 0.22202433383948095, "train_loss_ce_sp": 0.13644243716388135, "train_loss_points_sp": 0.0018351712213778818, "train_loss_ce_ds": 0.08227713154377164, "train_loss_points_ds": 0.0014612032305071684, "train_loss_split": 8.38927346618417e-06, "train_loss_ce_sp_unscaled": 0.13644243716388135, "train_loss_points_sp_unscaled": 0.00036703424317435036, "train_loss_ce_ds_unscaled": 0.08227713154377164, "train_loss_points_ds_unscaled": 0.00029224064476423065, "train_loss_split_unscaled": 8.389273205318966e-05, "epoch": 1153, "n_parameters": 20909385}

[ep 1154][lr 0.0001000][34.32s]{"train_lr": 9.999999999999994e-05, "train_loss": 0.23919424092447436, "train_loss_ce_sp": 0.1506406638670612, "train_loss_points_sp": 0.0018515667714480613, "train_loss_ce_ds": 0.08527221067531689, "train_loss_points_ds": 0.001426672808372894, "train_loss_split": 3.12714969806092e-06, "train_loss_ce_sp_unscaled": 0.1506406638670612, "train_loss_points_sp_unscaled": 0.0003703133568067003, "train_loss_ce_ds_unscaled": 0.08527221067531689, "train_loss_points_ds_unscaled": 0.00028533456143860176, "train_loss_split_unscaled": 3.127149633459143e-05, "epoch": 1154, "n_parameters": 20909385}

[ep 1155][lr 0.0001000][34.66s]{"train_lr": 9.999999999999994e-05, "train_loss": 0.2269019383836437, "train_loss_ce_sp": 0.13837270579628042, "train_loss_points_sp": 0.001710777438396739, "train_loss_ce_ds": 0.0853276233616713, "train_loss_points_ds": 0.0014882287312601064, "train_loss_split": 2.6029510128812553e-06, "train_loss_ce_sp_unscaled": 0.13837270579628042, "train_loss_points_sp_unscaled": 0.0003421554865781218, "train_loss_ce_ds_unscaled": 0.0853276233616713, "train_loss_points_ds_unscaled": 0.00029764574916240434, "train_loss_split_unscaled": 2.6029509467047614e-05, "epoch": 1155, "n_parameters": 20909385}

epoch:1155, mae:50.862637362637365, mse:84.92643875731514, time:39.40563941001892,

best mae:50.862637362637365, best epoch: 1155

I have not modified the model, and setting torch.backends.cudnn.deterministic = True ensures consistent results every time.

little-seasalt commented:

> best mae:50.862637362637365, best epoch: 1155
>
> I have not modified the model, and setting torch.backends.cudnn.deterministic = True ensures consistent results every time.

Thank you for your answer.

cxliu0 (Owner) commented Dec 31, 2023

>> We have tested pytorch 1.7 and pytorch 1.12. Both versions can achieve the reported results (or slightly better than the reported results). You may also try to use other pytorch versions if you like.
>
> I retrained on SHA in the pytorch=1.12.1 environment, but the final MAE was 53.36 and the MSE was 88.26, which are quite different from the results in the paper. What factors do you think could cause this? I have already added torch.backends.cudnn.deterministic = True in main.py.

Different environments may still affect performance, e.g., through differences in data processing and model optimization. You could try tuning the scale augmentation parameters to see how the performance changes (a sketch of what such augmentation typically looks like is given below). We will also test the code on different machines to see what happens.
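To illustrate what is meant by scale augmentation, here is a hypothetical sketch of the random rescaling commonly used in crowd counting pipelines; the function name and scale_range are illustrative and not taken from this repository:

```python
import random

import torch
import torch.nn.functional as F

def random_scale(image: torch.Tensor, points: torch.Tensor,
                 scale_range=(0.7, 1.3)):
    """Randomly rescale a (C, H, W) image and its (N, 2) point
    annotations by the same factor, keeping them aligned."""
    s = random.uniform(*scale_range)
    image = F.interpolate(image.unsqueeze(0), scale_factor=s,
                          mode="bilinear", align_corners=False).squeeze(0)
    return image, points * s
```

Widening or narrowing the range changes how much scale variation the model sees during training, which is one knob that can shift the final MAE.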

In addition, the quality of SHA itself may contribute to the fluctuation: compared with other datasets, SHA is relatively small.

cxliu0 (Owner) commented Dec 31, 2023

> I feel that there is indeed quite significant fluctuation in crowd counting models. Personally, I believe results on the SHA dataset should reach the level reported in the paper. Currently, my best result is around 50, but there have also been runs that ended at 52 or 53.
>
> In addition, I would like to ask: for datasets like QNRF, SHB, and JHU, are the training parameter settings used directly, or have any parameters been changed?

SHA and SHB can share the same training parameters. Regarding other datasets, please refer to this issue.
