
Regarding Experimental Results #8

Open
yangtle opened this issue Dec 12, 2023 · 13 comments

yangtle commented Dec 12, 2023

Dear author,

Hello. I ran multiple experiments on the SHA dataset following the provided code and parameter settings. Each experiment was trained for 1500 epochs with val=5. However, I observed fluctuations in the results, with the MAE around 52 and the MSE around 85. May I ask whether setting val to 1 is necessary to achieve the results reported in the paper, or whether additional settings are required?

Best regards.
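For context on the val setting, here is a minimal, hypothetical sketch of how a validation interval typically interacts with best-epoch tracking. With val=5 the model is only evaluated every fifth epoch, so the single best epoch can be missed, whereas val=1 evaluates every epoch. The function and variable names below are illustrative, not this repository's code:

```python
import torch
import torch.nn as nn

def evaluate(model: nn.Module) -> float:
    # Placeholder for the real validation pass; returns a dummy MAE.
    return float(torch.rand(1)) * 100

def run_training(model: nn.Module, num_epochs: int = 1500, val: int = 5):
    best_mae, best_epoch = float("inf"), -1
    for epoch in range(1, num_epochs + 1):
        # ... one training epoch would run here ...
        if epoch % val == 0:  # val=5: only every 5th epoch is evaluated
            mae = evaluate(model)
            if mae < best_mae:
                best_mae, best_epoch = mae, epoch
    return best_mae, best_epoch
```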

cxliu0 (Owner) commented Dec 13, 2023

Hi, we have noticed that the performance of the trained model may fluctuate when the torch version is > 1.7.

  1. To improve reproducibility, you may add torch.backends.cudnn.deterministic = True in main.py #L98 (a minimal sketch of such a setup is shown below).
  2. For the SHA dataset, we can reproduce the reported results with this repository without changing any parameters.
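For reference, here is a minimal sketch of the determinism settings commonly used in PyTorch training scripts. The function name and seed value are illustrative; the exact placement in main.py comes from the repository, not from this snippet:

```python
import random

import numpy as np
import torch

def set_deterministic(seed: int = 42):
    # Seed every RNG that can influence training
    # (data shuffling, augmentation, weight initialization).
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Force cuDNN to select deterministic kernels; this can be slower,
    # but repeated runs then produce identical results.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```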

Could you provide your environment? We will check the results and get back to you.

yangtle (Author) commented Dec 14, 2023

Hello, thank you for your reply! My environment is python=3.8.13, pytorch=1.12.1, CUDA=11.7, and the GPU is an NVIDIA A40. I have already set torch.backends.cudnn.deterministic = True and am currently rerunning the experiments.

In addition, I would like to ask about the fluctuation in the results, which shows up in two ways: ① the MAE fluctuates by roughly 2-3 points, which may be the most critical issue; ② the epoch at which the best result is achieved is not consistent, sometimes around epoch 200 (with a generally lower validation MAE) and other times around epoch 1000.

To be honest, I observe a similar situation with my own model. What is your perspective on this? Is this normal in the crowd counting domain, given that we focus solely on the MAE metric? Thank you.

cxliu0 (Owner) commented Dec 14, 2023

Many factors can lead to performance fluctuation, e.g., the torch version and image processing. Even the same model may produce different outputs under different torch versions.

Performance fluctuation also occurs in previous works, and the optimal epoch is not necessarily the same under different settings. That is why I suggested adding torch.backends.cudnn.deterministic = True before training: it ensures that the optimal epoch is the same every time you train the model.

Regarding the evaluation metric, MAE is more sensitive than the Average Precision (AP) metric used in object detection. Judging from previous works, some fluctuation appears to be common in crowd counting.
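To make that sensitivity concrete: crowd counting benchmarks report MAE and, by convention, the root mean squared error under the name "MSE", both computed over per-image count errors. Since SHA has only 182 test images, a few images with large count errors can shift the MAE by 2-3 points. A minimal sketch, independent of this repository's code:

```python
import numpy as np

def counting_metrics(pred_counts, gt_counts):
    """MAE and 'MSE' (actually RMSE, by crowd-counting convention)
    over per-image predicted vs. ground-truth counts."""
    err = np.asarray(pred_counts, dtype=float) - np.asarray(gt_counts, dtype=float)
    mae = np.abs(err).mean()
    mse = np.sqrt((err ** 2).mean())
    return mae, mse
```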

cxliu0 (Owner) commented Dec 15, 2023

[Update] We have set up the environment (python=3.8, pytorch=1.12) and trained the model. The performance seems fine and the optimal epoch is 765. By the way, we have added torch.backends.cudnn.deterministic = True before training, and the results are reproducible.

Here is a snippet of the training log:

[ep 761][lr 0.0001000]{"train_lr": 9.999999999999994e-05, "train_loss": 0.2394831186613521, "train_loss_ce_sp": 0.14828089060815605, "train_loss_points_sp": 0.001931241263534773, "train_loss_ce_ds": 0.08730715946168513, "train_loss_points_ds": 0.0015310150873218034, "train_loss_split": 0.000432813169485033, "train_loss_ce_sp_unscaled": 0.14828089060815605, "train_loss_points_sp_unscaled": 0.0003862482530215906, "train_loss_ce_ds_unscaled": 0.08730715946168513, "train_loss_points_ds_unscaled": 0.00030620301510459066, "train_loss_split_unscaled": 0.004328131675720215, "epoch": 761, "n_parameters": 20909385}

[ep 762][lr 0.0001000]{"train_lr": 9.999999999999994e-05, "train_loss": 0.254863301040353, "train_loss_ce_sp": 0.15025504396573916, "train_loss_points_sp": 0.001883206473713791, "train_loss_ce_ds": 0.1012990406236133, "train_loss_points_ds": 0.0014172781107796204, "train_loss_split": 8.732402739311154e-06, "train_loss_ce_sp_unscaled": 0.15025504396573916, "train_loss_points_sp_unscaled": 0.0003766412941134862, "train_loss_ce_ds_unscaled": 0.1012990406236133, "train_loss_points_ds_unscaled": 0.0002834556232571501, "train_loss_split_unscaled": 8.732402646863782e-05, "epoch": 762, "n_parameters": 20909385}

[ep 763][lr 0.0001000]{"train_lr": 9.999999999999994e-05, "train_loss": 0.24386153108364828, "train_loss_ce_sp": 0.14579840346768097, "train_loss_points_sp": 0.0017963021874075403, "train_loss_ce_ds": 0.09482591055534981, "train_loss_points_ds": 0.0014338326621548953, "train_loss_split": 7.0847370220060325e-06, "train_loss_ce_sp_unscaled": 0.14579840346768097, "train_loss_points_sp_unscaled": 0.00035926043700955406, "train_loss_ce_ds_unscaled": 0.09482591055534981, "train_loss_points_ds_unscaled": 0.000286766530464504, "train_loss_split_unscaled": 7.084736952910552e-05, "epoch": 763, "n_parameters": 20909385}

[ep 764][lr 0.0001000]{"train_lr": 9.999999999999994e-05, "train_loss": 0.24069207403305415, "train_loss_ce_sp": 0.1397619226292984, "train_loss_points_sp": 0.0018185121667999272, "train_loss_ce_ds": 0.09767922215365074, "train_loss_points_ds": 0.0014291233162873902, "train_loss_split": 3.2890487788692454e-06, "train_loss_ce_sp_unscaled": 0.1397619226292984, "train_loss_points_sp_unscaled": 0.0003637024319441234, "train_loss_ce_ds_unscaled": 0.09767922215365074, "train_loss_points_ds_unscaled": 0.00028582466404406806, "train_loss_split_unscaled": 3.289048736159866e-05, "epoch": 764, "n_parameters": 20909385}

[ep 765][lr 0.0001000]{"train_lr": 9.999999999999994e-05, "train_loss": 0.25245368943826574, "train_loss_ce_sp": 0.15302851373279416, "train_loss_points_sp": 0.0018379672814311612, "train_loss_ce_ds": 0.09610540419816971, "train_loss_points_ds": 0.0014743787264828948, "train_loss_split": 7.426094503818751e-06, "train_loss_ce_sp_unscaled": 0.15302851373279416, "train_loss_points_sp_unscaled": 0.00036759345833136623, "train_loss_ce_ds_unscaled": 0.09610540419816971, "train_loss_points_ds_unscaled": 0.00029487574686975896, "train_loss_split_unscaled": 7.42609436447556e-05, "epoch": 765, "n_parameters": 20909385}

epoch:765, mae:49.07692307692308, mse:76.87309365098122, time:12.311256408691406, 

best mae:49.07692307692308, best epoch: 765

SherlockHolmes221 commented:

> Hi, we have noticed that the performance of the trained model may fluctuate when the torch version is > 1.7.
>
> 1. To improve reproducibility, you may add torch.backends.cudnn.deterministic = True in main.py #L98.
> 2. For the SHA dataset, we can reproduce the reported results with this repository without changing any parameters.
>
> Could you provide your environment? We will check the results and get back to you.

So which pytorch version is recommended?

cxliu0 (Owner) commented Dec 28, 2023

We have tested pytorch 1.7 and pytorch 1.12. Both versions can achieve the reported results (or slightly better than the reported results). You may also try to use other pytorch versions if you like.

little-seasalt commented:

> We have tested pytorch 1.7 and pytorch 1.12. Both versions can achieve the reported results (or slightly better than the reported results). You may also try to use other pytorch versions if you like.

I retrained on SHA in the pytorch=1.12.1 environment, but the final MAE was 53.36 and the MSE was 88.26, which are quite different from the results in the paper. What factors do you think could cause this? I have already added torch.backends.cudnn.deterministic = True in main.py.

yangtle (Author) commented Dec 28, 2023

I feel that there is indeed quite significant fluctuation in crowd counting models. Personally, I believe results on the SHA dataset should reach the level reported in the paper. Currently, my best result is around 50, but there have also been runs that ended at 52 or 53.

In addition, I would like to ask: for datasets like QNRF, SHB, and JHU, are the training parameter settings used directly, or have any parameters been changed?

little-seasalt commented:

> I feel that there is indeed quite significant fluctuation in crowd counting models. Personally, I believe results on the SHA dataset should reach the level reported in the paper. Currently, my best result is around 50, but there have also been runs that ended at 52 or 53.
>
> In addition, I would like to ask: for datasets like QNRF, SHB, and JHU, are the training parameter settings used directly, or have any parameters been changed?

Hello, I would like to know at which epoch the MAE reaches around 50. Also, have you modified any of the relevant parameters?

yangtle (Author) commented Dec 28, 2023

>> I feel that there is indeed quite significant fluctuation in crowd counting models. Personally, I believe results on the SHA dataset should reach the level reported in the paper. Currently, my best result is around 50, but there have also been runs that ended at 52 or 53.
>>
>> In addition, I would like to ask: for datasets like QNRF, SHB, and JHU, are the training parameter settings used directly, or have any parameters been changed?
>
> Hello, I would like to know at which epoch the MAE reaches around 50. Also, have you modified any of the relevant parameters?

[ep 1151][lr 0.0001000][34.14s]{"train_lr": 9.999999999999994e-05, "train_loss": 0.222655377476602, "train_loss_ce_sp": 0.1416090032136118, "train_loss_points_sp": 0.0019127310322899673, "train_loss_ce_ds": 0.07753163435169168, "train_loss_points_ds": 0.001594329293741769, "train_loss_split": 7.679978100103088e-06, "train_loss_ce_sp_unscaled": 0.1416090032136118, "train_loss_points_sp_unscaled": 0.00038254620567140346, "train_loss_ce_ds_unscaled": 0.07753163435169168, "train_loss_points_ds_unscaled": 0.0003188658581190818, "train_loss_split_unscaled": 7.679977932491818e-05, "epoch": 1151, "n_parameters": 20909385}

[ep 1152][lr 0.0001000][33.61s]{"train_lr": 9.999999999999994e-05, "train_loss": 0.20860815420746803, "train_loss_ce_sp": 0.12717940456963875, "train_loss_points_sp": 0.0018302943821795084, "train_loss_ce_ds": 0.07816224245706925, "train_loss_points_ds": 0.0014164968412621198, "train_loss_split": 1.971737731348394e-05, "train_loss_ce_sp_unscaled": 0.12717940456963875, "train_loss_points_sp_unscaled": 0.00036605887627858365, "train_loss_ce_ds_unscaled": 0.07816224245706925, "train_loss_points_ds_unscaled": 0.00028329936833108296, "train_loss_split_unscaled": 0.00019717377585333747, "epoch": 1152, "n_parameters": 20909385}

[ep 1153][lr 0.0001000][34.25s]{"train_lr": 9.999999999999994e-05, "train_loss": 0.22202433383948095, "train_loss_ce_sp": 0.13644243716388135, "train_loss_points_sp": 0.0018351712213778818, "train_loss_ce_ds": 0.08227713154377164, "train_loss_points_ds": 0.0014612032305071684, "train_loss_split": 8.38927346618417e-06, "train_loss_ce_sp_unscaled": 0.13644243716388135, "train_loss_points_sp_unscaled": 0.00036703424317435036, "train_loss_ce_ds_unscaled": 0.08227713154377164, "train_loss_points_ds_unscaled": 0.00029224064476423065, "train_loss_split_unscaled": 8.389273205318966e-05, "epoch": 1153, "n_parameters": 20909385}

[ep 1154][lr 0.0001000][34.32s]{"train_lr": 9.999999999999994e-05, "train_loss": 0.23919424092447436, "train_loss_ce_sp": 0.1506406638670612, "train_loss_points_sp": 0.0018515667714480613, "train_loss_ce_ds": 0.08527221067531689, "train_loss_points_ds": 0.001426672808372894, "train_loss_split": 3.12714969806092e-06, "train_loss_ce_sp_unscaled": 0.1506406638670612, "train_loss_points_sp_unscaled": 0.0003703133568067003, "train_loss_ce_ds_unscaled": 0.08527221067531689, "train_loss_points_ds_unscaled": 0.00028533456143860176, "train_loss_split_unscaled": 3.127149633459143e-05, "epoch": 1154, "n_parameters": 20909385}

[ep 1155][lr 0.0001000][34.66s]{"train_lr": 9.999999999999994e-05, "train_loss": 0.2269019383836437, "train_loss_ce_sp": 0.13837270579628042, "train_loss_points_sp": 0.001710777438396739, "train_loss_ce_ds": 0.0853276233616713, "train_loss_points_ds": 0.0014882287312601064, "train_loss_split": 2.6029510128812553e-06, "train_loss_ce_sp_unscaled": 0.13837270579628042, "train_loss_points_sp_unscaled": 0.0003421554865781218, "train_loss_ce_ds_unscaled": 0.0853276233616713, "train_loss_points_ds_unscaled": 0.00029764574916240434, "train_loss_split_unscaled": 2.6029509467047614e-05, "epoch": 1155, "n_parameters": 20909385}

epoch:1155, mae:50.862637362637365, mse:84.92643875731514, time:39.40563941001892,

best mae:50.862637362637365, best epoch: 1155

I have not modified the model, and setting torch.backends.cudnn.deterministic = True ensures consistent results every time.

little-seasalt commented:

> best mae:50.862637362637365, best epoch: 1155
>
> I have not modified the model, and setting torch.backends.cudnn.deterministic = True ensures consistent results every time.

Thank you for your answer.

cxliu0 (Owner) commented Dec 31, 2023

>> We have tested pytorch 1.7 and pytorch 1.12. Both versions can achieve the reported results (or slightly better than the reported results). You may also try to use other pytorch versions if you like.
>
> I retrained on SHA in the pytorch=1.12.1 environment, but the final MAE was 53.36 and the MSE was 88.26, which are quite different from the results in the paper. What factors do you think could cause this? I have already added torch.backends.cudnn.deterministic = True in main.py.

Different environments may still affect performance, e.g., through differences in data processing and model optimization. You could try tuning the scale augmentation parameters to see how the performance changes (a sketch of what such augmentation typically looks like is given below). We will also test the code on different machines to see what happens.
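To illustrate what is meant by scale augmentation, here is a hypothetical sketch of the random rescaling commonly used in crowd counting pipelines; the function name and scale_range are illustrative and not taken from this repository:

```python
import random

import torch
import torch.nn.functional as F

def random_scale(image: torch.Tensor, points: torch.Tensor,
                 scale_range=(0.7, 1.3)):
    """Randomly rescale a (C, H, W) image and its (N, 2) point
    annotations by the same factor, keeping them aligned."""
    s = random.uniform(*scale_range)
    image = F.interpolate(image.unsqueeze(0), scale_factor=s,
                          mode="bilinear", align_corners=False).squeeze(0)
    return image, points * s
```

Widening or narrowing the range changes how much scale variation the model sees during training, which is one knob that can shift the final MAE.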

In addition, the quality of SHA itself may contribute to the fluctuation: compared with other datasets, SHA is relatively small.

cxliu0 (Owner) commented Dec 31, 2023

> I feel that there is indeed quite significant fluctuation in crowd counting models. Personally, I believe results on the SHA dataset should reach the level reported in the paper. Currently, my best result is around 50, but there have also been runs that ended at 52 or 53.
>
> In addition, I would like to ask: for datasets like QNRF, SHB, and JHU, are the training parameter settings used directly, or have any parameters been changed?

SHA and SHB can share the same training parameters. Regarding other datasets, please refer to this issue.
