Overtrain fix #431

rappc87 · 2024-05-15T08:26:51Z

Overtraining detection now works as it should. If lowest_value has not decreased within overtrain_threshold epochs after the epoch where it was last recorded, the run is treated as overtraining and training stops.

rappc87 · 2024-05-15T12:11:22Z

Added saving of the best_epoch model every time lowest_value changes.

aitronssesin · 2024-05-18T11:01:24Z

I don't think this works. I started training with an overtraining threshold of 10, and it started saving a lot of models seemingly at random, then it removed all of the saved models, and training was very slow.

C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1 | epoch=1 | step=40 | time=12:46:47 | training_speed=0:00:27
Saved model 'C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1_1e_40s_best_epoch.pth' (epoch 1 and step 40)
C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1 | epoch=2 | step=80 | time=12:47:14 | training_speed=0:00:22 | lowest_value=27.87265396118164 (epoch 2 and step 69) | Number of epochs remaining for overtraining: 10
Saved model 'C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1_2e_80s_best_epoch.pth' (epoch 2 and step 80)
C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1 | epoch=3 | step=120 | time=12:47:41 | training_speed=0:00:21 | lowest_value=25.2319278717041 (epoch 3 and step 116) | Number of epochs remaining for overtraining: 10
Saved model 'C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1_3e_120s_best_epoch.pth' (epoch 3 and step 120)
C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1 | epoch=4 | step=160 | time=12:48:08 | training_speed=0:00:21 | lowest_value=18.749269485473633 (epoch 4 and step 150) | Number of epochs remaining for overtraining: 10
Saved model 'C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1_4e_160s_best_epoch.pth' (epoch 4 and step 160)
C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1 | epoch=5 | step=200 | time=12:48:33 | training_speed=0:00:21 | lowest_value=16.182971954345703 (epoch 5 and step 175) | Number of epochs remaining for overtraining: 10
Saved model 'C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1_5e_200s_best_epoch.pth' (epoch 5 and step 200)
C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1 | epoch=6 | step=240 | time=12:48:57 | training_speed=0:00:20 | lowest_value=16.182971954345703 (epoch 5 and step 175) | Number of epochs remaining for overtraining: 9
C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1 | epoch=7 | step=280 | time=12:49:17 | training_speed=0:00:20 | lowest_value=13.52884292602539 (epoch 7 and step 268) | Number of epochs remaining for overtraining: 10
Saved model 'C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1_7e_280s_best_epoch.pth' (epoch 7 and step 280)
C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1 | epoch=8 | step=320 | time=12:49:42 | training_speed=0:00:20 | lowest_value=13.52884292602539 (epoch 7 and step 268) | Number of epochs remaining for overtraining: 9
C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1 | epoch=9 | step=360 | time=12:50:03 | training_speed=0:00:21 | lowest_value=13.52884292602539 (epoch 7 and step 268) | Number of epochs remaining for overtraining: 8
Saved model 'C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1\G_400.pth' (epoch 10)
Saved model 'C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1\D_400.pth' (epoch 10)
Saved model 'C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1_10e_400s.pth' (epoch 10 and step 400)
C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1 | epoch=10 | step=400 | time=12:50:42 | training_speed=0:00:38 | lowest_value=12.121013641357422 (epoch 10 and step 377) | Number of epochs remaining for overtraining: 10
Saved model 'C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1_10e_400s_best_epoch.pth' (epoch 10 and step 400)
C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1 | epoch=11 | step=440 | time=12:51:07 | training_speed=0:00:18 | lowest_value=11.236621856689453 (epoch 11 and step 400) | Number of epochs remaining for overtraining: 10
Saved model 'C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1_11e_440s_best_epoch.pth' (epoch 11 and step 440)
C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1 | epoch=12 | step=480 | time=12:51:31 | training_speed=0:00:20 | lowest_value=10.71220874786377 (epoch 12 and step 454) | Number of epochs remaining for overtraining: 10
Saved model 'C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1_12e_480s_best_epoch.pth' (epoch 12 and step 480)
C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1 | epoch=13 | step=520 | time=12:51:56 | training_speed=0:00:20 | lowest_value=10.71220874786377 (epoch 12 and step 454) | Number of epochs remaining for overtraining: 9
C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1 | epoch=14 | step=560 | time=12:52:16 | training_speed=0:00:20 | lowest_value=9.570732116699219 (epoch 14 and step 536) | Number of epochs remaining for overtraining: 10
Saved model 'C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1_14e_560s_best_epoch.pth' (epoch 14 and step 560)
C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1 | epoch=15 | step=600 | time=12:52:42 | training_speed=0:00:21 | lowest_value=8.836076736450195 (epoch 15 and step 560) | Number of epochs remaining for overtraining: 10
Saved model 'C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1_15e_600s_best_epoch.pth' (epoch 15 and step 600)

Saved files (the first epoch is the sync graph)

rappc87 · 2024-05-19T08:06:07Z


You said this fix doesn't work, so I would like to summarize how it works.

From the screenshot I see that you have set it to save every 10 epochs.

If you set the overtraining threshold to 10 epochs, training continues for up to 10 more epochs after the last recorded best_epoch.pth file. If there is no improvement in that time, training finishes because the model is overtraining.

Every time a new best_epoch is found, the old best_epoch file is deleted, because the previous best_epoch.pth file is no longer the best epoch.
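To make the save-then-delete behaviour concrete, here is a minimal sketch. It is not the code from this PR: `replace_best_checkpoint` and its parameters are hypothetical names, and `write_bytes` stands in for whatever call the trainer actually uses to write the model. Only the file-name pattern is taken from the log above.

```python
import tempfile
from pathlib import Path
from typing import Optional


def replace_best_checkpoint(
    log_dir: Path,
    name: str,
    epoch: int,
    step: int,
    payload: bytes,
    previous: Optional[Path],
) -> Path:
    """Write the new best-epoch file, then delete the previous one (hypothetical helper)."""
    # File-name pattern copied from the log, e.g. Prueba1_15e_600s_best_epoch.pth
    new_path = log_dir / f"{name}_{epoch}e_{step}s_best_epoch.pth"
    new_path.write_bytes(payload)  # stand-in for the real model save
    # Delete the old file only after the new one has been written.
    if previous is not None and previous != new_path and previous.exists():
        previous.unlink()
    return new_path


# Usage: two improvements in a row leave a single best_epoch file behind.
with tempfile.TemporaryDirectory() as tmp:
    logs = Path(tmp)
    best = replace_best_checkpoint(logs, "Prueba1", 14, 560, b"weights-14", None)
    best = replace_best_checkpoint(logs, "Prueba1", 15, 600, b"weights-15", best)
    remaining = sorted(p.name for p in logs.iterdir())
```

This would explain why the folder never holds more than one best_epoch file at a time, even though the log prints many "Saved model" lines.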

So to summarize: if current_epoch > best_epoch + overtraining_threshold_value, training stops because of overtraining. Every time a new best epoch is found, best_epoch.pth is saved and the previous best epoch file is deleted.
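The stop rule in that summary can be sketched roughly as follows. This is only an illustration under assumptions: `OvertrainTracker` and `update` are hypothetical names, not identifiers from the PR, and the sketch uses one value per epoch, whereas the log suggests the real trainer tracks lowest_value per step.

```python
from dataclasses import dataclass


@dataclass
class OvertrainTracker:
    """Tracks the best epoch and applies the rule from the summary (hypothetical class)."""

    threshold: int
    lowest_value: float = float("inf")
    best_epoch: int = 0

    def update(self, epoch: int, value: float) -> bool:
        """Record this epoch's value; return True if training should stop."""
        if value < self.lowest_value:
            # New best epoch: this is where best_epoch.pth would be re-saved.
            self.lowest_value = value
            self.best_epoch = epoch
        # Rule as stated: current_epoch > best_epoch + overtraining_threshold_value
        return epoch > self.best_epoch + self.threshold


# Usage with made-up values and a threshold of 3: the best value is at epoch 2,
# so the strict ">" comparison first becomes true at epoch 6.
tracker = OvertrainTracker(threshold=3)
values = [5.0, 4.0, 4.5, 4.2, 4.1, 4.3, 4.4]
stop_epoch = None
for epoch, value in enumerate(values, start=1):
    if tracker.update(epoch, value):
        stop_epoch = epoch
        break
```

Note that with a strict `>` the run ends one epoch after best_epoch + threshold. Whether the merged code uses `>` or `>=` is not something this sketch can confirm.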

aitronssesin · 2024-05-19T15:37:03Z

We're gonna merge this pull request and give it a spin. If the overtraining detector looks sharper, we'll roll with the changes.

rappc87 added 2 commits May 15, 2024 11:26

Overtrain fix

7197cbe

now overtrain works as it should. after training for lowest_value+overtrain_threshold, if there is no decrease in lowest_value, it overtrains and the train stops.

Add files via upload

0aedda1

aitronssesin closed this May 18, 2024

aitronssesin reopened this May 19, 2024

aitronssesin added 2 commits May 19, 2024 17:38

Merge branch 'main' into main

e393d03

Unnecesary

cb4f971

aitronssesin merged commit 8ce5723 into IAHispano:main May 19, 2024
0 of 2 checks passed
