Met Nan after 150000 Iter #36

Open
JudgeLJX opened this issue Jan 12, 2024 · 2 comments

Comments

@JudgeLJX

Hello,

Thanks for your work.

May I ask whether you have run into this problem when training with train_gen.py?

[lgan_mmd-CD] nan
[lgan_cov-CD] 0.24250001
[lgan_mmd_smp-CD] nan
Traceback (most recent call last):
  File "train_gen.py", line 222, in <module>
    test(it)
  File "train_gen.py", line 185, in test
    jsd = jsd_between_point_cloud_sets(gen_pcs.cpu().numpy(), ref_pcs.cpu().numpy())
  File "/home2/diffusion-point-cloud/evaluation/evaluation_metrics.py", line 260, in jsd_between_point_cloud_sets
    sample_pcs, resolution, in_unit_sphere)[1]
  File "/home2/diffusion-point-cloud/evaluation/evaluation_metrics.py", line 291, in entropy_of_occupancy_grid
    _, indices = nn.kneighbors(pc)
  File "/home2/miniconda3/envs/dpm-pc-gen/lib/python3.7/site-packages/sklearn/neighbors/_base.py", line 670, in kneighbors
    X = check_array(X, accept_sparse='csr')
  File "/home2/miniconda3/envs/dpm-pc-gen/lib/python3.7/site-packages/sklearn/utils/validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "/home2/miniconda3/envs/dpm-pc-gen/lib/python3.7/site-packages/sklearn/utils/validation.py", line 721, in check_array
    allow_nan=force_all_finite == 'allow-nan')
  File "/home2/miniconda3/envs/dpm-pc-gen/lib/python3.7/site-packages/sklearn/utils/validation.py", line 106, in _assert_all_finite
    msg_dtype if msg_dtype is not None else X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

I think training will not stop until we stop it manually, because iter is set to inf. However, it failed to generate samples using the 150000.pt checkpoint.

Best Wishes
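
The traceback above shows the ValueError being raised when the generated point clouds reach sklearn's input validation, which suggests the generated samples themselves contain NaN or inf. A minimal sketch of a check that could run before the JSD computation is given below. The helper name `find_non_finite_clouds` and the toy array are hypothetical and not part of the repository; the sketch assumes the clouds are stored as an array of shape (num_clouds, num_points, 3).

```python
import numpy as np

def find_non_finite_clouds(pcs):
    """Return the indices of point clouds that contain NaN or inf.

    pcs: array-like of shape (num_clouds, num_points, 3) (assumed layout).
    """
    pcs = np.asarray(pcs)
    # A cloud is "bad" if any of its coordinates is not finite.
    bad = ~np.isfinite(pcs).all(axis=(1, 2))
    return np.flatnonzero(bad)

# Toy stand-in for gen_pcs.cpu().numpy(): three clouds of four points each.
gen_pcs = np.zeros((3, 4, 3), dtype=np.float32)
gen_pcs[1, 2, 0] = np.nan  # inject one NaN coordinate into cloud 1

bad_idx = find_non_finite_clouds(gen_pcs)
if bad_idx.size > 0:
    print(f"{bad_idx.size} of {len(gen_pcs)} generated clouds are non-finite; skipping JSD")
```

A check like this does not fix the cause of the NaN, but it turns the crash into a readable message and shows how many samples are affected.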

@Anonymous-ECCV-project

Hello, I have encountered the same problem. May I ask whether you have found out why the NaN occurs?

@GunnerStone

NaN errors during model training are usually caused by values getting so small that they become 0. Dividing by that 0 then produces inf or NaN. My guess is that after training for this long, some value somewhere has become 0 and broken the training.
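
As a rough illustration of the mechanism described above (not a diagnosis of this particular model), the sketch below shows a float32 product underflowing to exactly 0 and the resulting 0/0 producing NaN. The `safe_divide` helper and its `eps` value are assumptions chosen for the example, not something taken from the repository.

```python
import numpy as np

def safe_divide(num, den, eps=1e-8):
    """Divide after replacing near-zero denominators with eps (assumed value).

    Note: this simple clamp ignores the sign of tiny negative denominators.
    """
    den = np.where(np.abs(den) < eps, eps, den)
    return num / den

tiny = np.float32(1e-30)
# 1e-60 is far below the smallest float32 value, so the product is exactly 0.0.
underflowed = tiny * tiny

# Suppress the runtime warnings NumPy emits for division by zero.
with np.errstate(divide="ignore", invalid="ignore"):
    naive = underflowed / underflowed  # 0 / 0 gives NaN

guarded = safe_divide(underflowed, underflowed)  # 0 / eps gives 0.0

print("underflowed:", underflowed)
print("naive is NaN:", np.isnan(naive))
print("guarded:", guarded)
```

Clamping the denominator is only one common guard; whether it is appropriate depends on where the zero actually appears in the model.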
