Tensor Shape size issue #5

Open
mshaheryar91 opened this issue May 28, 2022 · 9 comments

Comments

@mshaheryar91

Hello,
I am running the code, and in the second epoch it fails with a size mismatch between the input and output tensors. The input is 3,256,256 and the output is 3,248,258. I tried using a resize function. I used the following configuration:
batch_size: 8
data_repeat: 80
data_augment: 1
epochs: 1000
lr: 0.00025

Please guide!
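
A mismatch like this can sometimes be narrowed down by checking the shapes of the training pairs before the run starts. The following is a minimal sketch, not the repository's code: it assumes the data consists of low-resolution (LR) and high-resolution (HR) pairs in (C, H, W) layout and that each HR image should be exactly `scale` times the LR image. The function names, the sample names, and the shapes in `pairs` are hypothetical and only illustrate the check.

```python
def expected_hr_shape(lr_shape, scale):
    """Return the HR shape (C, H, W) implied by an LR shape and an integer scale."""
    c, h, w = lr_shape
    return (c, h * scale, w * scale)


def find_mismatched_pairs(pairs, scale):
    """Return the names of pairs whose HR shape is not exactly scale x LR.

    `pairs` maps a sample name to (lr_shape, hr_shape), each a (C, H, W) tuple.
    """
    bad = []
    for name, (lr_shape, hr_shape) in pairs.items():
        if tuple(hr_shape) != expected_hr_shape(lr_shape, scale):
            bad.append(name)
    return bad


# Hypothetical shapes for illustration only; real ones would be read from the images.
pairs = {
    "img_a": ((3, 64, 64), (3, 256, 256)),
    "img_b": ((3, 64, 64), (3, 248, 258)),
}
print(find_mismatched_pairs(pairs, scale=4))  # prints ['img_b']
```

If such a scan flags any samples, those files would be the first place to look; if it flags none, the mismatch more likely comes from how the patches are cropped or resized.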

@mshaheryar91
Author

Waiting for a kind reply. Could someone guide me on this?

@songzijiang

And how did you set the following parameters?
patch_size:
batch_size:
I think you should try the recommended settings first.

@mshaheryar91
Author

patch_size: 256
batch_size: 8
data_repeat: 80
data_augment: 1
epochs: 1000
lr: 0.00025
decays: [250, 400, 450, 475, 500]
gamma: 0.5
log_every: 100
test_every: 1

I tried them, but CUDA ran out of memory, so I tried different settings. Training then started, but in the 2nd epoch it hit this error.

@songzijiang

It looks OK. Maybe you could provide the whole config file and the console output.

@mshaheryar91
Author

elan_light_x4.yml.txt
For the console output, I will rerun the code and then attach it.

@mshaheryar91
Author

1-epoch
This is the result of the first epoch.

@mshaheryar91
Author

error
Please find the error attached. It is a different error now. I am quite surprised that every run gives a new error related to size, resize, or tensor shape. What do you think are the possible reasons for that?

@mshaheryar91
Author

Maybe I found the reason; please see the attachment. When I reran the code, I found that in the first epoch it does not train on the whole dataset and gives NaN values. So I think it is missing some files in every epoch, but the code is the same, so why is this happening?

error-2
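
One way to test the "missing files" suspicion is to compare the HR and LR folders directly. This is a rough sketch under stated assumptions, not part of the repository: it assumes HR and LR images sit in two separate directories and share a file stem, optionally with a suffix on the LR side. The function name `unpaired_files` and the `lr_suffix` convention (for example "x4", so that "0001x4.png" pairs with "0001.png") are hypothetical and should be adapted to the actual dataset layout.

```python
from pathlib import Path


def unpaired_files(hr_dir, lr_dir, lr_suffix=""):
    """Return (hr_without_lr, lr_without_hr) as sorted lists of file stems.

    lr_suffix is a hypothetical naming convention, e.g. "x4" if LR files are
    named like "0001x4.png" for an HR file "0001.png".
    """
    hr_stems = {p.stem for p in Path(hr_dir).iterdir() if p.is_file()}
    lr_stems = set()
    for p in Path(lr_dir).iterdir():
        if p.is_file():
            stem = p.stem
            # Strip the LR suffix so both sides can be compared by stem.
            if lr_suffix and stem.endswith(lr_suffix):
                stem = stem[: -len(lr_suffix)]
            lr_stems.add(stem)
    return sorted(hr_stems - lr_stems), sorted(lr_stems - hr_stems)
```

Calling `unpaired_files(hr_dir, lr_dir, "x4")` on the two folders returns the stems that exist on only one side; two empty lists would suggest the pairing itself is complete and the problem lies elsewhere.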

@songzijiang

I have no idea how to use it on Windows. Maybe you could download the dataset provided in the README and give it a try, or you can wait for the author's response.
