cudnn_gru ValueError when forward_split= True #27

Open
EduardBermejoScrm opened this issue Oct 19, 2018 · 5 comments


EduardBermejoScrm commented Oct 19, 2018

First of all, thank you for upgrading your code and fixing all the recent issues!

When I run train with --no_forward_split everything works fine. However, when running train() to eval with forward_split=True I get this error: ValueError: Variable cudnn_gru_1/opaque_kernel does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=tf.AUTO_REUSE in VarScope?
Any idea how to fix this, or what is causing it?

Could it be related to the fact that we are now instantiating two models, train_model and forward_eval_model?

Thank you

@EduardBermejoScrm EduardBermejoScrm changed the title when running train() to eval with forward_split=True I get a ValueError when forward_split= True Oct 19, 2018
@EduardBermejoScrm EduardBermejoScrm changed the title ValueError when forward_split= True cudnn_gru ValueError when forward_split= True Oct 19, 2018

svjack commented Nov 6, 2018

This issue occurs because class Model is initialized multiple times without variable scope control.
You may be able to fix it with a tf.variable_scope that has reuse enabled (for example, in the init function of class Model).

@liumanfei

@EduardBermejoScrm Hello, I get the same error here. Have you fixed it? I use TF 1.11.0; is there a connection?

@liumanfei

> This issue occurs because class Model is initialized multiple times without variable scope control.
> You may be able to fix it with a tf.variable_scope that has reuse enabled (for example, in the init function of class Model).

The code already calls scope.reuse_variables() outside class Model, between the first and second init in the function train. Could you please explain why that doesn't work?


svjack commented Nov 9, 2018

Sorry, my answer to the question was a mistake.

Author

EduardBermejoScrm commented Nov 12, 2018

Hey, yes, I fixed it. I wrote a general Stack Overflow question about this issue:
link
The problem is that scope.reuse_variables() does not work properly here, so I ended up wrapping both the train model and the eval model in a 'model' variable_scope so that they can share it.
Before train_model in line 471 of trainer.py, add this line to wrap the train model creation:
with tf.variable_scope('model') as scope:
Then, before eval_stages = [] in line 474, add the same line to wrap the eval model in the same variable scope.
Finally, delete scope.reuse_variables() in line 472.
Hope this helps!
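
One way to read the error in this thread is as a name mismatch: with reuse switched on, a variable lookup only succeeds if the full name matches a variable that already exists. The sketch below is a toy model in plain Python, not TensorFlow code. VariableStore and build_model are hypothetical names made up for illustration, and the assumption that the first model's variable was called cudnn_gru/opaque_kernel is inferred only from the _1 suffix in the error message.

```python
class VariableStore:
    """Toy stand-in for a shared variable store (hypothetical, not TensorFlow)."""

    def __init__(self):
        self._vars = {}

    def get_variable(self, name, reuse):
        if reuse:
            # With reuse on, the variable must already exist under this exact name.
            if name not in self._vars:
                raise ValueError(f"Variable {name} does not exist")
            return self._vars[name]
        # With reuse off, the name must still be free.
        if name in self._vars:
            raise ValueError(f"Variable {name} already exists")
        self._vars[name] = object()
        return self._vars[name]


def build_model(store, layer_name, reuse):
    """Hypothetical model builder that needs a single kernel variable."""
    return store.get_variable(f"{layer_name}/opaque_kernel", reuse=reuse)


store = VariableStore()

# The train model creates its kernel (assumed name: cudnn_gru/opaque_kernel).
train_kernel = build_model(store, "cudnn_gru", reuse=False)

# If the eval model's layer ends up with a different name, the reuse lookup fails.
try:
    build_model(store, "cudnn_gru_1", reuse=True)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)

# If both models resolve to the same name, the eval model gets the same variable.
eval_kernel = build_model(store, "cudnn_gru", reuse=True)
shared = eval_kernel is train_kernel

print(mismatch_error)
print(shared)
```

Under this simplified picture, building both models under the same 'model' scope helps because both then look up their variables under identical names. Whether TensorFlow's layer naming behaves exactly this way in trainer.py is not confirmed by this thread.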

amankhandelia added a commit to amankhandelia/kaggle-web-traffic that referenced this issue Jan 7, 2020
For more details go on this link: Arturus#27
@github-staff github-staff deleted a comment from priti-aid May 22, 2024