ddp ERROR #71

Open
liyingjie1991 opened this issue Jan 5, 2024 · 1 comment

Comments

@liyingjie1991

Hi, when I run the training code, I get the following error. Can you give me some advice?
```
  File "/ssd5/exec/liyj/miniconda3/envs/seamless/lib/python3.9/site-packages/torch/_dynamo/utils.py", line 163, in time_wrapper
    r = func(*args, **kwargs)
  File "/ssd5/exec/liyj/miniconda3/envs/seamless/lib/python3.9/site-packages/torch/_dynamo/output_graph.py", line 675, in call_user_compiler
    raise BackendCompilerFailed(self.compiler_fn, e) from e
torch._dynamo.exc.BackendCompilerFailed: compile_fn raised TypeError: _convert_frame_assert() missing 1 required positional argument: 'hooks'

Set torch._dynamo.config.verbose=True for more information

You can suppress this exception and fall back to eager by setting:
    torch._dynamo.config.suppress_errors = True

Traceback (most recent call last):
  File "/ssd5/exec/liyj/miniconda3/envs/seamless/lib/python3.9/site-packages/torch/_dynamo/output_graph.py", line 670, in call_user_compiler
    compiled_fn = compiler_fn(gm, self.fake_example_inputs())
  File "/ssd5/exec/liyj/miniconda3/envs/seamless/lib/python3.9/site-packages/torch/_dynamo/backends/distributed.py", line 203, in compile_fn
    return self.backend_compile_fn(gm, example_inputs)
TypeError: _convert_frame_assert() missing 1 required positional argument: 'hooks'
```

Version:
torch: '2.0.1+cu117'
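The traceback itself names a stop-gap: with `torch._dynamo.config.suppress_errors = True`, a failing backend compiler is logged and the frame runs eagerly instead of raising `BackendCompilerFailed`. A minimal sketch of that behaviour, assuming the flag is set once near the top of the training script; `failing_backend` and `scale_and_shift` are hypothetical stand-ins, not part of the project. Note that this hides the `TypeError` rather than fixing it, and any frame that falls back loses the compile speed-up.

```python
import torch
import torch._dynamo
import torch.fx

# Stop-gap suggested by the error message: when a backend compiler fails,
# dynamo logs the error and runs the frame eagerly instead of raising.
torch._dynamo.config.suppress_errors = True


def failing_backend(gm: torch.fx.GraphModule, example_inputs):
    # Hypothetical backend that always fails, standing in for the
    # BackendCompilerFailed seen in the traceback above.
    raise TypeError("simulated backend failure")


def scale_and_shift(x: torch.Tensor) -> torch.Tensor:
    return x * 2 + 1


# Compilation fails inside the backend, but because of suppress_errors the
# call still returns the eager result.
compiled = torch.compile(scale_and_shift, backend=failing_backend)
```

Calling `compiled(torch.arange(4.0))` should log a warning about the backend failure and return the same values as the uncompiled function.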

@sanchit-gandhi
Collaborator

Hey @liyingjie1991, are you using `torch.compile` during training? I haven't tested training with this configuration myself, but I would expect it to work, since training uses static shapes. The generate step during evaluation probably won't work: Transformers uses a dynamic k/v cache, which gives dynamic shapes. If you're using `torch.compile`, could you try disabling it for evaluation?
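A minimal sketch of "compile for training, run eagerly for evaluation", assuming a hand-written PyTorch loop rather than the project's actual training script (which is not shown in this thread). `torch.compile` wraps the module without copying it, so the compiled and eager handles share the same parameters. `build`, `train_step` and `evaluate` are hypothetical names, and a small `nn.Linear` stands in for the real model and its `generate` call. Under DDP the same split should apply, with the eager handle being the DDP-wrapped model (or its `.module` for generation); that part is not shown.

```python
import torch
import torch.nn as nn


def build(model: nn.Module, backend: str = "inductor"):
    # torch.compile returns a wrapper around `model`; the parameters are shared,
    # so optimizer updates made through the compiled handle are seen by `model`.
    compiled = torch.compile(model, backend=backend)
    return compiled, model


def train_step(compiled: nn.Module, optimizer: torch.optim.Optimizer,
               x: torch.Tensor, y: torch.Tensor) -> float:
    # Training batches have fixed shapes, the case expected to work with torch.compile.
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(compiled(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()


@torch.no_grad()
def evaluate(eager: nn.Module, x: torch.Tensor) -> torch.Tensor:
    # Evaluation (e.g. generation with a growing k/v cache) uses the uncompiled
    # module, so dynamo never has to trace the dynamic shapes.
    was_training = eager.training
    eager.eval()
    out = eager(x)
    eager.train(was_training)  # restore whatever mode the model was in
    return out
```

Passing `backend="eager"` to `build` is handy for a quick smoke test, since it runs the dynamo tracing without invoking the Inductor code generator.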
