
Error building extension 'fused_adam' when running the trainer #122

Open
J-G-Y opened this issue Dec 17, 2023 · 4 comments
Comments


J-G-Y commented Dec 17, 2023

As the title says.


J-G-Y commented Dec 17, 2023

The preceding error is: Unsupported gpu architecture 'compute_80'
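For context, "Unsupported gpu architecture 'compute_80'" usually means the nvcc used to JIT-build the fused_adam extension is older than CUDA 11.0 and does not know the Ampere (sm_80) target. A minimal sketch for checking the versions involved (the exact nvcc on the PATH depends on your environment):

import subprocess
import torch

# CUDA version PyTorch was built against (e.g. '11.7')
print("torch.version.cuda:", torch.version.cuda)

# Compute capability of the local GPU; (8, 0) corresponds to compute_80 / Ampere
if torch.cuda.is_available():
    print("device capability:", torch.cuda.get_device_capability(0))

# The nvcc that DeepSpeed invokes to build fused_adam;
# compute_80 requires nvcc from CUDA 11.0 or newer
print(subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout)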

liucongg (Owner) commented:

with torch.cuda.amp.autocast(enabled=True, dtype=torch.bfloat16) as autocast, torch.backends.cuda.sdp_kernel(enable_flash=False) as disable:
    outputs = model(**batch, use_cache=False)
    loss = outputs.loss
    tr_loss += loss.item()
    model.backward(loss)
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    model.step()
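A minimal sketch of where such a block would sit, assuming a standard per-batch DeepSpeed training loop (train_dataloader and the device handling are assumptions, not taken verbatim from this repo):

for step, batch in enumerate(train_dataloader):
    # Move tensors to the engine's device (assumes tensor-only batches)
    batch = {k: v.to(model.device) for k, v in batch.items()}
    with torch.cuda.amp.autocast(enabled=True, dtype=torch.bfloat16) as autocast, \
            torch.backends.cuda.sdp_kernel(enable_flash=False) as disable:
        outputs = model(**batch, use_cache=False)
        loss = outputs.loss
        tr_loss += loss.item()
        model.backward(loss)  # backward through the DeepSpeed engine
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        model.step()  # optimizer step via the DeepSpeed engine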

Stark-zheng commented:

> with torch.cuda.amp.autocast(enabled=True, dtype=torch.bfloat16) as autocast, torch.backends.cuda.sdp_kernel(enable_flash=False) as disable:
>     outputs = model(**batch, use_cache=False)
>     loss = outputs.loss
>     tr_loss += loss.item()
>     model.backward(loss)
>     torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
>     model.step()

Where should this code be added? I get the same error as the OP; it is raised here:

model, optimizer, _, lr_scheduler = deepspeed.initialize(model=model, args=args, config=ds_config,
                                                          dist_init_required=True)


liucongg commented Jan 7, 2024

It should be a CUDA version problem; the CUDA version and the installed toolkit need to be consistent.
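If upgrading the CUDA toolkit is not an option, one possible workaround (a sketch, not taken from this repo) is to ask DeepSpeed to use torch's Adam implementation via the "torch_adam" optimizer parameter, so the fused_adam extension never has to be built; the other config values below are illustrative:

ds_config = {
    "train_micro_batch_size_per_gpu": 1,  # illustrative value
    "bf16": {"enabled": True},
    # "torch_adam": True makes DeepSpeed fall back to torch.optim's Adam instead of
    # JIT-compiling the FusedAdam CUDA extension, which is what fails here
    "optimizer": {
        "type": "Adam",
        "params": {"lr": 1e-5, "torch_adam": True},
    },
}
# ds_config is then passed to deepspeed.initialize(..., config=ds_config) as in the snippet above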
