
Error building extension 'fused_adam' when running the trainer #122

Open
J-G-Y opened this issue Dec 17, 2023 · 4 comments
Comments


J-G-Y commented Dec 17, 2023

As the title says.


J-G-Y commented Dec 17, 2023

The preceding error is: Unsupported gpu architecture 'compute_80'
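For context, "Unsupported gpu architecture 'compute_80'" usually means the nvcc used to JIT-build the fused_adam extension is older than CUDA 11.0 and does not know the Ampere (sm_80) target. A minimal sketch for checking the versions involved (the exact nvcc on the PATH depends on your environment):

import subprocess
import torch

# CUDA version PyTorch was built against (e.g. '11.7')
print("torch.version.cuda:", torch.version.cuda)

# Compute capability of the local GPU; (8, 0) corresponds to compute_80 / Ampere
if torch.cuda.is_available():
    print("device capability:", torch.cuda.get_device_capability(0))

# The nvcc that DeepSpeed invokes to build fused_adam;
# compute_80 requires nvcc from CUDA 11.0 or newer
print(subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout)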

liucongg (Owner) commented:

with torch.cuda.amp.autocast(enabled=True, dtype=torch.bfloat16) as autocast, torch.backends.cuda.sdp_kernel(enable_flash=False) as disable:
    outputs = model(**batch, use_cache=False)
    loss = outputs.loss
    tr_loss += loss.item()
    model.backward(loss)
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    model.step()
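A minimal sketch of where such a block would sit, assuming a standard per-batch DeepSpeed training loop (train_dataloader and the device handling are assumptions, not taken verbatim from this repo):

for step, batch in enumerate(train_dataloader):
    # Move tensors to the engine's device (assumes tensor-only batches)
    batch = {k: v.to(model.device) for k, v in batch.items()}
    with torch.cuda.amp.autocast(enabled=True, dtype=torch.bfloat16) as autocast, \
            torch.backends.cuda.sdp_kernel(enable_flash=False) as disable:
        outputs = model(**batch, use_cache=False)
        loss = outputs.loss
        tr_loss += loss.item()
        model.backward(loss)  # backward through the DeepSpeed engine
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        model.step()  # optimizer step via the DeepSpeed engine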

Stark-zheng commented:

> with torch.cuda.amp.autocast(enabled=True, dtype=torch.bfloat16) as autocast, torch.backends.cuda.sdp_kernel(enable_flash=False) as disable:
>     outputs = model(**batch, use_cache=False)
>     loss = outputs.loss
>     tr_loss += loss.item()
>     model.backward(loss)
>     torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
>     model.step()

Where should this code be added? I get the same error as the OP; it is raised here:

model, optimizer, _, lr_scheduler = deepspeed.initialize(model=model, args=args, config=ds_config,
                                                          dist_init_required=True)


liucongg commented Jan 7, 2024

It should be a CUDA version problem; the CUDA version and the installed toolkit need to be consistent.
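If upgrading the CUDA toolkit is not an option, one possible workaround (a sketch, not taken from this repo) is to ask DeepSpeed to use torch's Adam implementation via the "torch_adam" optimizer parameter, so the fused_adam extension never has to be built; the other config values below are illustrative:

ds_config = {
    "train_micro_batch_size_per_gpu": 1,  # illustrative value
    "bf16": {"enabled": True},
    # "torch_adam": True makes DeepSpeed fall back to torch.optim's Adam instead of
    # JIT-compiling the FusedAdam CUDA extension, which is what fails here
    "optimizer": {
        "type": "Adam",
        "params": {"lr": 1e-5, "torch_adam": True},
    },
}
# ds_config is then passed to deepspeed.initialize(..., config=ds_config) as in the snippet above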
