Training fails with AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam' #144

Open
Qinger27 opened this issue Apr 12, 2024 · 1 comment

Comments

@Qinger27

Here is the error message. Could you help me take a look?

ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/dockerdata/graceqwang/videollava/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
subprocess.run(
File "/dockerdata/graceqwang/videollava/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/dockerdata/graceqwang/my_code/Video-LLaVA/videollava/train/train_mem.py", line 12, in <module>
train()
File "/dockerdata/graceqwang/my_code/Video-LLaVA/videollava/train/train.py", line 1074, in train
trainer.train()
File "/dockerdata/graceqwang/videollava/lib/python3.10/site-packages/transformers/trainer.py", line 1539, in train
return inner_training_loop(
File "/dockerdata/graceqwang/videollava/lib/python3.10/site-packages/transformers/trainer.py", line 1656, in _inner_training_loop
model, self.optimizer = self.accelerator.prepare(self.model, self.optimizer)
File "/dockerdata/graceqwang/videollava/lib/python3.10/site-packages/accelerate/accelerator.py", line 1198, in prepare
result = self._prepare_deepspeed(*args)
File "/dockerdata/graceqwang/videollava/lib/python3.10/site-packages/accelerate/accelerator.py", line 1531, in _prepare_deepspeed
optimizer = DeepSpeedCPUAdam(optimizer.param_groups, **defaults)
File "/dockerdata/graceqwang/videollava/lib/python3.10/site-packages/deepspeed/ops/adam/cpu_adam.py", line 94, in __init__
self.ds_opt_adam = CPUAdamBuilder().load()
File "/dockerdata/graceqwang/videollava/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 454, in load
return self.jit_load(verbose)
File "/dockerdata/graceqwang/videollava/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 497, in jit_load
op_module = load(name=self.name,
File "/dockerdata/graceqwang/videollava/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1284, in load
return _jit_compile(
File "/dockerdata/graceqwang/videollava/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1509, in _jit_compile
_write_ninja_file_and_build_library(
File "/dockerdata/graceqwang/videollava/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1624, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/dockerdata/graceqwang/videollava/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'cpu_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7f0e48eba4d0>
Traceback (most recent call last):
File "/dockerdata/graceqwang/videollava/lib/python3.10/site-packages/deepspeed/ops/adam/cpu_adam.py", line 102, in __del__
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
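
A note on reading this log: the AttributeError in the title looks like a follow-on symptom rather than the root cause. The traceback shows `__init__` failing at `self.ds_opt_adam = CPUAdamBuilder().load()` because the `cpu_adam` extension did not build (`ninja: build stopped: subcommand failed.`), so the attribute is never assigned, and `__del__` then trips over the missing attribute. The actual compiler error would be in the ninja output above the part pasted here. The sketch below reproduces only that mechanism in plain Python; `FailingOptimizer`, `_load_extension`, and `demo` are hypothetical names for illustration, not DeepSpeed code, and the timing of `__del__` assumes CPython's reference counting.

```python
import sys


class FailingOptimizer:
    """Hypothetical stand-in for an optimizer whose native extension fails to build."""

    def __init__(self):
        # Same shape as the failing line in the traceback: the attribute is
        # assigned only if the loader returns, so an exception leaves it unset.
        self.ds_opt_adam = self._load_extension()

    def _load_extension(self):
        # Assumption: simulates the build failure reported in the log.
        raise RuntimeError("Error building extension 'cpu_adam'")

    def __del__(self):
        # Still runs for the half-initialised object; the attribute is missing.
        self.ds_opt_adam.destroy()


def demo():
    """Return the primary error message and the names of ignored __del__ errors."""
    ignored = []
    previous_hook = sys.unraisablehook
    # Record errors Python would otherwise print as "Exception ignored in: ...".
    sys.unraisablehook = lambda u: ignored.append(type(u.exc_value).__name__)
    primary = None
    try:
        try:
            FailingOptimizer()
        except RuntimeError as exc:
            primary = str(exc)
        # Leaving the except block drops the traceback, which releases the
        # half-built object and triggers __del__ (under CPython).
    finally:
        sys.unraisablehook = previous_hook
    return primary, ignored


if __name__ == "__main__":
    print(demo())
```

In this sketch the RuntimeError is the error to chase; the AttributeError only appears because cleanup ran on an object whose constructor never finished.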

@dwsmart32

I have the same issue. Have you solved it?
