
CUDA error: out of memory CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. #23

Open
RuohaoYan opened this issue Sep 25, 2023 · 6 comments

Comments

@RuohaoYan

The error occurs when I run the command below:

python infer_finetuning.py

[screenshot: CUDA out-of-memory error output]

I have already specified the GPU or CPU in the parameters when loading the model. How can this error be resolved?

@ssbuild
Owner

ssbuild commented Sep 25, 2023

CUDA_LAUNCH_BLOCKING=1 python infer_finetuning.py

Run the above and upload the error log.
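
For context: CUDA_LAUNCH_BLOCKING=1 makes every kernel launch synchronous, so the stacktrace points at the call that actually failed instead of a later API call. A minimal sketch, if you would rather set the variable from Python than on the command line; it has to happen before CUDA is initialized:

    import os

    # Must be set before the first CUDA call (safest: before importing torch),
    # so kernel launches block and errors surface at the faulting line.
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

    import torch  # imported only after the environment variable is in place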

@RuohaoYan
Author

> CUDA_LAUNCH_BLOCKING=1 python infer_finetuning.py
> Run the above and upload the error log.

CUDA error: out of memory
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

@ssbuild
Owner

ssbuild commented Sep 25, 2023

Please post the full log so I can see what happened.

@RuohaoYan
Author

> Please post the full log so I can see what happened.

BloomConfig {
  "_name_or_path": "model/bloom-560m/config.json",
  "apply_residual_connection_post_layernorm": false,
  "architectures": [
    "BloomForCausalLM"
  ],
  "attention_dropout": 0.0,
  "attention_softmax_in_fp32": true,
  "bias_dropout_fusion": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_dropout": 0.0,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "masked_softmax_fusion": true,
  "model_type": "bloom",
  "n_head": 16,
  "n_inner": null,
  "n_layer": 24,
  "offset_alibi": 100,
  "pad_token_id": 3,
  "pretraining_tp": 1,
  "return_dict": false,
  "skip_bias_add": true,
  "skip_bias_add_qkv": false,
  "slow_but_exact": false,
  "task_specific_params": {},
  "transformers_version": "4.33.2",
  "unk_token_id": 0,
  "use_cache": true,
  "vocab_size": 250880
}

None
ModelArguments(model_name_or_path='model/bloom-560m', model_type='bloom', config_overrides=None, config_name='model/bloom-560m/config.json', tokenizer_name='model/bloom-560m', cache_dir=None, do_lower_case=None, use_fast_tokenizer=False, model_revision='main', use_auth_token=False)
Traceback (most recent call last):
  File "infer_finetuning.py", line 33, in <module>
    pl_model.load_sft_weight(train_weight,strict=True)
  File "llm_finetuning/lib/python3.11/site-packages/deep_training/trainer/pl/modelweighter.py", line 103, in load_sft_weight
    weight_dict = torch.load(sft_weight_path)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "llm_finetuning/lib/python3.11/site-packages/torch/serialization.py", line 809, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "llm_finetuning/lib/python3.11/site-packages/torch/serialization.py", line 1172, in _load
    result = unpickler.load()
             ^^^^^^^^^^^^^^^^
  File "llm_finetuning/lib/python3.11/site-packages/torch/serialization.py", line 1142, in persistent_load
    typed_storage = load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "llm_finetuning/lib/python3.11/site-packages/torch/serialization.py", line 1116, in load_tensor
    wrap_storage=restore_location(storage, location),
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "llm_finetuning/lib/python3.11/site-packages/torch/serialization.py", line 217, in default_restore_location
    result = fn(storage, location)
             ^^^^^^^^^^^^^^^^^^^^^
  File "llm_finetuning/lib/python3.11/site-packages/torch/serialization.py", line 187, in _cuda_deserialize
    return obj.cuda(device)
           ^^^^^^^^^^^^^^^^
  File "llm_finetuning/lib/python3.11/site-packages/torch/_utils.py", line 81, in _cuda
    untyped_storage = torch.UntypedStorage(
                      ^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: out of memory
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
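
The traceback shows the failure happens before the model ever runs: torch.load is unpickling tensors that were saved from GPU memory, and _cuda_deserialize tries to re-allocate each one on the GPU (obj.cuda(device)), which is the allocation that runs out of memory. A minimal sketch of the usual workaround, assuming sft_weight_path is the same checkpoint path used in modelweighter.py: pass map_location so everything is deserialized into host RAM first.

    import torch

    # Hypothetical checkpoint path, standing in for sft_weight_path above.
    sft_weight_path = "best_ckpt/last.ckpt"

    # map_location="cpu" remaps every saved CUDA storage to host memory during
    # unpickling, so no GPU allocation happens at load time.
    weight_dict = torch.load(sft_weight_path, map_location="cpu")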

@ssbuild
Owner

ssbuild commented Sep 25, 2023

Try loading on the CPU; it is just a CUDA out-of-memory error.

@RuohaoYan
Author

> Try loading on the CPU; it is just a CUDA out-of-memory error.

Yes. But as I said above, I modified the torch.load call, and it now works on both CPU and GPU.
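
For anyone hitting the same error: a sketch of the load-on-CPU-then-move pattern that makes one code path work with or without a GPU. The nn.Linear module and file name below are illustrative stand-ins, not the repo's actual model or checkpoint.

    import torch
    from torch import nn

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Stand-in module; in this repo it would be the pl_model built in infer_finetuning.py.
    model = nn.Linear(8, 8)
    torch.save(model.state_dict(), "demo_weights.pt")

    # Deserialize on the CPU no matter where the checkpoint was written...
    state_dict = torch.load("demo_weights.pt", map_location="cpu")
    model.load_state_dict(state_dict, strict=True)

    # ...then move the populated model only if a GPU is actually available.
    model.to(device)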
