
CUDA error: out of memory CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. #23

Open
RuohaoYan opened this issue Sep 25, 2023 · 6 comments

Comments

@RuohaoYan

The error occurs when I run the command below:

python infer_finetuning.py

[screenshot: CUDA out-of-memory error output]

I have already specified the GPU or CPU in the parameters when loading the model. How can this error be resolved?

@ssbuild
Owner

ssbuild commented Sep 25, 2023

CUDA_LAUNCH_BLOCKING=1 python infer_finetuning.py

Run the above and upload the error log.
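
For context: CUDA_LAUNCH_BLOCKING=1 makes every kernel launch synchronous, so the stacktrace points at the call that actually failed instead of a later API call. A minimal sketch, if you would rather set the variable from Python than on the command line; it has to happen before CUDA is initialized:

    import os

    # Must be set before the first CUDA call (safest: before importing torch),
    # so kernel launches block and errors surface at the faulting line.
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

    import torch  # imported only after the environment variable is in place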

@RuohaoYan
Author

> CUDA_LAUNCH_BLOCKING=1 python infer_finetuning.py
> Run the above and upload the error log.

CUDA error: out of memory
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

@ssbuild
Owner

ssbuild commented Sep 25, 2023

Please post the full log so I can see what happened.

@RuohaoYan
Author

> Please post the full log so I can see what happened.

BloomConfig {
  "_name_or_path": "model/bloom-560m/config.json",
  "apply_residual_connection_post_layernorm": false,
  "architectures": [
    "BloomForCausalLM"
  ],
  "attention_dropout": 0.0,
  "attention_softmax_in_fp32": true,
  "bias_dropout_fusion": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_dropout": 0.0,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "masked_softmax_fusion": true,
  "model_type": "bloom",
  "n_head": 16,
  "n_inner": null,
  "n_layer": 24,
  "offset_alibi": 100,
  "pad_token_id": 3,
  "pretraining_tp": 1,
  "return_dict": false,
  "skip_bias_add": true,
  "skip_bias_add_qkv": false,
  "slow_but_exact": false,
  "task_specific_params": {},
  "transformers_version": "4.33.2",
  "unk_token_id": 0,
  "use_cache": true,
  "vocab_size": 250880
}

None
ModelArguments(model_name_or_path='model/bloom-560m', model_type='bloom', config_overrides=None, config_name='model/bloom-560m/config.json', tokenizer_name='model/bloom-560m', cache_dir=None, do_lower_case=None, use_fast_tokenizer=False, model_revision='main', use_auth_token=False)
Traceback (most recent call last):
  File "infer_finetuning.py", line 33, in <module>
    pl_model.load_sft_weight(train_weight,strict=True)
  File "llm_finetuning/lib/python3.11/site-packages/deep_training/trainer/pl/modelweighter.py", line 103, in load_sft_weight
    weight_dict = torch.load(sft_weight_path)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "llm_finetuning/lib/python3.11/site-packages/torch/serialization.py", line 809, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "llm_finetuning/lib/python3.11/site-packages/torch/serialization.py", line 1172, in _load
    result = unpickler.load()
             ^^^^^^^^^^^^^^^^
  File "llm_finetuning/lib/python3.11/site-packages/torch/serialization.py", line 1142, in persistent_load
    typed_storage = load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "llm_finetuning/lib/python3.11/site-packages/torch/serialization.py", line 1116, in load_tensor
    wrap_storage=restore_location(storage, location),
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "llm_finetuning/lib/python3.11/site-packages/torch/serialization.py", line 217, in default_restore_location
    result = fn(storage, location)
             ^^^^^^^^^^^^^^^^^^^^^
  File "llm_finetuning/lib/python3.11/site-packages/torch/serialization.py", line 187, in _cuda_deserialize
    return obj.cuda(device)
           ^^^^^^^^^^^^^^^^
  File "llm_finetuning/lib/python3.11/site-packages/torch/_utils.py", line 81, in _cuda
    untyped_storage = torch.UntypedStorage(
                      ^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: out of memory
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
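
The traceback shows the failure happens before the model ever runs: torch.load is unpickling tensors that were saved from GPU memory, and _cuda_deserialize tries to re-allocate each one on the GPU (obj.cuda(device)), which is the allocation that runs out of memory. A minimal sketch of the usual workaround, assuming sft_weight_path is the same checkpoint path used in modelweighter.py: pass map_location so everything is deserialized into host RAM first.

    import torch

    # Hypothetical checkpoint path, standing in for sft_weight_path above.
    sft_weight_path = "best_ckpt/last.ckpt"

    # map_location="cpu" remaps every saved CUDA storage to host memory during
    # unpickling, so no GPU allocation happens at load time.
    weight_dict = torch.load(sft_weight_path, map_location="cpu")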

@ssbuild
Owner

ssbuild commented Sep 25, 2023

Try loading on the CPU; it is just a CUDA out-of-memory error.

@RuohaoYan
Author

> Try loading on the CPU; it is just a CUDA out-of-memory error.

Yes. But as I said above, I modified the torch.load call, and it now works on both CPU and GPU.
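
For anyone hitting the same error: a sketch of the load-on-CPU-then-move pattern that makes one code path work with or without a GPU. The nn.Linear module and file name below are illustrative stand-ins, not the repo's actual model or checkpoint.

    import torch
    from torch import nn

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Stand-in module; in this repo it would be the pl_model built in infer_finetuning.py.
    model = nn.Linear(8, 8)
    torch.save(model.state_dict(), "demo_weights.pt")

    # Deserialize on the CPU no matter where the checkpoint was written...
    state_dict = torch.load("demo_weights.pt", map_location="cpu")
    model.load_state_dict(state_dict, strict=True)

    # ...then move the populated model only if a GPU is actually available.
    model.to(device)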
