Why can't I save the model? #97

txye · 2023-11-14T15:54:43Z

raise RuntimeError(
RuntimeError:
Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'0.auto_model.shared.weight', '0.auto_model.encoder.embed_tokens.weight'}].
A potential way to correctly save your model is to use save_model.
More information at https://huggingface.co/docs/safetensors/torch_shared_tensors

nprasanthi7 · 2023-11-28T12:07:32Z

RuntimeError:
Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'0.auto_model.encoder.embed_tokens.weight', '0.auto_model.shared.weight'}].
A potential way to correctly save your model is to use save_model.
More information at https://huggingface.co/docs/safetensors/torch_shared_tensors

Could you please help me resolve this?

hongjin-su · 2023-12-19T09:21:35Z

Hi, thanks a lot for your interest in the INSTRUCTOR model!

Could you provide a short script for me to reproduce the error?

tush05tgsingh · 2024-03-18T18:21:38Z

I am getting the same error and don't know how to solve it. @hongjin-su, I hope you can help me with this:

Traceback (most recent call last):
File "/ClusterLLM/perspective/2_finetune/finetune.py", line 617, in <module>
main()
File "ClusterLLM/perspective/2_finetune/finetune.py", line 598, in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File ".conda/envs/696ds/lib/python3.9/site-packages/transformers/trainer.py", line 1624, in train
return inner_training_loop(
File ".conda/envs/696ds/lib/python3.9/site-packages/transformers/trainer.py", line 2029, in _inner_training_loop
self._maybe_log_save_evaluate(tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval)
File ".conda/envs/696ds/lib/python3.9/site-packages/transformers/trainer.py", line 2423, in _maybe_log_save_evaluate
self._save_checkpoint(model, trial, metrics=metrics)
File "conda/envs/696ds/lib/python3.9/site-packages/transformers/trainer.py", line 2499, in _save_checkpoint
self.save_model(staging_output_dir, _internal_call=True)
File ".conda/envs/696ds/lib/python3.9/site-packages/transformers/trainer.py", line 3016, in save_model
self._save(output_dir)
File ".conda/envs/696ds/lib/python3.9/site-packages/transformers/trainer.py", line 3083, in _save
safetensors.torch.save_file(
File ".conda/envs/696ds/lib/python3.9/site-packages/safetensors/torch.py", line 281, in save_file
serialize_file(_flatten(tensors), filename, metadata=metadata)
File ".conda/envs/696ds/lib/python3.9/site-packages/safetensors/torch.py", line 477, in _flatten
raise RuntimeError(
RuntimeError:
Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'0.auto_model.shared.weight', '0.auto_model.encoder.embed_tokens.weight'}].
A potential way to correctly save your model is to use save_model.
More information at https://huggingface.co/docs/safetensors/torch_shared_tensors
