Why I cannot save model? #97

Open
txye opened this issue Nov 14, 2023 · 3 comments

Comments

@txye

txye commented Nov 14, 2023

raise RuntimeError(
RuntimeError:
Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'0.auto_model.shared.weight', '0.auto_model.encoder.embed_tokens.weight'}].
A potential way to correctly save your model is to use save_model.
More information at https://huggingface.co/docs/safetensors/torch_shared_tensors

@nprasanthi7

RuntimeError:
Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'0.auto_model.encoder.embed_tokens.weight', '0.auto_model.shared.weight'}].
A potential way to correctly save your model is to use save_model.
More information at https://huggingface.co/docs/safetensors/torch_shared_tensors

Could you please help me resolve this?

@hongjin-su
Collaborator

Hi, thanks a lot for your interest in the INSTRUCTOR model!

Could you provide a short script for me to reproduce the error?

@tush05tgsingh

I am getting the same error and don't know how to solve it. @hongjin-su, I hope you can help me with this:

Traceback (most recent call last):
  File "/ClusterLLM/perspective/2_finetune/finetune.py", line 617, in <module>
    main()
  File "ClusterLLM/perspective/2_finetune/finetune.py", line 598, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File ".conda/envs/696ds/lib/python3.9/site-packages/transformers/trainer.py", line 1624, in train
    return inner_training_loop(
  File ".conda/envs/696ds/lib/python3.9/site-packages/transformers/trainer.py", line 2029, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval)
  File ".conda/envs/696ds/lib/python3.9/site-packages/transformers/trainer.py", line 2423, in _maybe_log_save_evaluate
    self._save_checkpoint(model, trial, metrics=metrics)
  File "conda/envs/696ds/lib/python3.9/site-packages/transformers/trainer.py", line 2499, in _save_checkpoint
    self.save_model(staging_output_dir, _internal_call=True)
  File ".conda/envs/696ds/lib/python3.9/site-packages/transformers/trainer.py", line 3016, in save_model
    self._save(output_dir)
  File ".conda/envs/696ds/lib/python3.9/site-packages/transformers/trainer.py", line 3083, in _save
    safetensors.torch.save_file(
  File ".conda/envs/696ds/lib/python3.9/site-packages/safetensors/torch.py", line 477, in _flatten
    raise RuntimeError(
RuntimeError:
Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'0.auto_model.shared.weight', '0.auto_model.encoder.embed_tokens.weight'}].
A potential way to correctly save your model is to use save_model.
More information at https://huggingface.co/docs/safetensors/torch_shared_tensors
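
For readers landing on this thread: the message says that two state-dict names (`0.auto_model.shared.weight` and `0.auto_model.encoder.embed_tokens.weight`) point at the same memory. The sketch below illustrates that idea in plain Python only. It is not the safetensors implementation; `bytearray` objects stand in for tensors, and the names `group_shared_names`, `drop_duplicates`, `storage_id`, and the key `2.dense.weight` are hypothetical, introduced just for this illustration.

```python
from collections import defaultdict


def group_shared_names(state_dict, storage_id=id):
    """Return groups of state-dict keys whose values occupy the same memory.

    `storage_id` maps a value to an identifier of its underlying buffer.
    The default, `id`, treats "same Python object" as "same memory".
    """
    groups = defaultdict(list)
    for name, value in state_dict.items():
        groups[storage_id(value)].append(name)
    # Only buffers referenced under more than one name count as shared.
    return [sorted(names) for names in groups.values() if len(names) > 1]


def drop_duplicates(state_dict, storage_id=id):
    """Keep only the first name seen for each distinct buffer."""
    seen = set()
    kept = {}
    for name, value in state_dict.items():
        key = storage_id(value)
        if key not in seen:
            seen.add(key)
            kept[name] = value
    return kept


# Stand-in for a model whose encoder embedding is tied to `shared`.
embedding = bytearray(8)
state_dict = {
    "0.auto_model.shared.weight": embedding,
    "0.auto_model.encoder.embed_tokens.weight": embedding,  # same object
    "2.dense.weight": bytearray(8),  # hypothetical, unshared entry
}

shared = group_shared_names(state_dict)
deduped = drop_duplicates(state_dict)
print(shared)
print(list(deduped))
```

With real PyTorch tensors, one could pass something like `storage_id=lambda t: t.data_ptr()` to spot fully tied weights. Note that dropping a duplicate name only works if the loading code ties the weights again. The error message itself points to `save_model` (`safetensors.torch.save_model`), which the linked safetensors page describes as the way to handle shared tensors when saving. Separately, the traceback shows the Trainer reaching `safetensors.torch.save_file`; setting `save_safetensors=False` in `TrainingArguments` should make that code path fall back to `torch.save`, though nobody in this thread has confirmed it for this script.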

4 participants