
Cannot load pretrained model when using DDP #2316

Open
Chaotic-chaos opened this issue Dec 27, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@Chaotic-chaos

Describe the bug

Hi, sorry to bother.
I am running a speaker verification experiment with SpeechBrain using DDP. The pretrained model was trained with speechbrain 0.5.13 and pytorch 1.10.0. When I try to continue fine-tuning the model on an RTX 4090 with speechbrain 0.5.15 and pytorch 2.1.2, I get the error below:

Traceback (most recent call last):
  File "utils/train_speaker_embeddings_v2_finetune.py", line 287, in <module>
    hparams["pretrainer"].load_collected(device=run_opts["device"])
  File "utils/parameter_transfer.py", line 320, in load_collected
    self._call_load_hooks(paramfiles, device)
  File "utils/parameter_transfer.py", line 328, in _call_load_hooks
    loadpath = paramfiles[name]
KeyError: 'embedding_model'
Traceback (most recent call last):
  File "utils/train_speaker_embeddings_v2_finetune.py", line 287, in <module>
    hparams["pretrainer"].load_collected(device=run_opts["device"])
  File "utils/parameter_transfer.py", line 320, in load_collected
    self._call_load_hooks(paramfiles, device)
  File "utils/parameter_transfer.py", line 328, in _call_load_hooks
    loadpath = paramfiles[name]
KeyError: 'embedding_model'
Traceback (most recent call last):
  File "utils/train_speaker_embeddings_v2_finetune.py", line 287, in <module>
    hparams["pretrainer"].load_collected(device=run_opts["device"])
  File "utils/parameter_transfer.py", line 320, in load_collected
    self._call_load_hooks(paramfiles, device)
  File "utils/parameter_transfer.py", line 328, in _call_load_hooks
    loadpath = paramfiles[name]
KeyError: 'embedding_model'
speechbrain.core - Beginning experiment!
speechbrain.core - Experiment folder: results/finetune.w.real_label.231221A.yaml/77777
Traceback (most recent call last):
  File "utils/train_speaker_embeddings_v2_finetune.py", line 287, in <module>
    hparams["pretrainer"].load_collected(device=run_opts["device"])
  File "utils/parameter_transfer.py", line 320, in load_collected
    self._call_load_hooks(paramfiles, device)
  File "utils/parameter_transfer.py", line 328, in _call_load_hooks
    loadpath = paramfiles[name]
KeyError: 'embedding_model'
Traceback (most recent call last):
  File "utils/train_speaker_embeddings_v2_finetune.py", line 287, in <module>
    hparams["pretrainer"].load_collected(device=run_opts["device"])
  File "utils/parameter_transfer.py", line 320, in load_collected
    self._call_load_hooks(paramfiles, device)
  File "utils/parameter_transfer.py", line 328, in _call_load_hooks
    loadpath = paramfiles[name]
KeyError: 'embedding_model'

Expected behaviour

The code should load the embedding checkpoint from the given path and then start the fine-tuning process.

To Reproduce

No response

Environment Details

python: 3.8
pytorch: 1.10.0 (which the checkpoint was saved from) / 2.1.0 (which I want to use for fine-tuning)
speechbrain: 0.5.13 (which the checkpoint was saved from) / 0.5.15 (which I want to use for fine-tuning)

GPU: RTX 3090 / RTX 4090

Relevant Log Output

(Same traceback as shown in the description above.)

Additional Context

No response

@Chaotic-chaos added the bug (Something isn't working) label on Dec 27, 2023
@Chaotic-chaos
Author

BTW, this code works well without DDP. It seems that only one process can load the embedding_model from disk successfully.
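
A minimal sketch of one possible workaround, assuming the KeyError happens because run_on_main lets only rank 0 run collect_files, so the Pretrainer instances on the other ranks never record the local file paths before load_collected is called (untested against speechbrain 0.5.15):

    from speechbrain.utils.distributed import run_on_main

    # Rank 0 fetches/symlinks the pretrained files into the collect directory.
    run_on_main(hparams["pretrainer"].collect_files)
    # Assumption: re-running collect_files on every rank is cheap once the files
    # are already local, and it fills in the paths that load_collected expects.
    hparams["pretrainer"].collect_files()
    hparams["pretrainer"].load_collected(device=run_opts["device"])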

@TParcollet
Collaborator

Hi, this is most likely due to an error in the .py script. Can we see "utils/train_speaker_embeddings_v2_finetune.py"?

@Chaotic-chaos
Author

Sorry for the late reply.
Some of the code in the .py cannot be shown, but I can show you its main function.

    # This flag enables the inbuilt cudnn auto-tuner
    torch.backends.cudnn.benchmark = True

    # CLI:
    hparams_file, run_opts, overrides = sb.parse_arguments(sys.argv[1:])

    # Initialize ddp (useful only for multi-GPU DDP training)
    sb.utils.distributed.ddp_init_group(run_opts)

    # Load hyperparameters file with command-line overrides
    with open(hparams_file) as fin:
        hparams = load_hyperpyyaml(fin, overrides)


    sb.core.create_experiment_directory(
        experiment_directory=hparams["output_folder"],
        hyperparams_to_save=hparams_file,
        overrides=overrides,
    )
    # Initialization of the pre-trainer
    run_on_main(hparams["pretrainer"].collect_files)
    hparams["pretrainer"].load_collected(device=run_opts["device"])
    # run_on_main(hparams["pretrainer"].load_collected(device=run_opts["device"]))

    #print(hparams["pretrainer"].loadables["embedding_model"] is hparams['embedding_model'] )


    # Brain class initialization
    speaker_brain = SpeakerBrain(
        modules=hparams["modules"],
        opt_class=hparams["opt_class"],
        hparams=hparams,
        run_opts=run_opts,
        checkpointer=hparams["checkpointer"],
    )
    #speaker_brain.optimizer.add_param_group({'params':speaker_brain.modules.embedding_model.parameters(),'lr':1e-4})
    #speaker_brain.modules.embedding_model = hparams["pretrainer"].loadables["embedding_model"]

    #print(speaker_brain.modules.embedding_model.state_dict())


    # Training
    speaker_brain.fit(
        speaker_brain.hparams.epoch_counter,
        train_data,
        valid_data,
        train_loader_kwargs=hparams["dataloader_options"],
        valid_loader_kwargs=hparams["dataloader_options"],
    )

@Adel-Moumen
Collaborator

Hello,

I see that you are using an old version of SpeechBrain. Maybe switching from your old version to SpeechBrain 1.0 (-> git clone speechbrain/speechbrain.git) would solve your issue? In any case, if you require more in-depth assistance, you'll need to provide some scripts that reproduce your issue...
