Cannot load pretrained model when using DDP #2316
Comments
BTW, this code works well without DDP. It seems that only one process can load the embedding_model from disk successfully.
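One way to test the guess that only one process can read the checkpoint is to have every rank report how it sees the file before loading. The helper below is a minimal sketch, not part of SpeechBrain: `rank_view_of_file` is a hypothetical name, and it assumes the job is launched with `torchrun`, which sets the `RANK` and `LOCAL_RANK` environment variables. The path passed in would be wherever the collected `embedding_model` checkpoint ends up on your machine.

```python
import os
from pathlib import Path


def rank_view_of_file(path):
    """Report how the current process sees `path` (hypothetical helper).

    RANK and LOCAL_RANK are read from the environment, as set by torchrun;
    both default to 0 when the script is not running under DDP.
    """
    p = Path(path)
    exists = p.exists()  # False for a missing file or a dangling symlink
    return {
        "rank": int(os.environ.get("RANK", 0)),
        "local_rank": int(os.environ.get("LOCAL_RANK", 0)),
        "path": str(p),
        "is_symlink": p.is_symlink(),
        "exists": exists,
        "size_bytes": p.stat().st_size if exists else None,
    }
```

Printing the returned dictionary on each rank just before the load call shows whether all ranks see the same file with the same size. A rank reporting `exists: False` or a different size would point at the collection step rather than at the model itself.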
Hi, this is most likely due to an error in the .py script. Can we see "utils/train_speaker_embeddings_v2_finetune.py"?
Sorry for the late reply.
# This flag enables the inbuilt cudnn auto-tuner
torch.backends.cudnn.benchmark = True
# CLI:
hparams_file, run_opts, overrides = sb.parse_arguments(sys.argv[1:])
# Initialize ddp (useful only for multi-GPU DDP training)
sb.utils.distributed.ddp_init_group(run_opts)
# Load hyperparameters file with command-line overrides
with open(hparams_file) as fin:
    hparams = load_hyperpyyaml(fin, overrides)
sb.core.create_experiment_directory(
    experiment_directory=hparams["output_folder"],
    hyperparams_to_save=hparams_file,
    overrides=overrides,
)
# Initialization of the pre-trainer
run_on_main(hparams["pretrainer"].collect_files)
hparams["pretrainer"].load_collected(device=run_opts["device"])
# run_on_main(hparams["pretrainer"].load_collected(device=run_opts["device"]))
#print(hparams["pretrainer"].loadables["embedding_model"] is hparams['embedding_model'] )
# Brain class initialization
speaker_brain = SpeakerBrain(
    modules=hparams["modules"],
    opt_class=hparams["opt_class"],
    hparams=hparams,
    run_opts=run_opts,
    checkpointer=hparams["checkpointer"],
)
#speaker_brain.optimizer.add_param_group({'params':speaker_brain.modules.embedding_model.parameters(),'lr':1e-4})
#speaker_brain.modules.embedding_model = hparams["pretrainer"].loadables["embedding_model"]
#print(speaker_brain.modules.embedding_model.state_dict())
# Training
speaker_brain.fit(
    speaker_brain.hparams.epoch_counter,
    train_data,
    valid_data,
    train_loader_kwargs=hparams["dataloader_options"],
    valid_loader_kwargs=hparams["dataloader_options"],
)
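The script above collects the pretrained files on the main process only (via `run_on_main`) and then calls `load_collected` on every process. The sketch below illustrates that ordering in isolation, under loud assumptions: it uses threads as stand-ins for DDP ranks, a `threading.Barrier` in place of the distributed barrier, and made-up file names. It does not use SpeechBrain or PyTorch and says nothing about what actually fails in this issue.

```python
import shutil
import tempfile
import threading
from pathlib import Path


def simulated_rank(rank, barrier, source, collect_dir, results):
    """One simulated rank: rank 0 collects, everyone loads after the barrier."""
    target = Path(collect_dir) / "embedding_model.ckpt"  # hypothetical name
    if rank == 0:
        # Only the main rank writes into the shared collect directory.
        shutil.copy(source, target)
    # All ranks wait until the main rank has finished collecting.
    barrier.wait()
    # After the barrier, every rank reads the same collected file.
    results[rank] = target.read_bytes()


def run_simulation(world_size=2):
    """Run `world_size` simulated ranks and return what each one loaded."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "pretrained.ckpt"
        source.write_bytes(b"fake-weights")  # placeholder for real weights
        collect_dir = Path(tmp) / "collected"
        collect_dir.mkdir()
        barrier = threading.Barrier(world_size)
        threads = [
            threading.Thread(
                target=simulated_rank,
                args=(rank, barrier, source, collect_dir, results),
            )
            for rank in range(world_size)
        ]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
    return results
```

If the barrier were missing, a non-main rank could reach the read before the copy has finished, which is one way a "works without DDP, fails with DDP" symptom can arise. Whether that is the cause here cannot be told from the script alone.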
Hello, I see that you are using an old version of SpeechBrain. Maybe switching from your old version to SpeechBrain 1.0 (-> git clone speechbrain/speechbrain.git) will solve your issue? In any case, if you require more in-depth assistance, you'll need to provide some scripts to reproduce your issue...
Describe the bug
Hi, sorry to bother you.
I am running a speaker verification experiment with SpeechBrain using DDP. The pretrained model was trained with speechbrain-0.5.13 and pytorch-1.10.0. When I try to continue fine-tuning the model on an RTX 4090 with speechbrain-0.5.15 and pytorch-2.1.2, I get the error below:
Expected behaviour
The code should load only the embedding checkpoint from the given path and then start the fine-tuning process.
To Reproduce
No response
Environment Details
python: 3.8
pytorch: 1.10.0 (which the checkpoint was saved from) / 2.1.0 (which I want to use for fine-tuning)
speechbrain: 0.5.13 (which the checkpoint was saved from) / 0.5.15 (which I want to use for fine-tuning)
GPU: RTX 3090 / RTX 4090
Relevant Log Output
Additional Context
No response