RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (CUDABFloat16Type) should be the same #379

Echoxvf · 2024-05-07T18:09:59Z

At the very start of training, I get an error saying the input's data type does not match the dtype of the convolution weights. How should this be resolved?
```
  File "/data/scripts/train.py", line 254, in main
    loss_dict = scheduler.training_losses(model, x, t, model_args, mask=mask)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/opensora/schedulers/iddpm/respace.py", line 98, in training_losses
    return super().training_losses(self._wrap_model(model), *args, **kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/opensora/schedulers/iddpm/gaussian_diffusion.py", line 768, in training_losses
    model_output = model(x_t, t, **model_kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/opensora/schedulers/iddpm/respace.py", line 127, in __call__
    return self.model(x, new_ts, **kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/colossalai/booster/plugin/low_level_zero_plugin.py", line 65, in forward
    return super().forward(*args, **kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/colossalai/interface/model.py", line 25, in forward
    return self.module(*args, **kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/opensora/models/stdit/stdit.py", line 276, in forward
    x = self.x_embedder(x)  # [B, N, C]
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/opensora/models/layers/blocks.py", line 121, in forward
    x = self.proj(x)  # (B C T H W)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 610, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 605, in _conv_forward
    return F.conv3d(
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (CUDABFloat16Type) should be the same
```
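For reference, the same failure can be reproduced outside Open-Sora with a single `Conv3d` whose weights are in a different dtype from its input. This is a minimal sketch, not Open-Sora code: the layer sizes are arbitrary, and it uses float64 weights on CPU instead of bfloat16 on CUDA so that it runs on any machine. The exact message wording differs by device and dtype, but it is the same kind of input/weight dtype mismatch.

```python
import torch
import torch.nn as nn

# Stand-in for the patch embedder's `proj` layer from the traceback.
# Assumption: float64 replaces bfloat16 here so the example runs on any CPU;
# in the reported run the weights were bfloat16 on CUDA.
proj = nn.Conv3d(in_channels=4, out_channels=8, kernel_size=2, stride=2)
proj = proj.to(torch.float64)

x = torch.randn(1, 4, 2, 4, 4)  # (B, C, T, H, W); float32 by default

try:
    proj(x)  # input dtype != weight dtype -> RuntimeError
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print(f"RuntimeError: {err}")

# Casting the input to the weight dtype lets the convolution run.
out = proj(x.to(proj.weight.dtype))
print(out.shape, out.dtype)
```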

zhengzangw · 2024-05-09T09:05:04Z

Could you share the exact command you ran? The following works fine for me:

torchrun --standalone --nproc_per_node 1 scripts/train.py configs/opensora-v1-1/train/stage1.py --data-path MY_DATA_PATH

My guess is that either `dtype` is not set correctly in your config, or you have not pulled the latest version of the repo.
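One quick way to tell these cases apart is to compare the dtype of the batch with the dtypes of the model's parameters right before the forward call. The helpers below (`parameter_dtypes`, `find_dtype_mismatch`) are hypothetical debugging utilities, not part of Open-Sora or ColossalAI, and the toy `nn.Sequential` only stands in for the real model.

```python
from collections import Counter

import torch
import torch.nn as nn


def parameter_dtypes(module: nn.Module) -> Counter:
    """Hypothetical helper: count the module's parameters by dtype."""
    return Counter(p.dtype for p in module.parameters())


def find_dtype_mismatch(module: nn.Module, x: torch.Tensor) -> list:
    """Hypothetical helper: names of parameters whose dtype differs from x.dtype."""
    return [name for name, p in module.named_parameters() if p.dtype != x.dtype]


# Toy model: weights cast to bfloat16 (as when training in bf16),
# while the input batch is still float32.
model = nn.Sequential(nn.Conv3d(4, 8, kernel_size=2, stride=2)).to(torch.bfloat16)
x = torch.randn(1, 4, 2, 4, 4)

print(parameter_dtypes(model))
print(find_dtype_mismatch(model, x))
```

If the list is non-empty, the input was never cast to the model's dtype before it reached the first layer.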

Echoxvf · 2024-05-09T16:47:55Z

Thank you for your response. I was using the Open-Sora 1.0 training command:
torchrun --nnodes=1 --nproc_per_node=1 scripts/train.py configs/opensora/train/16x256x256.py --data-path YOUR_CSV_PATH
The Open-Sora 1.1 training command works fine.
Thanks again.

TXacs · 2024-05-11T03:13:58Z

I have the same problem: even after checking out v1.1.0 and running the script, I get the same error.

My command:
torchrun --nproc-per-node=4 scripts/train.py configs/pixart/train/1x512x512.py --data-path CSV_FILE

I have not changed the config.

zhengzangw · 2024-05-11T07:25:23Z

This happens because we updated the model configs to follow the Hugging Face format but did not update the older ones.

TXacs · 2024-05-11T07:57:37Z

Thanks a lot! Adding the following code to pixart.py (not just stdit.py) solved the problem:

```python
def forward(self, x, timestep, y, mask=None):
    """
    Forward pass of PixArt.
    x: (N, C, H, W) tensor of spatial inputs (images or latent representations of images)
    timestep: (N,) tensor of diffusion timesteps
    y: (N, 1, 120, C) tensor of class labels
    """
    # Cast every input to the dtype of the patch-embedding weights (e.g. bf16)
    dtype = self.x_embedder.proj.weight.dtype
    x = x.to(dtype)
    timestep = timestep.to(dtype)
    y = y.to(dtype)
    # ... rest of the original forward pass unchanged
```
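The same pattern can be checked in isolation with a toy module. `ToyPatchModel` below is a hypothetical stand-in for STDiT/PixArt (one `Conv3d` patch embedder plus a linear layer), not the real Open-Sora class; it only illustrates reading the target dtype from the embedder weights and casting the input to it at the top of `forward`. The usage lines use float64 on CPU so the sketch runs anywhere; in the reported setup the weights are bf16 on CUDA.

```python
import torch
import torch.nn as nn


class ToyPatchModel(nn.Module):
    """Hypothetical stand-in: Conv3d patch embedder followed by a linear head."""

    def __init__(self, in_channels: int = 4, hidden: int = 8, patch: int = 2):
        super().__init__()
        self.proj = nn.Conv3d(in_channels, hidden, kernel_size=patch, stride=patch)
        self.head = nn.Linear(hidden, hidden)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Cast the input to whatever dtype the embedder weights ended up in,
        # before the first layer sees it.
        dtype = self.proj.weight.dtype
        x = x.to(dtype)
        x = self.proj(x)                  # (B, hidden, T', H', W')
        x = x.flatten(2).transpose(1, 2)  # (B, N, hidden) with N = T' * H' * W'
        return self.head(x)


# Usage: weights in float64, input left in float32; forward still succeeds.
model = ToyPatchModel().to(torch.float64)
out = model(torch.randn(2, 4, 2, 4, 4))
print(out.shape, out.dtype)
```

Reading the dtype from the weights, rather than hard-coding `torch.bfloat16`, keeps the same `forward` working whether the model is built in fp32, fp16, or bf16.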


zhengzangw added the question label (and added, then removed, the help wanted label) on May 9, 2024

zhengzangw mentioned this issue in "update stdit dtype" #392 (merged) on May 11, 2024

zhengzangw added the bug label on May 11, 2024

Echoxvf closed this as completed on May 16, 2024
