
RuntimeError: shape '[1, 3, 29, 76, 76]' is invalid for input of size 1472880 #138

Open
DongChen06 opened this issue Jul 3, 2020 · 12 comments

@DongChen06

Hello, when I started training on my own dataset, it showed an error like this:

log file path:log\log_2020-07-02_23-18-42.txt
2020-07-02 23:18:42,322 train.py[line:468] INFO: Using device cuda
convalution havn't activate linear
convalution havn't activate linear
convalution havn't activate linear
2020-07-02 23:18:44,634 train.py[line:318] INFO: Starting training:
        Epochs:          300
        Batch size:      16
        Subdivisions:    16
        Learning rate:   0.001
        Training size:   144
        Validation size: 16
        Checkpoints:     True
        Device:          cuda
        Images size:     608
        Optimizer:       adam
        Dataset classes: 24
        Train label path:data/train.txt
        Pretrained:
    
Epoch 1/300:   0%|       | 0/144 [00:12<?, ?img/s]
Traceback (most recent call last):
  File "C:/Users/Windows/Downloads/pytorch-YOLOv4-master/train.py", line 483, in <module>
    device=device, )
  File "C:/Users/Windows/Downloads/pytorch-YOLOv4-master/train.py", line 356, in train
    loss, loss_xy, loss_wh, loss_obj, loss_cls, loss_l2 = criterion(bboxes_pred, bboxes)
  File "C:\Softwares\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "C:/Users/Windows/Downloads/pytorch-YOLOv4-master/train.py", line 233, in forward
    output = output.view(batchsize, self.n_anchors, n_ch, fsize, fsize)
RuntimeError: shape '[1, 3, 29, 76, 76]' is invalid for input of size 1472880

Does anyone face the same problem?
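For what it's worth, the numbers in the error message already point at the likely cause: 1472880 / (76*76) = 255 = 3*(5+80), so the conv head is still sized for the 80 COCO classes while the loss expects 5 + 24 = 29 channels per anchor. A minimal sketch of that arithmetic:

batch, n_anchors, fsize = 1, 3, 76       # values from the error message

n_ch = 5 + 24                            # x, y, w, h, obj + 24 classes = 29
print(batch * n_anchors * n_ch * fsize * fsize)   # 502512 -- what view() expects

channels_actual = 1472880 // (batch * fsize * fsize)
print(channels_actual)                   # 255 = 3 * (5 + 80) -- head still COCO-sized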

@Tianxiaomo
Owner

You can choose how to build the model.

pytorch-YOLOv4/cfg.py

Lines 17 to 18 in af00822

Cfg.use_darknet_cfg = True
Cfg.cfgfile = 'cfg/yolov4.cfg'

If you use a cfg file, set:

Cfg.use_darknet_cfg = True
Cfg.cfgfile = 'cfg file'

and you need to change the classes parameter of each [yolo] layer in the cfg file to match your dataset (a helper sketch follows the cfg excerpts below).

[yolo]
mask = 0,1,2
anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
classes=80
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
scale_x_y = 1.2
iou_thresh=0.213
cls_normalizer=1.0
iou_normalizer=0.07
iou_loss=ciou
nms_kind=greedynms
beta_nms=0.6

[yolo]
mask = 3,4,5
anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
classes=80
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
scale_x_y = 1.1
iou_thresh=0.213
cls_normalizer=1.0
iou_normalizer=0.07
iou_loss=ciou
nms_kind=greedynms
beta_nms=0.6

[yolo]
mask = 6,7,8
anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
classes=80
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
scale_x_y = 1.05
iou_thresh=0.213
cls_normalizer=1.0
iou_normalizer=0.07
iou_loss=ciou
nms_kind=greedynms
beta_nms=0.6
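As an illustration, here is a small helper (hypothetical, not part of pytorch-YOLOv4) that rewrites classes= in every [yolo] block and filters= in the 1x1 conv head just before it:

def patch_yolo_cfg(src, dst, num_classes, anchors_per_scale=3):
    # Hypothetical helper: edit a darknet cfg for a custom class count.
    with open(src) as f:
        lines = f.readlines()
    filters = anchors_per_scale * (5 + num_classes)
    yolo_starts = [i for i, l in enumerate(lines) if l.strip() == '[yolo]']
    for start in yolo_starts:
        for j in range(start - 1, -1, -1):        # nearest filters= above [yolo]
            if lines[j].startswith('filters='):
                lines[j] = f'filters={filters}\n'
                break
        for j in range(start + 1, len(lines)):    # classes= inside this [yolo]
            if lines[j].startswith('classes='):
                lines[j] = f'classes={num_classes}\n'
                break
    with open(dst, 'w') as f:
        f.writelines(lines)

patch_yolo_cfg('cfg/yolov4.cfg', 'cfg/yolov4-custom.cfg', num_classes=24)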

@SenWang-NEU

(quoting @DongChen06's original report above)

Did you solve it? Even after following the author's reply, I still get the error "RuntimeError: shape '[1, 3, 8, 76, 76]' is invalid for input of size 1472880".

Repository owner deleted a comment from SenWang-NEU Jul 3, 2020
@DongChen06
Author

@Tianxiaomo, thanks. Do we need to edit parameters such as the anchors when we use a custom dataset?

@SenWang-NEU I solved the problem by setting Cfg.use_darknet_cfg = False and using the Cfg configuration.

I trained the model for about 10 hours on 2x 2080 Ti. The training curves look like this:
[image: training curves]

The losses decreased a lot but still remain relatively large, e.g.:
Message: 'Train step_46880: loss : 147.75306701660156,loss xy : 24.71426010131836,loss wh : 1.184415578842163,loss obj : 88.55791473388672,loss cls : 33.29647445678711,loss l2 : 20.337644577026367,lr : 0.001'

@Tianxiaomo I have one question: what is an acceptable loss range for getting good inference results?

@SenWang-NEU

(quoting @DongChen06's comment above)

Thanks, based on your reply, I solved the problem and the model has started training.

@Tianxiaomo
Owner

(quoting @DongChen06's comment above)

The validation code has been added, so you can run inference during training to see when the model works best.

@DongChen06
Author

I am not so sure why I need to set use_darknet_cfg = False to make the code run correctly:

Cfg.use_darknet_cfg = False
Cfg.cfgfile = 'cfg/yolov4-custom.cfg'

and change the classes in yolov4-custom.cfg.

@Tianxiaomo
Owner

pytorch-YOLOv4/train.py

Lines 607 to 612 in 74347ac

if cfg.use_darknet_cfg:
    model = Darknet(cfg.cfgfile)
else:
    model = Yolov4(cfg.pretrained, n_classes=cfg.classes)
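In other words, there are two ways to configure a custom run (illustrative values; Cfg.classes and Cfg.pretrained are the fields used in the snippet above):

# Path 1: build from a darknet cfg file -- edit classes= and filters= in the cfg itself
Cfg.use_darknet_cfg = True
Cfg.cfgfile = 'cfg/yolov4-custom.cfg'

# Path 2: build the native PyTorch Yolov4 -- its head is sized from Cfg.classes
Cfg.use_darknet_cfg = False
Cfg.classes = 24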

Tianxiaomo pinned this issue Jul 9, 2020
@Juuustin

(quoting @Tianxiaomo's answer above)

Hey, I had a similar error. Can you tell us how to change the cfg according to the dataset? I have no idea what to change or how. Thank you very much!

@zhaoyin214

zhaoyin214 commented Sep 23, 2020

(quoting @Tianxiaomo's answer above)

Not only classes, but also the filters in the [convolutional] block just before each [yolo] block:

[convolutional]
size=1
stride=1
pad=1
filters=255
activation=linear

[yolo]
mask = 3,4,5
anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
classes=80
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
scale_x_y = 1.1
iou_thresh=0.213
cls_normalizer=1.0
iou_normalizer=0.07
iou_loss=ciou
nms_kind=greedynms
beta_nms=0.6

It should be filters = 3 * (5 + classes); the stock value of 255 comes from 3 * (5 + 80) for the 80 COCO classes.
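A quick sanity check, assuming the usual 3 anchors per scale:

def head_filters(num_classes, anchors_per_scale=3):
    # each anchor predicts x, y, w, h, objectness + one score per class
    return anchors_per_scale * (5 + num_classes)

assert head_filters(80) == 255   # the COCO default in the stock cfg
print(head_filters(24))          # 87 -- what a 24-class dataset needs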

@zsh4614

zsh4614 commented Nov 3, 2020

(quoting @Tianxiaomo's answer and @Juuustin's question above)

The output channels of the layer just before each [yolo] block should be set to 3*(5+classes) instead of 255!

@Andrej-sens

(quoting @DongChen06's original report above)

I had the same issue.
After debugging I realised I had made a silly mistake: the input image to YOLOv4 had the wrong dimensions. It should be [B, CH, H, W]. If you still have the problem, print the size of the output for further debugging.
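For example, a minimal sketch assuming an HWC image such as one loaded with cv2:

import torch

img = torch.rand(608, 608, 3)           # HWC image, e.g. fresh from cv2.imread
x = img.permute(2, 0, 1).unsqueeze(0)   # HWC -> CHW, then add the batch dim
print(x.shape)                          # torch.Size([1, 3, 608, 608])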

@wangyuehy

I have met the same issue while dealing with 1920×1080 input. The problem is that during upsampling the x size is [batch, 128, 68, 80] while target_size is [batch, 256, 135, 160], and 68*2 != 135, so in PyTorch a possible fix is to crop to x[:, :, :135, :].
But another problem is that during PyTorch inference, in get_region_boxes, the boxes size does not match the Darknet version.
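A minimal sketch of that crop workaround (shapes taken from the comment above; the exact upsample layer in this repo may differ):

import torch
import torch.nn.functional as F

x = torch.rand(1, 128, 68, 80)
up = F.interpolate(x, scale_factor=2)   # -> [1, 128, 136, 160]; 136 != 135
up = up[:, :, :135, :160]               # crop to the skip connection's 135x160
print(up.shape)                         # torch.Size([1, 128, 135, 160])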
