
Issues with stride multiple of 32 #12708

Open
1 task done
tjasmin111 opened this issue May 15, 2024 · 3 comments
Labels
question Further information is requested

Comments

@tjasmin111

Search before asking

Question

I trained a YOLOv8 detection model on 1080x1920 images, and I need to adhere to this size. But when converting to ONNX, YOLOv8 changes the input to 1088x1920, saying each dimension must be a multiple of the stride (32). My device only accepts 1080x1920, not 1088.

What should we do? Any way to force it and keep it as 1080?

Additional

No response

@tjasmin111 tjasmin111 added the question Further information is requested label May 15, 2024
@glenn-jocher
Member

@tjasmin111 hello! When working with YOLOv8, it's vital that the input size be a multiple of the stride, which is typically 32 for most models. The backbone downsamples the input by a factor of 2 at several stages, so each spatial dimension must be divisible by the total downsampling factor (32) for the feature maps to stay aligned through the network.
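
As a quick check, here's a minimal sketch (the helper name is my own, not an Ultralytics function) of how a dimension gets rounded up to the next multiple of the stride:

import math

def next_multiple(x: int, stride: int = 32) -> int:
    """Round x up to the nearest multiple of stride."""
    return math.ceil(x / stride) * stride

print(next_multiple(1080))  # 1088, because 1080 / 32 = 33.75
print(next_multiple(1920))  # 1920, already divisible by 32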

Unfortunately, forcing the network to accept an input size like 1080x1920, which isn't a multiple of 32, might not be straightforward and could lead to errors or suboptimal results. Here's a possible solution:

  1. Resize your input images to 1088x1920 before passing them to the model.
  2. Post-process the model outputs, if necessary, to map back to the original dimensions.

Here's a simple example of how you could resize an image before inference if using OpenCV:

import cv2

# cv2.imread returns an array of shape (height, width, channels)
image = cv2.imread("path_to_1080x1920_image.jpg")

# cv2.resize expects the target size as (width, height)
resized_image = cv2.resize(image, (1920, 1088))
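
For step 2, here's a minimal sketch of mapping a detection box from the 1088x1920 model output back to the original 1080x1920 frame (the box values and variable names are illustrative, not part of the Ultralytics API):

# Example detection in (x1, y1, x2, y2) pixel coordinates from the resized frame
box = (100.0, 200.0, 400.0, 544.0)

# Scale factors from the model input size back to the original frame
scale_x = 1920 / 1920  # width unchanged
scale_y = 1080 / 1088  # height shrinks slightly

x1, y1, x2, y2 = box
original_box = (x1 * scale_x, y1 * scale_y, x2 * scale_x, y2 * scale_y)
print(original_box)  # y-values scaled down by roughly 0.7%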

This approach should help you keep compatibility with the network requirements without significant performance loss. Let me know if this helps or if you have any more questions!

@tjasmin111
Author

Was this also a requirement in YOLOv7, or only in YOLOv8?
HD and Full-HD are very important resolutions. It's difficult to go from 1080 to 1088 due to computation requirements.

@glenn-jocher
Member

Hello @tjasmin111! 👋 YOLOv8 and many other deep learning models require the input resolution to be divisible by the stride (32 by default) due to architectural considerations in the network design. This requirement is not specific to YOLOv8; YOLOv7 and most YOLO variants impose it as well, with the exact stride depending on the underlying model architecture and implementation details.

For handling resolutions like Full-HD (1920x1080), a practical approach is to pad the image slightly to meet the stride requirement rather than resizing, preserving the original aspect ratio and detail. Here's a simple way to pad an image using NumPy:

import cv2
import numpy as np

image = cv2.imread("path_to_image.jpg")
h, w = image.shape[:2]  # e.g. 1080, 1920 for a Full-HD frame

# Black canvas at the stride-aligned size; the original image goes in the
# top-left corner, so only the bottom 8 rows are padding
padded_image = np.zeros((1088, 1920, 3), dtype=np.uint8)
padded_image[:h, :w] = image

This keeps your image detail intact and allows the model to process it in its required input format. Let me know if this workaround fits your needs!
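
Because the pad sits at the bottom of the frame, box coordinates predicted on the padded input already line up with the original image; you only need to clip y-values to 1080. If you're exporting through the Ultralytics Python API, you can also pass the padded size explicitly so the ONNX graph is built for 1088x1920. A sketch, assuming your trained weights are in a file named best.pt:

from ultralytics import YOLO

model = YOLO("best.pt")  # placeholder path to your trained weights
# imgsz is (height, width); both values must be multiples of the stride
model.export(format="onnx", imgsz=[1088, 1920])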
