
Yolov8 detection layers #12709

Open
tjasmin111 opened this issue May 15, 2024 · 5 comments
Labels
question Further information is requested

Comments

@tjasmin111

Search before asking

Question

I am confused about the detection layers of YOLOv8 and how they work. The following architecture diagram of YOLOv8 shows that there are 3 Detect layers in the head. But when I visualize my yolov8n detection model, it shows a single layer with an output size of 1x6x10710.

[Image: YOLOv8 architecture diagram showing the three Detect layers in the head]

What are these 3 Detect layers? And how does this 1x6x10710 output size relate to these 3 layers?

[Image: visualization of the exported yolov8n model showing a single 1x6x10710 output]

Additional

No response

@tjasmin111 added the question (Further information is requested) label on May 15, 2024
@glenn-jocher
Member

Hello,

YOLOv8 uses multi-scale prediction: the head has 3 Detect layers operating at different scales (strides 8, 16, and 32), which makes the model more robust at detecting objects of various sizes.

The 1x6x10710 output you're seeing is the flattened, concatenated result of these three layers. Each detection layer predicts bounding boxes at its own scale, and the predictions are concatenated to form the final output. Here, 6 is the number of values per prediction: 4 bounding-box values (centre x, centre y, width, height) plus one score per class, so your model has 2 classes (YOLOv8 has no separate objectness score). 10710 is the total number of predictions (anchor points) across all three layers.

This structure allows YOLOv8 to effectively detect objects at different scales with a single forward pass.
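
For reference, the total prediction count is simply the sum of the grid cells at the three strides. Here is a minimal sketch of that arithmetic, assuming a 960x544 input resolution (one input size that yields exactly 10710 anchor points; the default 640x640 would give 8400 instead):

# Anchor points per scale = (input_h // stride) * (input_w // stride)
input_h, input_w = 544, 960  # assumed input size, not confirmed from the issue
strides = (8, 16, 32)
per_scale = [(input_h // s) * (input_w // s) for s in strides]
print(per_scale, sum(per_scale))  # [8160, 2040, 510] -> 10710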

If you need more clarity or further examples, please let us know! 😊

@tjasmin111
Author

Got it, thanks. Now my question is: given this raw 1x6x10710 prediction array, how can I decode the final bounding boxes, classes, and confidences? That's exactly what I'm looking for. The raw prediction array is read from a file; let's assume we already have it:

array = np.fromfile(opt.yolo_data_path)

@glenn-jocher
Member

To decode the 1x6x10710 prediction array into bounding boxes, classes, and confidence scores, reshape it so that each row is one prediction and then split out the components:

import numpy as np

# The raw output is laid out as (1, 6, 10710): 6 values per prediction, 10710 predictions.
# Reshape to (6, 10710) and transpose so that each row is one prediction.
array = np.fromfile(opt.yolo_data_path, dtype=np.float32).reshape(6, -1).T  # -> (10710, 6)

# Split the data
boxes = array[:, :4]                      # cx, cy, w, h in model-input pixels
class_scores = array[:, 4:]               # one score per class (2 classes here)
confidences = class_scores.max(axis=1)    # best class score for each prediction
class_ids = class_scores.argmax(axis=1)   # index of the best class

After this reshape, each row of the array is one prediction: the first four entries are the bounding-box centre and size, and the remaining entries are the per-class scores; the confidence is simply the best class score, since YOLOv8 has no separate objectness value. Adjust the dtype in np.fromfile to match the format the data was saved in.
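
As a quick sanity check, you can print the few highest-confidence raw predictions (before NMS). This is just a usage sketch of the arrays defined above:

# Show the 5 highest-confidence raw predictions
top = confidences.argsort()[::-1][:5]
for i in top:
    cx, cy, w, h = boxes[i]
    print(f"class {class_ids[i]}  conf {confidences[i]:.2f}  box cx={cx:.0f} cy={cy:.0f} w={w:.0f} h={h:.0f}")

If you have any more questions or need further assistance, feel free to ask! 😊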

@tjasmin111
Author

Got it. Then how do I apply NMS to them? We need the final results.

@glenn-jocher
Member

Hello!

To apply Non-Maximum Suppression (NMS), which filters out overlapping bounding boxes and keeps only the best ones, you can use a utility from a library like OpenCV. Here's a quick example in Python:

import numpy as np
import cv2

# Assume 'boxes' (cx, cy, w, h), 'confidences', and 'class_ids' are already defined
# cv2.dnn.NMSBoxes expects top-left (x, y, w, h) boxes, so convert from centre format first
xywh = boxes.copy()
xywh[:, 0] -= xywh[:, 2] / 2
xywh[:, 1] -= xywh[:, 3] / 2

indices = cv2.dnn.NMSBoxes(xywh.tolist(), confidences.tolist(), score_threshold=0.5, nms_threshold=0.4)
indices = np.array(indices).flatten()  # older OpenCV versions return an Nx1 array

final_boxes = xywh[indices]
final_confidences = confidences[indices]
final_class_ids = class_ids[indices]

This example uses cv2.dnn.NMSBoxes: score_threshold filters detections by confidence, and nms_threshold is the IoU overlap above which weaker overlapping boxes are suppressed. The boxes are converted from centre format to top-left (x, y, w, h) first, since that is the format NMSBoxes expects; as written this performs class-agnostic NMS across both classes.
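
As a quick usage sketch, you can print the surviving detections using the final_boxes, final_confidences, and final_class_ids defined above:

for (x, y, w, h), conf, cls in zip(final_boxes, final_confidences, final_class_ids):
    print(f"class {cls}: conf {conf:.2f}, box x={x:.0f} y={y:.0f} w={w:.0f} h={h:.0f}")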

I hope this helps! Let me know if you have further questions. 😊
