How do Yolo target assignments to anchors work? #12978

nachoogriis · 2024-04-30T12:16:16Z

Search before asking

I have searched the YOLOv5 issues and discussions and found no similar questions.

Question

I am trying to understand exactly how YOLO makes its predictions. I have found that YOLO assigns each target to an anchor, and that there are 3 anchors per detection head per grid cell. I understand this is why YOLO has trouble detecting small objects that are close together, since only one prediction is made per grid cell.
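To put numbers on the "3 anchors per detection head per grid cell" setup, here is a rough count of prediction slots for one image. It is a sketch that assumes a 640x640 input and YOLOv5's default strides of 8, 16 and 32 (the P3, P4 and P5 heads); the function name is made up for illustration.

```python
# Rough count of YOLOv5 prediction slots for one square input image.
# Assumptions: 640x640 input, the three default detection strides (8, 16, 32),
# and 3 anchors per grid cell per head.
IMG_SIZE = 640
STRIDES = (8, 16, 32)
ANCHORS_PER_CELL = 3

def prediction_slots(img_size=IMG_SIZE, strides=STRIDES, na=ANCHORS_PER_CELL):
    """Return ({stride: (grid_side, slots)}, total number of slots)."""
    per_head = {}
    for s in strides:
        side = img_size // s                 # grid cells along one side
        per_head[s] = (side, side * side * na)
    total = sum(slots for _, slots in per_head.values())
    return per_head, total

per_head, total = prediction_slots()
for s, (side, slots) in per_head.items():
    print(f"stride {s:2d}: {side}x{side} grid -> {slots} slots")
print("total:", total)
```

Each cell in each head offers one slot per anchor, so a cell can output up to three boxes per head rather than a single one.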

I am not sure this is exactly how it works, but I think it is something like that. In some training experiments I did not quite understand the results, so I ran a minimal experiment to test this. Given YOLO's limitations in detecting small, close objects, I used a black image with added noise as input, with a 10x10 pixel white square placed somewhere in the image. I then placed two labels, 5 pixels apart from each other, inside the white square.
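A setup like this can be sketched as YOLO-format label lines plus the grid cell each label center falls in at each stride. Everything concrete here is a hypothetical reconstruction: the 640x640 image size, the square spanning pixels 300 to 310, the 5x5 label boxes and class 0 are assumptions, since the post only gives the square size and the 5 pixel spacing.

```python
# Hypothetical reconstruction of the toy experiment's labels (positions and
# sizes are assumptions; the post only says "10x10 square, labels 5 px apart").
IMG = 640                                   # assumed square input size
BOX_W = BOX_H = 5                           # assumed label size in pixels
# Square assumed to span x, y in [300, 310]; label centers 5 px apart horizontally.
centers = [(302.5, 305.0), (307.5, 305.0)]

def yolo_label(cls, cx, cy, w, h, img=IMG):
    """One YOLO-format label line: class x_center y_center width height (normalized)."""
    return f"{cls} {cx / img:.6f} {cy / img:.6f} {w / img:.6f} {h / img:.6f}"

def grid_cell(cx, cy, stride):
    """(column, row) of the grid cell containing the center at a given stride."""
    return int(cx // stride), int(cy // stride)

labels = [yolo_label(0, cx, cy, BOX_W, BOX_H) for cx, cy in centers]
cells = {s: [grid_cell(cx, cy, s) for cx, cy in centers] for s in (8, 16, 32)}

for line in labels:
    print(line)
for s, c in cells.items():
    print(f"stride {s}: {c}")
```

With this particular placement the two centers land in different cells at strides 8 and 16 and share a cell only at stride 32, so whether two close labels "collide" depends on where the cell boundaries fall.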

If my understanding were correct, the model should reach roughly 50% precision (P) and recall (R), as it should only be able to detect one of the labels. However, the model is able to predict both labels.

How is this happening? Where is the error in my understanding?

Additional

No response

glenn-jocher · 2024-04-30T16:44:17Z

Hi there! 😊 It sounds like you're diving deep into the workings of YOLOv5 -- that's fantastic!

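Your understanding is on the right track, with one correction: each grid cell makes one prediction per anchor, so with 3 anchors per head a cell can output up to three boxes, not one. YOLOv5 assigns targets to anchors by shape: during training, a target is matched to every anchor whose width and height are both within a factor of `anchor_t` (4.0 by default) of its own, rather than only to the single anchor with the best IoU. Each grid cell is responsible for the objects whose centers fall inside it (YOLOv5 additionally assigns each target to up to two neighboring cells). Even so, closely packed small objects can still compete for the same cells and anchors, which is the issue you mentioned.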
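In YOLOv5's training code (`build_targets` in `utils/loss.py`), anchor matching is a width/height ratio test rather than an IoU comparison, and each target is also copied to up to two neighboring cells. Below is a simplified plain-Python sketch of both rules for one head at a time. It is not the library code; it assumes the default P3 anchors `(10, 13), (16, 30), (33, 23)` and `anchor_t = 4.0`, and the function names are hypothetical.

```python
# Simplified sketch of YOLOv5-style target assignment for ONE detection head.
# Not the library code; it restates two rules from build_targets (utils/loss.py):
#   1. shape rule: a target matches every anchor whose w and h ratios are within anchor_t
#   2. cell rule: besides its own cell, the target also goes to the nearer
#      horizontal and the nearer vertical neighbor cell (offset 0.5)
ANCHOR_T = 4.0                               # default 'anchor_t' hyperparameter
P3_ANCHORS = [(10, 13), (16, 30), (33, 23)]  # default P3/8 anchors, in pixels

def matching_anchors(w, h, anchors=P3_ANCHORS, anchor_t=ANCHOR_T):
    """Indices of ALL anchors passing the ratio test (not just the best one)."""
    matches = []
    for i, (aw, ah) in enumerate(anchors):
        rw, rh = w / aw, h / ah
        worst = max(rw, 1 / rw, rh, 1 / rh)  # worst-case size mismatch
        if worst < anchor_t:
            matches.append(i)
    return matches

def assigned_cells(cx, cy, stride, grid_w, grid_h, g=0.5):
    """Cells (col, row) responsible for a target centered at pixel (cx, cy)."""
    gx, gy = cx / stride, cy / stride        # center in grid units
    cells = [(int(gx), int(gy))]             # the cell containing the center
    if gx % 1 < g and gx > 1:                # left half -> left neighbor
        cells.append((int(gx - g), int(gy)))
    if gy % 1 < g and gy > 1:                # top half -> upper neighbor
        cells.append((int(gx), int(gy - g)))
    gxi, gyi = grid_w - gx, grid_h - gy      # distances from the far edges
    if gxi % 1 < g and gxi > 1:              # right half -> right neighbor
        cells.append((int(gx + g), int(gy)))
    if gyi % 1 < g and gyi > 1:              # bottom half -> lower neighbor
        cells.append((int(gx), int(gy + g)))
    return cells

print(matching_anchors(5, 5), matching_anchors(20, 20))
print(assigned_cells(302.5, 305.0, 8, 80, 80))
print(assigned_cells(307.5, 305.0, 8, 80, 80))
```

For two hypothetical 5x5 boxes centered at (302.5, 305) and (307.5, 305) on a 640x640 image, only the (10, 13) anchor passes at stride 8, and each target is assigned to three cells, two of which, (37, 38) and (38, 38), are shared by both targets.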

However, YOLOv5 implements several improvements and techniques that enhance its ability to detect small or closely spaced objects. These include:

Multiple anchor boxes per grid cell, which allow the model to predict multiple objects in close proximity if those objects match well with the different anchor shapes.
Multi-scale predictions across different layers of the network enable the model to capture features of objects at various scales, improving its performance on small objects.
Non-maximum suppression (NMS) post-processing, which helps in filtering out overlapping detections, ensuring that even if multiple anchors detect the same object, only the best prediction (highest confidence) is retained.
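NMS only discards a box when it overlaps an already-kept, higher-scoring box by more than the IoU threshold, so two nearby objects can both survive. Below is a minimal class-agnostic greedy NMS sketch of that rule. YOLOv5 itself calls `torchvision.ops.nms`; the 0.45 threshold and the box coordinates are assumptions for illustration.

```python
# Minimal greedy NMS sketch (class-agnostic, boxes as (x1, y1, x2, y2)).
# Illustrates the rule only; not YOLOv5's implementation.
def iou(a, b):
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thres=0.45):
    """Indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # keep a box only if it does not overlap any kept box too much
        if all(iou(boxes[i], boxes[k]) <= iou_thres for k in keep):
            keep.append(i)
    return keep

boxes = [(300, 302.5, 305, 307.5),   # object 1
         (305, 302.5, 310, 307.5),   # object 2, adjacent to object 1
         (301, 302.5, 306, 307.5)]   # near-duplicate of object 1
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))
```

The two adjacent boxes do not overlap, so both are kept; the near-duplicate overlaps object 1 with IoU of about 0.67 and is removed.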

Given these mechanisms, it's not too surprising that your experiment with the white square and two closely spaced labels produced detections for both labels. Multiple anchors and detection scales give the two targets separate prediction slots, and NMS does not remove a second detection as long as the two boxes overlap less than the NMS IoU threshold, which together overcome some of the limitations you highlighted.

For a deeper dive into the specifics, including anchor assignment and the handling of overlapping detections, the official Ultralytics documentation and source code comments offer a wealth of detail; anchor assignment during training is implemented in `build_targets` in `utils/loss.py`. Check out the docs at https://docs.ultralytics.com/yolov5/ for more in-depth explanations and insights.

If you have more questions or need further clarification, feel free to ask. Keep up the great exploration! 🚀

nachoogriis · 2024-05-02T11:23:24Z

Okay, thanks for your answer. So, if I have two different classes and both fit the available anchors in a cell (say there are just two suitable ones, as the third would be unnecessary), would each class go to one of the anchors?

My intuition is that the first class will be assigned to the best-fitting anchor and the second class, since that anchor is already taken, will be assigned to the second anchor, but I'm not totally sure.

Is this reasoning correct?

glenn-jocher · 2024-05-02T19:52:22Z

Hey there! 😊 Your intuition is close, with one caveat. When two objects of different classes fall in the same grid cell and suitable anchors are available, YOLOv5 works as follows:

Each target (object) is matched to every anchor whose shape fits it well enough (a width/height ratio test against `anchor_t`, not IoU), so the first object may be matched to one, two, or all three anchors.
Matching is done independently for each target: no anchor is marked as "taken", so the second object is matched by the same rule. If its shape suits a different anchor, it ends up there; if both objects fit the same anchor in the same cell, both targets are trained against that single prediction.

It hinges on how well each object's shape suits the available anchors. So each class can effectively be "assigned" to a different anchor within the same grid cell when their shapes and sizes suit different anchors, which helps YOLOv5 detect multiple nearby objects of different classes. When the objects have similar shapes, however, they share the same anchors rather than being pushed to the next one.
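The "taken anchor" intuition from the question can be contrasted with per-target matching as done in YOLOv5's `build_targets`, where each target is compared with every anchor independently of the other targets. Below is a toy sketch for a single grid cell, assuming the default P3 anchors and `anchor_t = 4.0`; both helper functions and the two target sizes are hypothetical.

```python
# Toy contrast of two assignment rules for ONE grid cell (illustration only).
#  - exclusive_assign: the hypothesis from the question (best anchor first,
#    a 'taken' anchor is skipped by later targets); hypothetical helper
#  - independent_assign: every target gets every anchor passing the ratio
#    test, with no notion of 'taken' anchors (the YOLOv5 build_targets rule)
ANCHOR_T = 4.0
ANCHORS = [(10, 13), (16, 30), (33, 23)]   # default YOLOv5 P3 anchors (pixels)

def mismatch(wh, anchor):
    """Worst-case width/height ratio between a target and an anchor."""
    rw, rh = wh[0] / anchor[0], wh[1] / anchor[1]
    return max(rw, 1 / rw, rh, 1 / rh)

def independent_assign(targets, anchors=ANCHORS, anchor_t=ANCHOR_T):
    return {t: [i for i, a in enumerate(anchors) if mismatch(wh, a) < anchor_t]
            for t, wh in targets.items()}

def exclusive_assign(targets, anchors=ANCHORS, anchor_t=ANCHOR_T):
    taken, result = set(), {}
    for t, wh in targets.items():          # processed in insertion order
        ranked = sorted(range(len(anchors)), key=lambda i: mismatch(wh, anchors[i]))
        free = [i for i in ranked
                if i not in taken and mismatch(wh, anchors[i]) < anchor_t]
        result[t] = free[:1]               # best still-free anchor, if any
        taken.update(free[:1])
    return result

targets = {"class_a": (12, 14), "class_b": (11, 15)}  # two similar-sized objects
print(independent_assign(targets))
print(exclusive_assign(targets))
```

With two similarly sized objects, the exclusive rule would split them across anchors 0 and 1, whereas per-target matching gives both objects all three anchors, so they share prediction slots instead of being pushed apart.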

Keep exploring and asking great questions! 🌟

nachoogriis added the "question" label (Further information is requested) on Apr 30, 2024.
