Hello!
I'm studying your code on GitHub and have a question I can't fully resolve: why might predictions on the same image differ between training and inference? It would be great if you could provide more detail on the operational differences between training and inference, especially in the functions decoder_forward_dynamic and decoder_forward in the decoder, as well as the two points_queris_embed variants in BasePETCount.
I would appreciate your clarification!
Best regards,
Konstantin
During training, we generate the whole point-query quadtree, because we need to compute the loss to supervise it. During testing, we dynamically construct the point-query quadtree, i.e., we use sparse/dense point queries in sparse/dense regions. This operation aims to accelerate inference. Technically, one can use the same function as in training to do inference.
To be more specific, we use the split map (Figure 4 in the paper) to categorize sparse and dense regions, where sparse/dense point queries are responsible for object prediction in sparse/dense regions.
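As a minimal sketch of this categorization step (the flat window layout and the 0.5 threshold are illustrative assumptions, not the exact representation used in the repo):

```python
def categorize_windows(split_map, threshold=0.5):
    """Partition window indices into dense and sparse sets.

    `split_map` is a flat list with one predicted score per image window;
    windows scoring above `threshold` are treated as dense. Both the
    layout and the threshold here are assumptions for illustration.
    """
    dense = [i for i, s in enumerate(split_map) if s > threshold]
    sparse = [i for i, s in enumerate(split_map) if s <= threshold]
    return dense, sparse

# toy 4-window image: only window 0 is dense
dense, sparse = categorize_windows([0.9, 0.2, 0.1, 0.3])
```

Dense point queries would then predict objects only in the windows listed in `dense`, and sparse queries only in `sparse`.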
Regarding "one can use the same function as in training to do inference": I mean one can run both sparse and dense point queries over the whole image, and then use the split map to select the corresponding predictions in sparse and dense regions. This operation is relatively computationally expensive.
A more convenient way, which is presented in this repo, is to dynamically construct the point-query quadtree to do inference. This ensures that sparse/dense point queries only do inference in sparse/dense regions.
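The trade-off between the two inference paths can be sketched roughly as follows (the per-window query counts, the 1:4 sparse/dense ratio, and the threshold are illustrative assumptions, not the repo's actual numbers):

```python
SPARSE_Q, DENSE_Q = 1, 4  # queries per window; the 1:4 quadtree ratio is an assumption

def full_then_filter(split_map, threshold=0.5):
    """Training-style path: place BOTH sparse and dense query grids over
    every window, then use the split map to keep only each window's
    matching predictions. Returns (kept_queries, processed_queries)."""
    processed = len(split_map) * (SPARSE_Q + DENSE_Q)
    kept = sum(DENSE_Q if s > threshold else SPARSE_Q for s in split_map)
    return kept, processed

def dynamic_quadtree(split_map, threshold=0.5):
    """Inference-style path: construct only the queries each window
    needs, so every processed query contributes a kept prediction."""
    processed = sum(DENSE_Q if s > threshold else SPARSE_Q for s in split_map)
    return processed, processed

split_map = [0.9, 0.2, 0.1, 0.3]  # one dense window out of four
kept_a, proc_a = full_then_filter(split_map)
kept_b, proc_b = dynamic_quadtree(split_map)
# both paths keep the same predictions (7), but the dynamic path
# processes 7 queries instead of 20
assert kept_a == kept_b and proc_b < proc_a
```

This toy accounting is the point of the dynamic construction: the selected predictions are identical, but the dynamic path never spends compute on queries whose outputs would be discarded.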