Weird inference time for grounding_dino with vit_h and vit_tiny #470

stupidyoh · 2024-03-12T08:47:30Z

Hello! Thank you for your great work.

Recently, I tested several of the provided demos, such as "grounded_light_hqsam" and "grounded_sam_simple_demo".
I got some strange timing results for the following code.

(First part)
detections = grounding_dino_model.predict_with_classes(
    image=image,
    classes=CLASSES,
    box_threshold=BOX_THRESHOLD,
    text_threshold=BOX_THRESHOLD
)

(Second part)
detections.mask = segment(
    sam_predictor=sam_predictor,
    image=cv2.cvtColor(image, cv2.COLOR_BGR2RGB),
    xyxy=detections.xyxy
)

For grounded_light_hqsam, using "vit_h" as the SAM encoder, the first part takes 1.574 seconds and the second part takes 0.611 seconds.
For grounded_sam_simple_demo, using "vit_tiny", the first part takes 2.177 seconds and the second part takes 0.136 seconds.

The shorter time for the second part makes sense to me, because vit_tiny is a lighter model.
But I have no idea why the first part takes longer with vit_tiny.

I want to use these models in real time, so I need the inference time to be shorter.
I would appreciate any advice on why I got this result and how to reduce the time.

Thank you!

stupidyoh · 2024-03-12T11:28:05Z

Sorry, I should add one thing.
The measured time is different on every single run.
The deviation between runs is larger than I expected.
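
Since the first part calls only `grounding_dino_model`, one way to check whether the gap between the two demos is just run-to-run noise is to time each part over several runs after a few warm-up runs, and compare medians rather than single measurements. Below is a minimal sketch of such a timing helper. The `benchmark` function and its parameters are hypothetical names introduced here, not part of the repository. On a GPU, passing `torch.cuda.synchronize` as `sync` is an assumption about the setup; without it, the timer may stop before asynchronous GPU work has finished.

```python
import statistics
import time
from typing import Callable, Optional


def benchmark(fn: Callable[[], object],
              warmup: int = 3,
              repeats: int = 10,
              sync: Optional[Callable[[], None]] = None) -> dict:
    """Time fn() over `repeats` runs after discarding `warmup` runs.

    `benchmark` is a hypothetical helper, not part of the repository.
    `sync` is an optional callable run before and after each call,
    e.g. torch.cuda.synchronize when the model runs on a CUDA device.
    """
    def run_once() -> float:
        if sync is not None:
            sync()  # drain pending device work before starting the clock
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for work launched by fn() to finish
        return time.perf_counter() - start

    for _ in range(warmup):
        run_once()  # warm-up timings are discarded

    samples = [run_once() for _ in range(repeats)]
    return {
        "runs": len(samples),
        "median_s": statistics.median(samples),
        "min_s": min(samples),
        "max_s": max(samples),
        "stdev_s": statistics.stdev(samples) if len(samples) > 1 else 0.0,
    }


# Usage sketch with the names from the snippets above (not run here):
# stats = benchmark(
#     lambda: grounding_dino_model.predict_with_classes(
#         image=image, classes=CLASSES,
#         box_threshold=BOX_THRESHOLD, text_threshold=BOX_THRESHOLD),
#     sync=torch.cuda.synchronize,  # assumption: model runs on CUDA
# )
# print(stats)
```

If the medians for the first part are close in both demos, the single-run difference reported above is probably measurement noise rather than an effect of the SAM encoder.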
