Inference with the model after converting to a TFLite file #12988

Open
sangyo1 opened this issue May 7, 2024 · 18 comments
Labels
question Further information is requested

Comments

@sangyo1

sangyo1 commented May 7, 2024


Question

After converting the YOLOv5 model to a TFLite model using export.py, I am attempting to use it for object detection. However, I need to understand how to draw the bounding boxes and what the input and output formats are for this TFLite model. I'm currently facing issues with incorrect bounding box placement or errors in my object detection code. Here's the code I'm using to load the image and perform object detection, but the outcomes are not correct:

FYI, this is how I converted the model
python3 export.py --weights /home/ubuntu/ssl/yolov5/runs/train/exp9/weights/best.pt --include tflite

And these are my input and output tensor details:

Input Details: [{'name': 'serving_default_input_1:0', 'index': 0, 'shape': array([  1, 640, 640,   3], dtype=int32), 'shape_signature': array([  1, 640, 640,   3], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]

Output Details: [{'name': 'StatefulPartitionedCall:0', 'index': 532, 'shape': array([    1, 25200,    10], dtype=int32), 'shape_signature': array([    1, 25200,    10], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]

Additional

def preprocess_image(image_path, input_size, input_mean=127.5, input_std=127.5):
    image = cv2.imread(image_path)
    image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image_resized = cv2.resize(image_rgb, (input_size, input_size))
    input_data = (np.float32(image_resized) - input_mean) / input_std
    return image, np.expand_dims(input_data, axis=0)

def detect_objects(interpreter, image_path, threshold=0.25):
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
    _, input_size = input_details[0]['shape'][1], input_details[0]['shape'][2]
    
    image, input_data = preprocess_image(image_path, input_size)
    
    interpreter.set_tensor(input_details[0]['index'], input_data)
    interpreter.invoke()

    boxes = interpreter.get_tensor(output_details[1]['index'])[0]  # Bounding box coordinates of detected objects
    classes = interpreter.get_tensor(output_details[3]['index'])[0]  # Class index of detected objects
    scores = interpreter.get_tensor(output_details[0]['index'])[0]  # Confidence scores

    detections = []
    # Ensure scores are treated as an array for safe iteration
    scores = np.squeeze(scores)
    classes = np.squeeze(classes)
    boxes = np.squeeze(boxes)
    
    for i in range(len(scores)):
        if scores[i] > threshold:
            ymin, xmin, ymax, xmax = boxes[i]
            # Ensure coordinates are scaled back to original image size
            imH, imW, _ = image.shape
            xmin = int(max(1, xmin * imW))
            xmax = int(min(imW, xmax * imW))
            ymin = int(max(1, ymin * imH))
            ymax = int(min(imH, ymax * imH))
            class_id = int(classes[i])
            category_id_mapping = load_category_mapping(labels_path)

            # Using category_id_mapping to find the category ID
            category_id = category_id_mapping.get(class_id, class_id) + 1  # Fallback to class_id + 1 if not found

            detections.append({
                "class_id": class_id,
                "category_id": category_id,
                'bbox': [xmin, ymin, xmax, ymax],
                'segmentation': [xmin, ymin, (xmax - xmin), (ymax - ymin)],
                "area": (xmax - xmin) * (ymax - ymin),
                "score": float(scores[i])
            })

    return detections

Here is the error I get

Traceback (most recent call last):
  File "/home/sangyoon/dCentralizedSystems/machine-learning/tensorflow/python/testing/thermal/thermal_test.py", line 562, in <module>
    main()
  File "/home/sangyoon/dCentralizedSystems/machine-learning/tensorflow/python/testing/thermal/thermal_test.py", line 412, in main
    detections = detect_objects(interpreter, image_path, threshold=0.25)
  File "/home/sangyoon/dCentralizedSystems/machine-learning/tensorflow/python/testing/thermal/thermal_test.py", line 109, in detect_objects
    boxes = interpreter.get_tensor(output_details[1]['index'])[0]  # Bounding box coordinates of detected objects
IndexError: list index out of range

@sangyo1 sangyo1 added the question Further information is requested label May 7, 2024
Contributor

github-actions bot commented May 7, 2024

👋 Hello @sangyo1, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Requirements

Python>=3.8.0 with all requirements.txt installed including PyTorch>=1.8. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

YOLOv5 CI

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

Introducing YOLOv8 🚀

We're excited to announce the launch of our latest state-of-the-art (SOTA) object detection model for 2023 - YOLOv8 🚀!

Designed to be fast, accurate, and easy to use, YOLOv8 is an ideal choice for a wide range of object detection, image segmentation and image classification tasks. With YOLOv8, you'll be able to quickly and accurately detect objects in real-time, streamline your workflows, and achieve new levels of accuracy in your projects.

Check out our YOLOv8 Docs for details and get started with:

pip install ultralytics

@glenn-jocher
Member

@sangyo1 hey there!

It looks like you're having trouble with indexing the outputs of your TFLite model. The error IndexError: list index out of range suggests that the output_details array might not have the structure you were expecting.

The key here is to ensure that you're accessing the correct indices for your boxes, classes, and scores. From your description, it seems these might be jumbled or misplaced.

Here's a practical step to debug this issue:

  1. Check the Outputs: Before trying to index into output_details, print out the entire output_details to verify the indices and contents. Sometimes, the model conversion can rearrange these arrays.

    # Print output details to verify output tensor indices
    print(output_details)

After verifying the correct indices for each output tensor, update the indices in your detect_objects function accordingly.

This should help resolve the IndexError by ensuring you are referring to the existing indices in your TFLite outputs. Also, double-check the logic for looping through scores and boxes to make sure it aligns with the structure of your specific output details. For further guidance, the YOLOv5 documentation (find it in our docs section) might provide some additional insights into typical output configurations for various model exports. 🚀
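
For reference, a YOLOv5 TFLite export typically produces a single output tensor shaped [1, 25200, 5 + num_classes] (box xywh, objectness, then one score per class). A minimal sketch of inspecting and splitting it, assuming that layout and that interpreter is your tf.lite.Interpreter, could look like:

import numpy as np

output_details = interpreter.get_output_details()
print(len(output_details), [d['shape'] for d in output_details])  # often a single [1, 25200, 5 + nc] tensor

pred = interpreter.get_tensor(output_details[0]['index'])[0]   # (25200, 5 + nc)
boxes_xywh = pred[:, :4]      # normalized x_center, y_center, width, height
objectness = pred[:, 4]       # objectness score per candidate box
class_scores = pred[:, 5:]    # one column per class
class_ids = np.argmax(class_scores, axis=1)
confidences = objectness * class_scores[np.arange(len(class_ids)), class_ids]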

@sangyo1
Author

sangyo1 commented May 7, 2024

Thank you @glenn-jocher.
I already posted the output_details and the input above, but here they are again.

[{'name': 'StatefulPartitionedCall:0', 
'index': 532, 'shape': array([    1, 25200,    10], dtype=int32), 
'shape_signature': array([    1, 25200,    10], dtype=int32), 
'dtype': <class 'numpy.float32'>, 
'quantization': (0.0, 0), 
'quantization_parameters': {'scales': array([], dtype=float32), 
'zero_points': array([], dtype=int32), 
'quantized_dimension': 0}, 
'sparsity_parameters': {}}]

Also here is Input:

[{'name': 'serving_default_input_1:0', 
'index': 0, 'shape': array([  1, 640, 640,   3], dtype=int32), 
'shape_signature': array([  1, 640, 640,   3], dtype=int32), 
'dtype': <class 'numpy.float32'>, 
'quantization': (0.0, 0), 
'quantization_parameters': {'scales': array([], dtype=float32), 
'zero_points': array([], dtype=int32), 
'quantized_dimension': 0}, 
'sparsity_parameters': {}}]

I also went over #11395 (comment).
However, my output is a little different, so even using the same code didn't fix the problem.
That issue's output shape is [1, 25200, 6], but mine is [1, 25200, 10].

@sangyo1
Author

sangyo1 commented May 7, 2024

I have another question regarding the conversion to TFLite. I suspect the labels (classes) might have gotten mixed up. Originally, I trained the model with 6 classes, but when I run the command:

python3 detect.py --weights /home/sangyoon/Downloads/best_fp16.tflite --source /home/sangyoon/Desktop/image/hotspot_rgb/test

it incorrectly tags objects as motorcycles, airplanes, etc., which are not related to my training labels.

@glenn-jocher
Member

Hey @sangyo1!

It sounds like there might be a mix-up in the class label mapping when running inference with the converted TFLite model. Here’s a quick check and a few tips:

  • Check Label Mapping: Ensure that the class labels used during training are correctly mapped in your detection script. If a labeling mismatch occurs, it may incorrectly assign predictions.

  • Model Output Review: Verify that the trained model's output actually corresponds to your classes. Conversion can sometimes shift things, especially class indices.

  • Utility Check: Pass --img 640 --conf-thres 0.25 when running detect.py to ensure the model uses the same image size and confidence threshold as during training.

Fixing these should align the predictions more accurately with your original training classes.
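
One more thing worth checking: the TFLite file itself may not carry your custom class names, and when no dataset YAML is given, detect.py falls back to its default names (which include COCO classes such as motorcycle and airplane). Assuming the standard --data flag and your own dataset YAML (the YAML path below is a placeholder), something like this usually restores the correct labels:

python3 detect.py --weights /home/sangyoon/Downloads/best_fp16.tflite --data path/to/your_data.yaml --img 640 --source /home/sangyoon/Desktop/image/hotspot_rgb/test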

@sangyo1
Author

sangyo1 commented May 8, 2024

Thank you for your response, @glenn-jocher. When I run:

python3 detect.py --weights /home/sangyoon/Downloads/best.pt --source /home/sangyoon/Desktop/image/hotspot_rgb/test

everything works perfectly. However, I encounter issues with the best_fp16.tflite file where it mislabels objects. I'm curious if the command:

python3 export.py --weights /home/ubuntu/ssl/yolov5/runs/train/exp9/weights/best.pt --include tflite

requires a specific label map to convert to a TFLite model.

Additionally, regarding output_details mentioned earlier, how can I identify which outputs correspond to bounding boxes, labels, and confidence scores? How should I go about creating my own inference code to detect objects in images? Is it feasible to modify detect.py to craft my own script?

Ultimately, I'd like to automatically save images with specific detected objects to a different directory. How can I implement this?

Any tips to update this function?

def preprocess_image(image_path, input_size, input_mean=127.5, input_std=127.5):
    image = cv2.imread(image_path)
    image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image_resized = cv2.resize(image_rgb, (input_size, input_size))
    input_data = (np.float32(image_resized) - input_mean) / input_std
    return image, np.expand_dims(input_data, axis=0)

def detect_objects(interpreter, image_path, threshold=0.25):
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
    _, input_size = input_details[0]['shape'][1], input_details[0]['shape'][2]
    
    image, input_data = preprocess_image(image_path, input_size)
    
    interpreter.set_tensor(input_details[0]['index'], input_data)
    interpreter.invoke()

    boxes = interpreter.get_tensor(output_details[1]['index'])[0]  # Bounding box coordinates of detected objects
    classes = interpreter.get_tensor(output_details[3]['index'])[0]  # Class index of detected objects
    scores = interpreter.get_tensor(output_details[0]['index'])[0]  # Confidence scores

    detections = []
    # Ensure scores are treated as an array for safe iteration
    scores = np.squeeze(scores)
    classes = np.squeeze(classes)
    boxes = np.squeeze(boxes)
    
    for i in range(len(scores)):
        if scores[i] > threshold:
            ymin, xmin, ymax, xmax = boxes[i]
            # Ensure coordinates are scaled back to original image size
            imH, imW, _ = image.shape
            xmin = int(max(1, xmin * imW))
            xmax = int(min(imW, xmax * imW))
            ymin = int(max(1, ymin * imH))
            ymax = int(min(imH, ymax * imH))
            class_id = int(classes[i])
            category_id_mapping = load_category_mapping(labels_path)

            # Using category_id_mapping to find the category ID
            category_id = category_id_mapping.get(class_id, class_id) + 1  # Fallback to class_id + 1 if not found

            detections.append({
                "class_id": class_id,
                "category_id": category_id,
                'bbox': [xmin, ymin, xmax, ymax],
                'segmentation': [xmin, ymin, (xmax - xmin), (ymax - ymin)],
                "area": (xmax - xmin) * (ymax - ymin),
                "score": float(scores[i])
            })

    return detections

How do I get the right boxes, classes, and scores from the YOLOv5 TFLite model?

@sangyo1
Author

sangyo1 commented May 8, 2024

Following up on the question above, here is my comparison of detect.py vs. my own inference code.

image

image

As you can see, my own inference code is way off. I referenced the code from #1981 (comment).

And here is my code; I am not sure where I made a mistake:

def preprocess_image(image_path, input_size, input_mean=127.5, input_std=127.5):
    image = cv2.imread(image_path)
    image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image_resized = cv2.resize(image_rgb, (input_size, input_size))
    input_data = (np.float32(image_resized) - input_mean) / input_std
    return image, np.expand_dims(input_data, axis=0)

def detect_objects(interpreter, image_path, threshold=0.25):
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
    _, input_size = input_details[0]['shape'][1], input_details[0]['shape'][2]
    
    image, input_data = preprocess_image(image_path, input_size)
    
    interpreter.set_tensor(input_details[0]['index'], input_data)
    interpreter.invoke()

    output_data = interpreter.get_tensor(output_details[0]['index'])[0] # Assuming batch size is 1
    xywh = output_data[..., :4]
    conf = output_data[..., 4:5]
    cls = tf.reshape(tf.cast(tf.argmax(output_data[..., 5:], axis=1), tf.float32), (-1,1))
    output = np.squeeze(tf.concat([conf, cls, xywh], 1))
    
    scores = output[..., 0]
    classes = output[..., 1]
    boxes = output[..., 2:]
    x, y, w, h = boxes[..., 0], boxes[..., 1], boxes[..., 2], boxes[..., 3]
    xyxy = [x - w / 2, y - h / 2, x + w / 2, y + h / 2]  # xywh to xyxy   [25200, 4]

    for i in range(len(scores)):
        if ((scores[i] > 0.1) and (scores[i] <= 1.0)):
            H = image.shape[0]
            W = image.shape[1]
            xmin = int(max(1,(xyxy[0][i] * W)))
            ymin = int(max(1,(xyxy[1][i] * H)))
            xmax = int(min(H,(xyxy[2][i] * W)))
            ymax = int(min(W,(xyxy[3][i] * H)))

            cv2.rectangle(image, (xmin,ymin), (xmax,ymax), (10, 255, 0), 2)
            #cv2.putText(image, classes, (int(xmin), int(ymin - 10)), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 2)
    return image

def main():
    model_path = '../Downloads/best_fp16.tflite'
    image_path = '../test/1713978614000000.colorleftthumb.jpeg'

    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()

    image_with_boxes = detect_objects(interpreter, image_path)
    output_image_path = '/home/sangyoon/Desktop/image/output_with_boxes.jpg'
    cv2.imwrite(output_image_path, image_with_boxes)
    print(f"Image with bounding boxes saved to {output_image_path}")

if __name__ == '__main__':
    main()

@sangyo1
Author

sangyo1 commented May 8, 2024

Sorry to keep adding more questions, but here is the change I made to my preprocess_image function.
As you can see, it creates even worse boxes. I am wondering whether the preprocess_image function or the detect_objects function is causing the problem.

Here is the changed preprocess_image function:

def preprocess_image(image_path, input_size, input_mean=127.5, input_std=127.5):
    image = cv2.imread(image_path)
    image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image_resized = cv2.resize(image_rgb, (input_size, input_size))
    input_data = (np.float32(image_resized) - input_mean) / input_std
    image_data = Image.open(image_path)
    image_data = image_data.resize((640, 640))
    image_data = np.array(image_data).astype(np.float32)
    image_data = np.expand_dims(image_data, axis=0)
    
    return image, image_data

And here is the image corresponding to that code:
image

@glenn-jocher
Member

Hey @sangyo1,

No worries about the questions, happy to help! Looking at your preprocess_image function, it appears you are processing the image data twice with two different methods. This could certainly cause the inconsistency in the bounding box placements.

  1. You first resize and normalize the image (using the mean and std) after converting it to RGB, but then a second resize is applied via PIL to build image_data without any normalization, and that unnormalized tensor is what actually gets returned.
  2. It's essential that the resizing and normalization done during preprocessing match exactly the format expected by your model.

Here’s a streamlined version of your preprocess_image function:

def preprocess_image(image_path, input_size, input_mean=127.5, input_std=127.5):
    image = Image.open(image_path).convert('RGB')
    image = image.resize((input_size, input_size))
    image_data = np.array(image).astype(np.float32)
    image_data = (image_data - input_mean) / input_std
    image_data = np.expand_dims(image_data, axis=0)
    
    return image, image_data

This ensures consistency in image preparation for your model. Try running your detection with this and check if it resolves the issue with the bounding boxes! 😊

@sangyo1
Author

sangyo1 commented May 8, 2024

@glenn-jocher,
I updated it, and I still get the same boxes as in #12988 (comment).
As you can see, my inference.py bounding boxes are way off compared to the detect.py result.
Is it because my image is 1920x1080? The code resizes the image, so I don't think that is the problem, but I am not sure why it gives a different result than detect.py. Here is a snippet of my code:

def detect_objects(interpreter, image_path, threshold=0.25):
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
    _, input_size = input_details[0]['shape'][1], input_details[0]['shape'][2]
    
    image, input_data = preprocess_image(image_path, input_size)
    
    interpreter.set_tensor(input_details[0]['index'], input_data)
    interpreter.invoke()

    output_data = interpreter.get_tensor(output_details[0]['index'])[0] # Assuming batch size is 1
    xywh = output_data[..., :4]
    conf = output_data[..., 4:5]
    cls = tf.reshape(tf.cast(tf.argmax(output_data[..., 5:], axis=1), tf.float32), (-1,1))
    output = np.squeeze(tf.concat([conf, cls, xywh], 1))
    
    scores = output[..., 0]
    classes = output[..., 1]
    boxes = output[..., 2:]
    x, y, w, h = boxes[..., 0], boxes[..., 1], boxes[..., 2], boxes[..., 3]
    xyxy = [x - w / 2, y - h / 2, x + w / 2, y + h / 2]  # xywh to xyxy   [25200, 4]

    for i in range(len(scores)):
        if ((scores[i] > 0.1) and (scores[i] <= 1.0)):
            H = image.shape[0]
            W = image.shape[1]
            xmin = int(max(1,(xyxy[0][i] * W)))
            ymin = int(max(1,(xyxy[1][i] * H)))
            xmax = int(min(H,(xyxy[2][i] * W)))
            ymax = int(min(W,(xyxy[3][i] * H)))

            cv2.rectangle(image, (xmin,ymin), (xmax,ymax), (10, 255, 0), 2)
            #cv2.putText(image, classes, (int(xmin), int(ymin - 10)), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 2)
    return image

Is this the correct way to create the boxes?
Also, here is my output_details:

[{'name': 'StatefulPartitionedCall:0', 
'index': 532, 'shape': array([    1, 25200,    10], dtype=int32), 
'shape_signature': array([    1, 25200,    10], dtype=int32), 
'dtype': <class 'numpy.float32'>, 
'quantization': (0.0, 0), 
'quantization_parameters': {'scales': array([], dtype=float32), 
'zero_points': array([], dtype=int32), 
'quantized_dimension': 0}, 
'sparsity_parameters': {}}]

@glenn-jocher
Member

Hey @sangyo1!

It looks like your approach to drawing the bounding boxes and preprocessing might be generally correct. However, issues could arise from how the box coordinates are being recalculated and represented.

Since the YOLO model outputs coordinates in [x_center, y_center, width, height] relative to the image's dimensions, converting these to corner coordinates (xmin, ymin, xmax, ymax) should follow this mapping:

xmin = int(max(1, (x - w/2) * W))
xmax = int(min(W, (x + w/2) * W))
ymin = int(max(1, (y - h/2) * H))
ymax = int(min(H, (y + h/2) * H))

Ensure the dimensions you scale the boxes by match the image you are drawing them on, especially after resizing. A good check is to confirm that the aspect ratio is maintained (or to account for the stretch) and see whether the bounding boxes improve.

For better troubleshooting, recheck your preprocessing and ensure that image aspect ratios are handled properly during resize operations. Correct preprocessing is often critical in ensuring model outputs align well on the input image. 😊
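
If you want to mirror detect.py's preprocessing more closely, a letterbox-style resize (scale while keeping the aspect ratio, then pad to a square) is one option. A rough sketch, assuming a 640x640 model input and grey padding; the returned scale and pad offsets are what you would use to map boxes back to the original image:

import cv2

def letterbox(image, new_size=640, color=(114, 114, 114)):
    # Resize while keeping the aspect ratio, then pad to a new_size x new_size canvas
    h, w = image.shape[:2]
    scale = min(new_size / h, new_size / w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    resized = cv2.resize(image, (nw, nh), interpolation=cv2.INTER_LINEAR)
    top = (new_size - nh) // 2
    left = (new_size - nw) // 2
    padded = cv2.copyMakeBorder(resized, top, new_size - nh - top, left, new_size - nw - left,
                                cv2.BORDER_CONSTANT, value=color)
    return padded, scale, (left, top)

Boxes predicted on the padded 640x640 image can then be mapped back with x = (x_pad - left) / scale and y = (y_pad - top) / scale before drawing on the original frame.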

@sangyo1
Author

sangyo1 commented May 9, 2024

Hello @glenn-jocher

def preprocess_image(image_path, input_size, input_mean=127.5, input_std=127.5):
    image = cv2.imread(image_path)
    image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image_resized = cv2.resize(image_rgb, (input_size, input_size))
    image_data = np.array(image_resized).astype(np.float32)
    image_data = (image_data - input_mean) / input_std
    image_data = np.expand_dims(image_data, axis=0)
    
    return image, image_data

def detect_objects(interpreter, image_path, threshold=0.25):
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
    _, input_size = input_details[0]['shape'][1], input_details[0]['shape'][2]
    
    image, input_data = preprocess_image(image_path, input_size)
    
    interpreter.set_tensor(input_details[0]['index'], input_data)
    interpreter.invoke()

    output_data = interpreter.get_tensor(output_details[0]['index'])[0] # Assuming batch size is 1
    xywh = output_data[..., :4]
    conf = output_data[..., 4:5]
    cls = tf.reshape(tf.cast(tf.argmax(output_data[..., 5:], axis=1), tf.float32), (-1,1))
    output = np.squeeze(tf.concat([conf, cls, xywh], 1))
    
    scores = output[..., 0]
    classes = output[..., 1]
    boxes = output[..., 2:]
    # boxes = np.squeeze(output_data[..., :4])    # boxes  [25200, 4]
    # scores = np.squeeze( output_data[..., 4:5])
    x, y, w, h = boxes[..., 0], boxes[..., 1], boxes[..., 2], boxes[..., 3]
    xyxy = [x - w / 2, y - h / 2, x + w / 2, y + h / 2]  # xywh to xyxy   [25200, 4]
    result_image = np.array(image)
    H, W = result_image.shape[:2]
    for i in range(len(scores)):
        if ((scores[i] > 0.25) and (scores[i] <= 1.0)):
            H = image.shape[0]
            W = image.shape[1]
            xmin = int(np.maximum(1, (x[i] - w[i]/2) * W))
            xmax = int(np.minimum(W, (x[i] + w[i]/2) * W))
            ymin = int(np.maximum(1, (y[i] - h[i]/2) * H))
            ymax = int(np.minimum(H, (y[i] + h[i]/2) * H))

            cv2.rectangle(image, (xmin, ymin), (xmax, ymax), (10, 255, 0), 2)
    return image

def main():
    model_path = '../best_fp16.tflite'
    image_path = '../navigation-topological-path-planners-default/nee-cavendish-cbx-rover/W85/W82:W83/1713979758000000.colorleftthumb.jpeg'

    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()

    image_with_boxes = detect_objects(interpreter, image_path)
    output_image_path = '/home/sangyoon/Desktop/image/output_with_boxes.jpg'
    cv2.imwrite(output_image_path, image_with_boxes)
    print(f"Image with bounding boxes saved to {output_image_path}")

if __name__ == '__main__':
    main()

I updated my detect_objects function based on your recommendation.
I could not use:

xmin = int(max(1, (x - w/2) * W))
xmax = int(min(W, (x + w/2) * W))
ymin = int(max(1, (y - h/2) * H))
ymax = int(min(H, (y + h/2) * H))

since it raised an error, but I believe the code above does the same thing. However, I still get the same image with the same bounding boxes, and I am not sure what I missed, so I have posted the full code above.

Or, as I mentioned before, is it possible to use detect.py from my own inference Python code?

For example: run detect.py, and if it detects a specific object, save the image to a separate directory.
Also, for the TFLite model, how do I apply agnostic-nms?

@glenn-jocher
Member

Hello @sangyo1!

Thanks for sharing your updated code. It seems like you've adjusted the bounding box calculations correctly. If the bounding boxes still appear off, you might want to validate:

  1. Model Outputs: Ensure the outputs (box coordinates) from the model align with your expectations post-inference. It helps to log or inspect them directly.
  2. Image Dimensions: Double-check if your images' scaling during preprocessing aligns correctly with how the model was trained. This affects how the model interprets and outputs coordinates.

About integrating detect.py: You can invoke it directly from another Python script using subprocess (if you prefer running it as a command) or by importing and calling the required functions directly if you adapt the detect.py script into a callable function within your project.
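
For the subprocess route, a minimal sketch (the paths are placeholders for your own weights and image directory) could be:

import subprocess

subprocess.run([
    "python3", "detect.py",
    "--weights", "best_fp16.tflite",   # path to your exported model
    "--source", "path/to/images",      # directory of images to run on
], check=True)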

For applying agnostic-nms with tflite models, you would typically implement it as you would with a regular model output:

  • Extract confidence scores, classes, and boxes.
  • Apply the NMS algorithm, which can be custom implemented or sourced from libraries that support TFLite output structure.

We don't have out-of-the-box support for agnostic-nms on TFLite in the YOLOv5 repository; you'll need to adapt it from existing NMS implementations suited to TFLite outputs.
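
As a rough, class-agnostic sketch using OpenCV's built-in NMS (boxes assumed to be pixel [x, y, w, h] NumPy arrays and scores the objectness times the best class score):

import cv2
import numpy as np

def agnostic_nms(boxes_xywh, scores, score_thres=0.25, iou_thres=0.45):
    # All classes compete in a single NMS pass, which is what agnostic NMS means conceptually
    keep = cv2.dnn.NMSBoxes(boxes_xywh.tolist(), scores.tolist(), score_thres, iou_thres)
    return np.array(keep).flatten()  # indices of the boxes to keep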

I hope this helps! 😊

@sangyo1
Author

sangyo1 commented May 9, 2024

@glenn-jocher
Thank you so much. I changed my code to load the model via torch.hub and run detection inside my own script, along the lines of detect.py:

import torch
import numpy as np
import cv2
import os

def load_model(model_path):
    # Load the YOLOv5 model from a local directory
    model = torch.hub.load('/home/sangyoon/workspace/yolov5', 'custom', path=model_path, source='local')
    model.eval()
    return model

def detect_objects(model, image):
    # Perform inference with model
    results = model(image, size=640)
    return results

def find_and_process_images(source_dir, target_dir, base_name, model, target_ids):
    # Search through the directory and process images
    for root, dirs, files in os.walk(source_dir):
        for file in files:
            if base_name in file and file.endswith('.colorleftthumb.jpeg'):
                try:
                    image_path = os.path.join(root, file)
                    image = cv2.imread(image_path)
                    if image is None:
                        continue
                    image = image[..., ::-1]  # convert BGR to RGB for the model
                    results = detect_objects(model, image)
                    detected_ids = {int(x[-1]) for x in results.xyxy[0]}  # Get detected class IDs from results
                    if detected_ids.intersection(target_ids):  # Check if any target class is detected
                        processed_image = results.render()[0]
                        target_file_path = os.path.join(target_dir, f"annotated_{file}")
                        processed_image_bgr = cv2.cvtColor(processed_image, cv2.COLOR_RGB2BGR)
                        cv2.imwrite(target_file_path, processed_image_bgr)
                        print(f"Processed and saved annotated image to {target_file_path}")
                except Exception as e:
                    print(f"Failed to process {file}: {str(e)}")

def main(model_path, source_dir, target_dir, base_name):
    # Load model and process images
    model = load_model(model_path)
    # Define target classes and their corresponding IDs
    target_classes = ['cbx-cable', 'cbx-box', 'homerun-cable']
    label_to_id = {name: idx for idx, name in enumerate(model.names)}  # Assuming model.names holds class names
    target_ids = {label_to_id[label] for label in target_classes if label in label_to_id}
    find_and_process_images(source_dir, target_dir, base_name, model, target_ids)

I have a question for you. I'm trying to save images that contain specific target classes, but how can I achieve this? Previously, my code saved all images that detected any object. Now, I want to save only those images where specific objects are detected, as I've added some classes to reduce false positives. How can I configure this in the code?

@sangyo1
Author

sangyo1 commented May 9, 2024

Sorry, I actually solved the problem with the code above, but I have a new question: how do I set the score threshold in that code? I want to detect anything above a confidence of 0.5.

@glenn-jocher
Member

Hello!

Great to hear that you solved your previous issue! To set the confidence threshold for detections to 0.5 in YOLOv5, you can adjust the conf_thres parameter in the model function call in your detect_objects function. Here’s how you can modify it:

def detect_objects(model, image):
    # Perform inference with model setting confidence threshold to 0.5
    results = model(image, size=640, conf_thres=0.5)
    return results

This will ensure that your model only considers detections with a confidence score above 0.5. Happy coding! 😊
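
If passing conf_thres to the call raises an unexpected-keyword error (this varies between YOLOv5 versions), a common alternative is to set the threshold as an attribute on the hub-loaded model before inference. A small sketch, assuming the load_model helper from your script:

model = load_model(model_path)
model.conf = 0.5   # confidence threshold applied by the built-in NMS
results = model(image, size=640)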

@sangyo1
Author

sangyo1 commented May 16, 2024

Hey @glenn-jocher, thank you so much, that really helped.

I have one last question: how can I apply detection specifically to the center of the image? For instance, I want to focus the detection on only the central 50% of the image. How can I implement this?

@glenn-jocher
Member

@sangyo1 hello!

I'm glad to hear the previous advice was helpful! To focus detection on the central 50% of an image, you can modify the image before feeding it into the model. Here’s an example of how you might crop the image in Python using OpenCV:

import cv2

def crop_center(image):
    h, w = image.shape[:2]
    start_x = w // 4
    end_x = start_x + (w // 2)
    start_y = h // 4
    end_y = start_y + (h // 2)
    cropped_image = image[start_y:end_y, start_x:end_x]
    return cropped_image

# Usage
image = cv2.imread('path_to_your_image.jpg')
cropped_image = crop_center(image)
# Now pass 'cropped_image' to the detection model

This will crop to the central 50% of the image, and you can then pass this cropped portion to your YOLOv5 model for detection. Happy coding! 😊
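
If you later want to draw the detections on the full frame rather than on the crop, the crop offsets can simply be added back to the box coordinates. A small sketch, assuming pixel xyxy boxes produced from the cropped image:

def shift_boxes_to_full_image(boxes_xyxy, start_x, start_y):
    # Translate boxes detected on the crop back into full-image coordinates
    return [(x1 + start_x, y1 + start_y, x2 + start_x, y2 + start_y)
            for x1, y1, x2, y2 in boxes_xyxy]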
