
Training Own Neural Network #3

Open · SmellingSalt opened this issue Apr 16, 2021 · 8 comments

@SmellingSalt commented Apr 16, 2021

Hello,
I am having trouble understanding the procedure to train my own detection model. I have a Jetson Nano 2GB and 4GB variant with me.
My objective is to detect if a person wears sunglasses or not. To accomplish this objective, my main queries are as follows.

  1. I will have to train a detection model on my own dataset. The Custom Containers document mentions that it needs to be compatible with DeepStream. If I do manage to do this, which code in the Docker container should I change so that it runs this different object detection neural network?
  2. I am under the assumption that if I manage to train a custom object detection neural network following the instructions on the DeepStream docs page, I will have a compatible neural network. I would then put these weights on a shared drive, run the container, place the trained weights in a particular folder (whose location I do not know), and change maskcam_run.py or maskcam_inference.py to point to the updated weights. Are there flaws in my assumptions? Could you please correct me if I am wrong? I am new to Docker as well, so I might be missing something fundamental.

My workflow is exactly the same as MaskCam's, including remote deployment, web server access, and the rest. I just need to change the object detection mechanism; even the statistics it provides will be unchanged.

Thank you.

@DonBraulio (Contributor)

Hi @SmellingSalt,
As you mention, your use case seems pretty compatible with what we're doing!
I'd recommend you start training a YOLOv4 model, which should work great for your use case. This way you'll only need to convert your model to a TensorRT engine as described here:
https://github.com/Tianxiaomo/pytorch-YOLOv4#51-convert-from-onnx-of-static-batch-size

Once you have the TensorRT engine file, you need to make sure it gets included in the container image. That means changing this line specifically:

RUN wget -P / https://maskcam.s3.us-east-2.amazonaws.com/maskcam_y4t_1024_608_fp16.trt
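
If you build the image yourself instead of downloading our engine, a minimal sketch of that change could look like this (sunglasses_y4t_fp16.trt is just a hypothetical placeholder for your own engine file):

# Dockerfile: copy your locally converted engine into the image instead of downloading ours
COPY sunglasses_y4t_fp16.trt /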

If the filename of your engine file is different from ours (most likely), change this line in the config file so it points to the right file:

model-engine-file=yolo/maskcam_y4t_1024_608_fp16.trt
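
For example, if your engine file were named sunglasses_y4t_fp16.trt (again, a hypothetical name), that line would become:

model-engine-file=yolo/sunglasses_y4t_fp16.trt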

And finally, since your object labels differ from ours, you'll need to adapt these names in the code (from mask/no_mask to glasses/no_glasses I guess 😄 ):

LABEL_MASK = "mask"

This documentation can also guide you in this process:
https://github.com/bdtinc/maskcam/blob/main/docs/Custom-Container-Development.md

In particular, some of the steps I just mentioned above are also covered here:
https://github.com/bdtinc/maskcam/blob/main/docs/Custom-Container-Development.md#how-to-use-your-own-detection-model

Good luck hacking the code, and let us know if you manage to adapt the system!

@SmellingSalt (Author)

Thank you for your response!
I managed to run the docker container in interactive mode and mounted a local disk onto it as well.
By doing so, I was able to modify this line

model-engine-file=yolo/maskcam_y4t_1024_608_fp16.trt

To point to the newly trained neural network and run things.
I will take your suggested approach from now on to make it more streamlined. Thank you!

I still have issues regarding training and interfacing with the docker image provided by you.

What I did

Since my intention was first to get the process working, without worrying about performance, I wanted to understand the procedure to train and produce a .trt model.
To do this, I followed the procedure described in this face mask detection repository by NVIDIA and managed to get a .trt file at the end of it.

By modifying this line

# tlt-encoded-model=detectnet_v2/resnet18_detector.etlt

to point to the newly created resnet18 file, and by commenting out line 105, I assumed I could get things to work.
Unfortunately, I ran into errors and could not get my custom resnet18 model to work. I assume these errors come from the fact that only 2 classes (mask/no mask) are trained here, while you use 3 classes (mask/no mask/not visible), so the .trt model has an incompatible shape.

Help I need

As you suggested, I will look into training a YOLOv4 model and use https://github.com/Tianxiaomo/pytorch-YOLOv4#51-convert-from-onnx-of-static-batch-size to convert it to a .trt file.
Here are some things I need help with:

  1. The link you suggested seems to convert weights from .onnx to .trt. The file you shared here is in the .weights format. The NVIDIA documentation I mentioned earlier uses hdf5, which gets converted to .trt.
    There seem to be a lot of formats, but in the end they all have to be converted to a .trt file after training.
    • As I need a YOLOv4 model, I am unsure which format to train in and where to obtain the neural network from, so could you explicitly state which format I should use and where I should obtain it?
  2. You have mentioned, based on your experience training these neural networks, that you needed another class called not visible to help track people across the screen without triggering false counts.
    I also need this feature, and since you have rightly mentioned that there are no labelled datasets for this class, I am under the assumption that I have to use transfer learning and retrain your weights to learn the glasses/no glasses classes. I would like some clarification on whether I am right or wrong, and any further input you have on how to handle this situation.
  3. Any repository/blogpost/resources that you have or suggest I follow to train a YOLOv4 model would be highly appreciated as well.

Thank you for your time!

@DonBraulio (Contributor)

Hi @SmellingSalt! I'm glad you're working on this.
I'll go over your questions:

  1. The .weights file I provided in the other issue is generated by training with the original darknet implementation of YOLOv4 (see point 3). To convert this file, you also have instructions in the repo I linked, but in this other section. Once you have the ONNX file, go to the ONNX->TensorRT part that follows (see the command sketch right after this list).
  2. Your understanding about the not visible class to improve tracking is correct. However, the latest version of our model isn't that good at detecting that class precisely, because we merged a huge dataset that had many more examples of mask/no_mask. So you have a couple of options. One is not to use that class at first (I think the code should work anyway; otherwise you might add the label to YOLO even though you don't have actual samples in the dataset), and then later add some examples of not_visible faces/heads to your dataset (I'm guessing you might have some already without proper labeling; otherwise see the answer to (3)). The other option is to remove that class from the code entirely, if you don't think it's useful. One more thing to clarify: we actually have 4 classes in the detector: mask/no_mask/not_visible/misplaced (ignore misplaced entirely in your case, but that's why you'll see number of classes = 4 in some places in the code or configuration files).
  3. The AlexeyAB/darknet repository is a great starting point to learn about training and configuring YOLO.
    Also, related to point (2), if you're short of examples of not_visible faces, you might want to add some images from a "head detection" dataset like this one.
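
To make the conversion in point (1) concrete, the overall pipeline is roughly this (a rough sketch with placeholder file names; the exact script arguments may differ between versions of the pytorch-YOLOv4 repo, so double-check its README):

# 1. Train with the original darknet (AlexeyAB/darknet) -> this produces a .weights file.
# 2. Convert the darknet .weights to ONNX using the pytorch-YOLOv4 repo
#    (argument order may vary by version; see the repo README):
python demo_darknet2onnx.py yolov4-tiny-glasses.cfg yolov4-tiny-glasses.weights test.jpg 1
# 3. Convert the ONNX file to a TensorRT engine with trtexec, on the target device:
/usr/src/tensorrt/bin/trtexec --onnx=yolov4_1_3_416_416_static.onnx --explicitBatch --saveEngine=glasses_fp16.trt --fp16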

@SmellingSalt (Author)

Hello,
Thank you for your inputs.
I have managed to train a YOLOv4-tiny neural network following the instructions provided in the links you shared.
I have the .weights and the .onnx files ready. I am now stuck at the penultimate step.

I followed the procedure outlined here and managed to generate a .engine file, but unfortunately not a .trt file. Earlier, when I had trained the resnet18 model following the steps here, the .trt file was created automatically.

From searching the internet for answers, I have come to understand that the trtexec command used here produces .engine files. I found this file in the same directory where the .weights file was stored.

So in order to get the .trt model to replace here

model-engine-file=yolo/maskcam_y4t_1024_608_fp16.trt

what was the procedure that you followed?
Or am I making a mistake somewhere else?

Also as a side note, I came across this answer on stack exchange, which claims the following

The downside of TRT is that the conversion to TRT needs to be done on the target device

And from the official documentation of NVIDIA's tensorrt, I came across this line

Note: Serialized engines are not portable across platforms or TensorRT versions. Engines are specific to the exact GPU model they were built on (in addition to the platforms and the TensorRT version)

So I would like to know if you did the conversion from .weights to .trt on the Jetson Nano, or on another device. I have so far been trying all of this on my laptop, and only ran your maskcam docker image on the Jetson Nano.

@DonBraulio (Contributor)

Hey there,
I believe that you can just use the .engine file, or even change the extension to .trt manually if you prefer that. If you produced it with the trtexec command, then you have a TensorRT engine there, regardless of the extension.
About running the conversion command trtexec: yes, you need to run that last step (ONNX->TensorRT) on the device, inside the container itself.
TensorRT is quite picky, so you need to match platforms and even versions. If the command is not found inside the container, you can install it using APT (I don't remember the exact package name right now, but you can use apt update && apt search tensorrt).
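
A rough sketch of what that might look like from a shell inside the container (the package name is from memory, so verify with the search command first):

# inside the container, started in Development Mode:
apt update && apt search tensorrt
# on Jetson images the trtexec binary is usually found at:
ls /usr/src/tensorrt/bin/trtexec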

@SmellingSalt (Author)

Hello,
Thank you for your inputs. By following them, I have managed to successfully deploy my own neural network in the container!

The only issue left now is with the class labels.

I get the following errors

  • WARNING: Num classes mismatch. Configured:2, detected by network: 4 printed repeatedly.
    • This warning doesn't make sense, as I have trained the model with only two classes and set the same info in the .cfg file as well. I did start with your weights, but I don't think that should affect it.
  • The camera feed outputs classes as not_visible and no_mask.

Changes I have made

File maskcam_inference.py

I have changed

LABEL_MASK = "mask"
LABEL_NO_MASK = "no_mask"  # YOLOv4: no_mask
LABEL_MISPLACED = "misplaced"
LABEL_NOT_VISIBLE = "not_visible"

To

LABEL_MASK = "glasses"
LABEL_NO_MASK = "no_glasses"  # YOLOv4: no_mask
LABEL_MISPLACED = "misplaced"
LABEL_NOT_VISIBLE = "not_visible"

File maskcam_config.txt

Change 1

num-detected-classes=4

to

num-detected-classes=2

Change 2

model-engine-file=yolo/maskcam_y4t_1024_608_fp16.trt

to

model-engine-file=yolo/yolov4_1_3_416_416_static.trt

Change 3

labelfile-path=yolo/data/obj.names

To point to the new obj.names file, which has only 2 names in it: glasses and no glasses.

File 3: .cfg files

I have replaced the /yolo/facemask-yolov4-tiny.cfg file with my own.

Could you let me know where else I am supposed to make the required changes?
Thank you!

@Raphenri09

Hi @SmellingSalt and @DonBraulio,
I am trying to do something similar. I am currently at the step where I have successfully trained my custom yolov4-tiny model with darknet. I converted my model to TensorRT based on the same repository proposed by the maskcam team. I know that my TensorRT model is functional, since I tested it with demo_trt.py from the same repository. Now I am trying to use my yolov4-tiny.trt within the maskcam container, but I get an error message. From my understanding, I need to do the ONNX->TensorRT conversion within the container, since the TensorRT version on my Jetson Nano is not the same as the one in the container. My question for you is: how did you perform this conversion inside the container?
Thanks for your help
Raphael

@DonBraulio (Contributor)

Hey @SmellingSalt, I'm really sorry I missed your message in the noise. I hope you have been able to find the solution to your problem.

If not, I think the thing you were probably missing was changing classes=4 to 2 in the yolo config file (here and on line 274); unfortunately, it seems like the darknet implementation doesn't pick up the number of classes from the labels file (check the original docs for more info).
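
For reference, per the darknet docs that change usually has to be made in two places for each [yolo] layer in the .cfg: the classes value inside the [yolo] block, and the filters value of the [convolutional] block right before it, which should be (classes + 5) * 3 assuming the default 3 anchor masks per layer. A rough sketch of the relevant parts:

[convolutional]
...
# filters must be (classes + 5) * 3 = 21 for 2 classes (with 3 masks per [yolo] layer)
filters=21
activation=linear

[yolo]
...
classes=2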

@Raphenri09 you should start your container in Development Mode and then run trtexec from the bash prompt inside the container.
I can't test right now whether tensorrt is pre-installed in the container, but I've checked some notes and found that I was using something like this command to do the conversion:

/usr/src/tensorrt/bin/trtexec --onnx=../yolo/yolov4_1_3_608_608_static.onnx --explicitBatch --saveEngine=tensorrt_fp16.trt --fp16

Of course, you'll need to change the file names and check the path to that executable, but I hope it helps!
