SSD-Object-Detection

MobileNets

MobileNets, as the name suggests, are neural networks constructed for the purpose of running very efficiently (high FPS, low memory footprint) on mobile and embedded devices. MobileNets achieve this with 3 techniques: Perform a depthwise convolution followed by a 1x1 convolution rather than a standard convolution. The 1x1 convolution is called a pointwise convolution if it's following a depthwise convolution. The combination of a depthwise convolution followed by a pointwise convolution is sometimes called a separable depthwise convolution. Use a "width multiplier" - reduces the size of the input/output channels, set to a value between 0 and 1. Use a "resolution multiplier" - reduces the size of the original input, set to a value between 0 and 1. These 3 techniques reduce the size of cummulative parameters and therefore the computation required. Of course, generally models with more paramters achieve a higher accuracy. MobileNets are no silver bullet, while they perform very well larger models will outperform them. MobileNets are designed for mobile devices, NOT cloud GPUs. The reason we're using them in this lab is automotive hardware is closer to mobile or embedded devices than beefy cloud GPUs.

MobileNet SSD

In this section you'll use a pretrained MobileNet SSD model to perform object detection. You can download the MobileNet SSD and other models from the TensorFlow detection model zoo (note: we'll provide links to specific models further below). Paper describing comparing several object detection models.

Single Shot Detection (SSD)

Many previous works in object detection involve more than one training phase. For example, the Faster-RCNN architecture first trains a Region Proposal Network (RPN) which decides which regions of the image are worth drawing a box around. RPN is then merged with a pretrained model for classification (classifies the regions). The image below is an RPN:

Faster-RCNN Visual

The SSD architecture is a single convolutional network which learns to predict bounding box locations and classify the locations in one pass. Put differently, SSD can be trained end to end while Faster-RCNN cannot. The SSD architecture consists of a base network followed by several convolutional layers:

SSD Visual

NOTE: In this lab the base network is a MobileNet (instead of VGG16.)

Result(Image)

Result(Video)

10s of driving	10sec detected result

Multi-Objects Detection Result

Traffic Light only	MultiObjects(car, person, traffic light,all )

TrafficLight Detection HQ Video	MultiObjects Detection HQ Video

Right below each bouding boxes, Classified Tag is added: (Person, Car, Traffic Light, Motorcycle , etc )

Name		Name	Last commit message	Last commit date
Latest commit History 35 Commits
assets		assets
images		images
README.md		README.md
SSD-Object-Detection-Lab.ipynb		SSD-Object-Detection-Lab.ipynb
SSD_Objects_Detection2.ipynb		SSD_Objects_Detection2.ipynb
SSD_TrafficLight_Detection.ipynb		SSD_TrafficLight_Detection.ipynb
detection.gif		detection.gif
detection.png		detection.png
environment.yml		environment.yml
result-all.gif		result-all.gif
result-tf.gif		result-tf.gif
result.mp4		result.mp4
sample.gif		sample.gif
sample.mp4		sample.mp4
utils.py		utils.py

tooth2/SSD-Object-Detection

Folders and files

Latest commit

History

Repository files navigation

SSD-Object-Detection

MobileNets

MobileNet SSD

Single Shot Detection (SSD)

Faster-RCNN Visual

SSD Visual

Result(Image)

Result(Video)

Multi-Objects Detection Result

trained model Download

Reference

Related Work

About

Topics

Resources

Stars

Watchers

Forks

Languages