Autonomous driving - Car detection


Posted by huanrong on 2019-04-25


This programming assignment is based mainly on Andrew Ng's deep learning course (4.3 Object Detection) and the following two papers: Redmon et al., 2016 and Redmon and Farhadi, 2016.

1 Problem description


  • Build a car detection system
  • The gathered images have been labelled by drawing bounding boxes around every car found in them.
    Figure: Definition of a box
  • If you have 80 classes that you want YOLO to recognize, you can represent the class label $c$ either as an integer from 1 to 80, or as an 80-dimensional vector (with 80 numbers) one component of which is 1 and the rest of which are 0 (see the sketch below).
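A quick sketch of the two representations (class index 3 below is arbitrary, chosen only for illustration):

```python
import numpy as np

# Class label as an integer in 1..80 (3 here is just an example)
c_int = 3

# The same label as an 80-dimensional one-hot vector: one component is 1, the other 79 are 0
c_onehot = np.zeros(80)
c_onehot[c_int - 1] = 1   # classes numbered 1..80 map to array indices 0..79
```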

2 YOLO


YOLO ("you only look once") is an algorithm that achieves high accuracy and can also run in real time.

2.1 Model details

  • The input is a batch of images of shape (m, 608, 608, 3)
  • The output is a list of bounding boxes along with the recognized classes. Each bounding box is represented by 6 numbers $(p_c, b_x, b_y, b_h, b_w, c)$ as explained above. If you expand $c$ into an 80-dimensional vector, each bounding box is then represented by 85 numbers.

We will use 5 anchor boxes. So you can think of the YOLO architecture as the following: IMAGE (m, 608, 608, 3) -> DEEP CNN -> ENCODING (m, 19, 19, 5, 85).

Figure: Encoding architecture for YOLO
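Concretely, the raw CNN output carries 5 × 85 = 425 numbers per grid cell, and viewing it as the (m, 19, 19, 5, 85) encoding is just a reshape of the last dimension. A minimal NumPy sketch, with a random array standing in for the real CNN output:

```python
import numpy as np

# Random stand-in for the raw CNN output: batch of 2 images, 19x19 cells, 5*85 = 425 channels
cnn_output = np.random.randn(2, 19, 19, 425).astype(np.float32)

# View the channel dimension as (5 anchor boxes, 85 numbers per box)
encoding = cnn_output.reshape(2, 19, 19, 5, 85)

p_c = encoding[..., 0:1]   # objectness: probability that the box contains some object
b   = encoding[..., 1:5]   # box coordinates (b_x, b_y, b_h, b_w)
c   = encoding[..., 5:]    # 80 class probabilities
print(encoding.shape)      # (2, 19, 19, 5, 85)
```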

If the center of an object falls into a grid cell, that grid cell is responsible for detecting the object.

For each box (of each cell) we will compute the following elementwise product and extract a probability that the box contains a certain class.
Figure: Find the class detected by each box
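For instance, a toy calculation of that product for a single box (the numbers are made up):

```python
import numpy as np

p_c = 0.60                            # confidence that this box contains some object
class_probs = np.zeros(80)
class_probs[2] = 0.73                 # say class 3 ("car") has probability 0.73

scores = p_c * class_probs            # elementwise product: per-class score for this box
best_class = int(np.argmax(scores))   # index of the class detected by this box
print(best_class, scores[best_class]) # class 2 (0-indexed), score ≈ 0.438
```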

Here’s one way to visualize what YOLO is predicting on an image:

  • For each of the 19x19 grid cells, find the maximum of the probability scores (taking a max across both the 5 anchor boxes and across different classes).
  • Color that grid cell according to the object that grid cell considers the most likely (see the sketch below).
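A sketch of that per-cell reduction, with random numbers standing in for the per-box class scores:

```python
import numpy as np

# Per-box class scores for every cell: (19, 19, 5 anchors, 80 classes), random stand-ins
box_scores = np.random.rand(19, 19, 5, 80)

# Max score in each cell, taken across both the 5 anchor boxes and the 80 classes
cell_max_score = box_scores.max(axis=(2, 3))                           # shape (19, 19)

# Class achieving that max (used to color the cell): flatten anchors x classes, argmax, mod 80
cell_best_class = box_scores.reshape(19, 19, -1).argmax(axis=-1) % 80  # shape (19, 19)
```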

Doing this results in this picture:

**Figure 5**: Each of the 19x19 grid cells colored according to which class has the largest predicted probability in that cell.

Note that this visualization isn’t a core part of the YOLO algorithm itself for making predictions; it’s just a nice way of visualizing an intermediate result of the algorithm.

Another way to visualize YOLO’s output is to plot the bounding boxes that it outputs. Doing that results in a visualization like this:

Each cell gives you 5 boxes. In total, the model predicts: 19x19x5 = 1805 boxes just by looking once at the image (one forward pass through the network)! Different colors denote different classes.

In the figure above, we plotted only boxes that the model had assigned a high probability to, but this is still too many boxes. You’d like to filter the algorithm’s output down to a much smaller number of detected objects. To do so, you’ll use non-max suppression. Specifically, you’ll carry out these steps:

  • Get rid of boxes with a low score (meaning, the box is not very confident about detecting a class)
  • Select only one box when several boxes overlap with each other and detect the same object.

2.2 Filtering with a threshold on class scores

The goal of this step is to discard every box whose score is below a given threshold.
Variables:

  • box_confidence: tensor of shape $(19 \times 19, 5, 1)$ containing $p_c$ (confidence probability that there’s some object) for each of the 5 boxes predicted in each of the 19x19 cells.
  • boxes: tensor of shape $(19 \times 19, 5, 4)$ containing $(b_x, b_y, b_h, b_w)$ for each of the 5 boxes per cell.
  • box_class_probs: tensor of shape $(19 \times 19, 5, 80)$ containing the detection probabilities $(c_1, c_2, … c_{80})$ for each of the 80 classes for each of the 5 boxes per cell.

Steps:

  1. Compute box scores
  2. Find the box_classes from the max box_scores, and keep track of the corresponding score
  3. Create a filtering mask based on “box_class_scores” by using “threshold”. The mask should have the same dimension as box_class_scores, and be True for the boxes you want to keep (with probability >= threshold)
  4. Apply the mask to scores, boxes and classes

A few things to note when writing the code for this step (see the sketch after this list):

  1. When using keras.backend.max, do not set keepdims=True; I will explain how keepdims works in a separate section.
  2. When using tf.boolean_mask, ignore the 'axis' parameter shown in the docs; this step does not actually need it.
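Putting the steps and notes together, a minimal sketch of this filtering step under the tensor shapes listed above (the function name and the default threshold are illustrative):

```python
import tensorflow as tf
from keras import backend as K

def yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold=0.6):
    """Keep only boxes whose best class score is at least `threshold`."""
    # Step 1: box scores = p_c * class probabilities, shape (19, 19, 5, 80)
    box_scores = box_confidence * box_class_probs

    # Step 2: best class per box and the corresponding score (note: no keepdims=True)
    box_classes = K.argmax(box_scores, axis=-1)      # (19, 19, 5)
    box_class_scores = K.max(box_scores, axis=-1)    # (19, 19, 5)

    # Step 3: mask is True for boxes whose best score is >= threshold
    filtering_mask = box_class_scores >= threshold

    # Step 4: apply the mask (no 'axis' argument needed for tf.boolean_mask)
    scores = tf.boolean_mask(box_class_scores, filtering_mask)
    boxes = tf.boolean_mask(boxes, filtering_mask)
    classes = tf.boolean_mask(box_classes, filtering_mask)
    return scores, boxes, classes
```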

2.3 Non-max suppression

After filtering on classes and scores, there will still be many overlapping boxes. The step of further selecting among the remaining boxes is called non-maximum suppression (NMS).

NMS relies on a function called Intersection over Union (IoU).

Figure: Definition of "Intersection over Union"

I won't go through the IoU computation in detail here, because TensorFlow provides a function that performs NMS directly, so you never need to compute IoU yourself. The main idea for the intersection is to find the top-left and bottom-right corners of the overlapping region and then compute its area.
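Still, to make the idea concrete, here is a minimal pure-Python IoU sketch, assuming boxes are given as (x1, y1, x2, y2) corners:

```python
def iou(box1, box2):
    """Intersection over Union of two boxes, each given as (x1, y1, x2, y2) corners."""
    # Top-left and bottom-right corners of the intersection rectangle
    xi1, yi1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    xi2, yi2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    inter_area = max(xi2 - xi1, 0) * max(yi2 - yi1, 0)

    # Union = area(box1) + area(box2) - intersection
    box1_area = (box1[2] - box1[0]) * (box1[3] - box1[1])
    box2_area = (box2[2] - box2[0]) * (box2[3] - box2[1])
    return inter_area / (box1_area + box2_area - inter_area)

print(iou((2, 1, 4, 3), (1, 2, 3, 4)))  # 1 / 7 ≈ 0.143
```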

Documentation for two useful functions: tf.image.non_max_suppression() and K.gather().

Implementation (just a call to the library function, very simple; the main thing to remember is that max_boxes must be passed in as a tensor rather than a plain integer — the fragment is wrapped in a function below for completeness):

```python
import tensorflow as tf
from keras import backend as K

def yolo_non_max_suppression(scores, boxes, classes, max_boxes=10, iou_threshold=0.5):
    max_boxes_tensor = K.variable(max_boxes, dtype='int32')  # tensor to be used in tf.image.non_max_suppression()
    K.get_session().run(tf.variables_initializer([max_boxes_tensor]))  # initialize variable max_boxes_tensor

    # Use tf.image.non_max_suppression() to get the list of indices corresponding to boxes you keep
    # Note: max_boxes has to be converted into a tensor first
    ### START CODE HERE ### (≈ 1 line)
    nms_indices = tf.image.non_max_suppression(boxes, scores, max_boxes_tensor, iou_threshold)
    ### END CODE HERE ###

    # Use K.gather() to select only nms_indices from scores, boxes and classes
    ### START CODE HERE ### (≈ 3 lines)
    scores = K.gather(scores, nms_indices)
    boxes = K.gather(boxes, nms_indices)
    classes = K.gather(classes, nms_indices)
    ### END CODE HERE ###

    return scores, boxes, classes
```
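A minimal way to exercise the function (TF1-style session; the shapes and random values below are purely illustrative, not the assignment's test case):

```python
import tensorflow as tf
from keras import backend as K

with tf.Session() as test_sess:
    K.set_session(test_sess)
    scores  = tf.random_normal([54],    mean=1, stddev=4, seed=1)
    boxes   = tf.random_normal([54, 4], mean=1, stddev=4, seed=1)
    classes = tf.random_normal([54],    mean=1, stddev=4, seed=1)
    out_scores, out_boxes, out_classes = yolo_non_max_suppression(scores, boxes, classes)
    print(test_sess.run(tf.shape(out_scores)))  # at most max_boxes = 10 boxes are kept
```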