Autonomous driving - Car detection
This programming assignment is based primarily on Andrew Ng's deep learning course (4.3 Object Detection) and the following two papers: Redmon et al., 2016 and Redmon and Farhadi, 2016.
1 Problem description
- Build a car detection system
- Images have been gathered and labelled by drawing bounding boxes around every car found in them.
- If there are 80 classes that you want YOLO to recognize, you can represent the class label $c$ either as an integer from 1 to 80, or as an 80-dimensional one-hot vector, in which one component is 1 and the rest are 0.
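For illustration, a minimal sketch of the two label representations (the class number 3 is a made-up example):

```python
import numpy as np

c = 3  # hypothetical class label, as an integer in 1..80
c_onehot = np.zeros(80)
c_onehot[c - 1] = 1  # component c is 1, the rest are 0
```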
2 YOLO
YOLO ("you only look once") is an algorithm that achieves high accuracy while also being able to run in real time.
2.1 Model details
- The input is a batch of images of shape (m, 608, 608, 3)
- The output is a list of bounding boxes along with the recognized classes. Each bounding box is represented by 6 numbers $(p_c, b_x, b_y, b_h, b_w, c)$ as explained above. If you expand $c$ into an 80-dimensional vector, each bounding box is then represented by 85 numbers.
We will use 5 anchor boxes. So you can think of the YOLO architecture as the following: IMAGE (m, 608, 608, 3) -> DEEP CNN -> ENCODING (m, 19, 19, 5, 85).
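As a rough sketch of the encoding shapes, assuming the final CNN layer outputs 5 × 85 = 425 channels per cell (a placeholder batch is used, and the activations the real pipeline applies to the raw numbers are skipped here):

```python
import tensorflow as tf

m = 16  # example batch size
cnn_output = tf.zeros((m, 19, 19, 425))  # raw DEEP CNN output: 425 = 5 anchors x 85 numbers

# Regroup the 425 channels into 5 anchor boxes of 85 numbers each:
# (p_c, b_x, b_y, b_h, b_w, c_1, ..., c_80)
encoding = tf.reshape(cnn_output, (m, 19, 19, 5, 85))
```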
If the center of an object falls into a grid cell, that grid cell is responsible for detecting the object.
For each box (of each cell) we will compute the elementwise product $\text{score}_{c_i} = p_c \times c_i$ and extract the probability that the box contains class $i$.
Here’s one way to visualize what YOLO is predicting on an image:
- For each of the 19x19 grid cells, find the maximum of the probability scores (taking a max across both the 5 anchor boxes and across different classes).
- Color that grid cell according to what object that grid cell considers the most likely.
Doing this results in this picture:
Note that this visualization isn’t a core part of the YOLO algorithm itself for making predictions; it’s just a nice way of visualizing an intermediate result of the algorithm.
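A rough sketch of how those per-cell scores could be computed (random placeholder tensors stand in for real predictions; the variable names anticipate section 2.2):

```python
import numpy as np
import tensorflow as tf

box_confidence = tf.constant(np.random.rand(19, 19, 5, 1), dtype=tf.float32)
box_class_probs = tf.constant(np.random.rand(19, 19, 5, 80), dtype=tf.float32)

box_scores = box_confidence * box_class_probs               # (19, 19, 5, 80)
cell_best_score = tf.reduce_max(box_scores, axis=[-2, -1])  # max over anchors and classes

# Class index of the highest-scoring (anchor, class) pair in each cell,
# used to pick the cell's color
flat_scores = tf.reshape(box_scores, (19, 19, 5 * 80))
cell_class = tf.argmax(flat_scores, axis=-1) % 80
```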
Another way to visualize YOLO’s output is to plot the bounding boxes that it outputs. Doing that results in a visualization like this:
In the figure above, we plotted only boxes that the model had assigned a high probability to, but this is still too many boxes. You’d like to filter the algorithm’s output down to a much smaller number of detected objects. To do so, you’ll use non-max suppression. Specifically, you’ll carry out these steps:
- Get rid of boxes with a low score (meaning, the box is not very confident about detecting a class)
- Select only one box when several boxes overlap with each other and detect the same object.
2.2 Filtering with a threshold on class scores
The goal of this step is to discard every box whose score falls below a given threshold.
Variables:
- `box_confidence`: tensor of shape $(19 \times 19, 5, 1)$ containing $p_c$ (the confidence probability that there's some object) for each of the 5 boxes predicted in each of the 19x19 cells.
- `boxes`: tensor of shape $(19 \times 19, 5, 4)$ containing $(b_x, b_y, b_h, b_w)$ for each of the 5 boxes per cell.
- `box_class_probs`: tensor of shape $(19 \times 19, 5, 80)$ containing the detection probabilities $(c_1, c_2, \ldots, c_{80})$ for each of the 80 classes for each of the 5 boxes per cell.
Steps:
- Compute box scores
- Find the box_classes by taking the max of the box_scores, and keep track of the corresponding score
- Create a filtering mask based on “box_class_scores” by using “threshold”. The mask should have the same dimension as box_class_scores, and be True for the boxes you want to keep (with probability >= threshold)
- Apply the mask to scores, boxes and classes
A few things to keep in mind when writing the code for this step (a code sketch follows these notes):
- When using `keras.backend.max`, do not set `keepdims=True`. I will explain the usage of `keepdims` in a separate section.
- When using `tf.boolean_mask`, ignore the `axis` parameter shown in the docs; this step does not need it anyway.
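Putting the steps together, a minimal sketch of the filtering function (the name `yolo_filter_boxes` and the default `threshold=0.6` follow the course notebook's conventions; adjust to your setup):

```python
import tensorflow as tf
from keras import backend as K

def yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold=0.6):
    # Step 1: elementwise product p_c * (c_1, ..., c_80), broadcast over the last axis
    box_scores = box_confidence * box_class_probs

    # Step 2: best class per box and its score (keepdims stays False, per the note above)
    box_classes = K.argmax(box_scores, axis=-1)
    box_class_scores = K.max(box_scores, axis=-1)

    # Step 3: mask is True for boxes whose best class score >= threshold
    filtering_mask = box_class_scores >= threshold

    # Step 4: apply the mask (no 'axis' argument needed, per the note above)
    scores = tf.boolean_mask(box_class_scores, filtering_mask)
    boxes = tf.boolean_mask(boxes, filtering_mask)
    classes = tf.boolean_mask(box_classes, filtering_mask)

    return scores, boxes, classes
```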
2.3 Non-max suppression
Even after filtering away some classes and scores, there will still be many overlapping boxes. The second step, which selects among the remaining boxes, is called non-maximum suppression (NMS).
NMS uses a function called Intersection over Union (IoU).
I won't go through the IoU computation in detail here, since the function TensorFlow provides computes NMS directly, with no need to implement IoU yourself. The main idea of the intersection computation is to find the top-left and bottom-right corners of the region where the two boxes overlap, and from those two points compute the area of the intersection.
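For reference, a minimal sketch of that IoU computation, assuming each box is given as `(x1, y1, x2, y2)` corner coordinates (the helper name `iou` is just for illustration):

```python
def iou(box1, box2):
    # Top-left and bottom-right corners of the intersection rectangle
    xi1, yi1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    xi2, yi2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    inter_area = max(xi2 - xi1, 0) * max(yi2 - yi1, 0)  # 0 if the boxes don't overlap

    # Union = area1 + area2 - intersection
    box1_area = (box1[2] - box1[0]) * (box1[3] - box1[1])
    box2_area = (box2[2] - box2[0]) * (box2[3] - box2[1])
    return inter_area / (box1_area + box2_area - inter_area)
```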
Documentation for two useful functions:
Implementation (just a library call, very simple; the main thing to note is that `max_boxes` must be passed in as a tensor, not an integer):
```python
max_boxes_tensor = K.variable(max_boxes, dtype='int32')  # tensor to be used in tf.image.non_max_suppression()
```
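Fleshing that line out, here is a minimal TF1-era sketch of the full NMS step, following the course notebook's conventions (the function name `yolo_non_max_suppression` and the defaults `max_boxes=10`, `iou_threshold=0.5` are assumptions from that setup):

```python
import tensorflow as tf
from keras import backend as K

def yolo_non_max_suppression(scores, boxes, classes, max_boxes=10, iou_threshold=0.5):
    # tf.image.non_max_suppression wants max_output_size as a tensor, not an int
    max_boxes_tensor = K.variable(max_boxes, dtype='int32')
    K.get_session().run(tf.variables_initializer([max_boxes_tensor]))

    # Indices of the boxes that survive NMS
    nms_indices = tf.image.non_max_suppression(boxes, scores, max_boxes_tensor,
                                               iou_threshold=iou_threshold)

    # Keep only the surviving scores, boxes, and classes
    scores = K.gather(scores, nms_indices)
    boxes = K.gather(boxes, nms_indices)
    classes = K.gather(classes, nms_indices)

    return scores, boxes, classes
```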