Sunday, June 14, 2015

Rich feature hierarchies for accurate object detection and semantic segmentation


Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik


Introduction

Object detection, just like image classification, plays an important role in MIR. The main difference is that object detection tells you not only what is in the image, but also where each object is. This paper proposes a method, called R-CNN, that combines region proposals with CNNs to do object detection, and it beats the previous state-of-the-art methods by a wide margin.

Method

The following figure gives an overview of the system.



The first step is to find candidate regions in the image that are likely to contain an object; these are called region proposals. Various papers offer methods for generating region proposals, and this work uses selective search.
The second step is feature extraction. The paper uses a CNN with the architecture proposed by Krizhevsky et al. to extract a 4096-dimensional feature vector from each region proposal. Because the CNN requires inputs of a fixed size, each region proposal has to be resized first. The authors chose the simplest option, a plain warp, for this.
After the features of the region proposals are extracted, the authors use an SVM for classification. The SVM tells you which class each proposal belongs to.
The CNN in the second step was pre-trained with supervision on a large auxiliary dataset (ILSVRC 2012).
Because a CNN has multiple layers, the authors then examined which layer's output should be used as the input to the SVM. After several experiments, they found that, with fine-tuning, fully connected layer 7 gives the best performance.
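
The steps above can be sketched roughly as follows. This is only a minimal illustration, not the authors' implementation. The 227 x 227 input size is an assumption (the text above only says the inputs must be the same size). The feature extractor is a hypothetical random projection standing in for the real CNN, the SVM weights are random rather than trained, and the proposal boxes are made up instead of coming from selective search. Picking the highest-scoring class is also a simplification.

```python
import numpy as np

INPUT_SIZE = 227    # assumed CNN input side length; not stated in the post
FEATURE_DIM = 4096  # feature vector length mentioned in the post


def warp_region(image, box, size=INPUT_SIZE):
    """Crop box = (x0, y0, x1, y1) and stretch it to size x size.

    The aspect ratio is ignored (a plain warp). Nearest-neighbour sampling
    keeps the sketch free of image-library dependencies.
    """
    x0, y0, x1, y1 = box
    crop = image[y0:y1, x0:x1]
    h, w = crop.shape[:2]
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return crop[rows[:, None], cols]


def make_toy_extractor(rng):
    """Hypothetical stand-in for the CNN: fixed random projection plus ReLU."""
    stride = 15  # subsample the warped patch so the projection stays small
    n_inputs = len(range(0, INPUT_SIZE, stride)) ** 2 * 3
    projection = rng.standard_normal((FEATURE_DIM, n_inputs)).astype(np.float32)

    def extract(patch):
        pixels = patch[::stride, ::stride].astype(np.float32).ravel() / 255.0
        return np.maximum(projection @ pixels, 0.0)

    return extract


def classify_proposals(image, boxes, extract, svm_weights, svm_bias):
    """Warp each proposal, extract its features, and score it with linear SVMs.

    svm_weights has shape (n_classes, FEATURE_DIM); the highest score wins.
    """
    features = np.stack([extract(warp_region(image, b)) for b in boxes])
    scores = features @ svm_weights.T + svm_bias
    return scores.argmax(axis=1), scores


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(300, 400, 3), dtype=np.uint8)
boxes = [(10, 20, 110, 220), (150, 50, 390, 120)]  # made-up region proposals
n_classes = 3
svm_weights = rng.standard_normal((n_classes, FEATURE_DIM)).astype(np.float32)
svm_bias = np.zeros(n_classes, dtype=np.float32)  # untrained, for illustration

labels, scores = classify_proposals(
    image, boxes, make_toy_extractor(rng), svm_weights, svm_bias
)
print(labels.shape, scores.shape)  # (2,) (2, 3)
```

In a real system, the toy extractor would be replaced by the fine-tuned CNN's fully connected layer 7 output, and the SVM weights would be learned from labeled proposals.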

Performance



The comparison shows that the accuracy of this work is far higher than that of the other methods.

Result
















