AMMAI 2015 Paper reading: April 2015

Thursday, April 30, 2015

Iterative Quantization: A Procrustean Approach to Learning Binary Codes

Yunchao Gong and Svetlana Lazebnik

Introduction

As the amount of image data grows rapidly, encoding high-dimensional image descriptors as compact binary strings brings clear benefits in computation speed and storage. A common first step is to reduce the dimension of the data with PCA. However, the variance of the data differs across PCA directions: higher-variance directions carry much more information, so encoding every direction with the same number of bits is bound to perform poorly.

Dimension Reduction

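We want the variance of each bit to be maximized and the bits to be pairwise uncorrelated. After relaxing the sign function, this amounts to maximizing the objective function (1/n)*tr(W^T*X^T*X*W) subject to W^T*W = I, where X is the zero-centered data matrix and the columns of W are the projection directions. This is exactly the objective solved by PCA.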
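The projection step can be sketched in a few lines of NumPy. This is only a minimal illustration under assumptions: the function name `pca_projection`, the toy data, and the choice of 4 bits are made up for the example and are not taken from the paper.

```python
import numpy as np

def pca_projection(X, n_bits):
    """Return the top-n_bits PCA directions W and the projected data V = Xc @ W."""
    Xc = X - X.mean(axis=0)                 # zero-center the data
    cov = Xc.T @ Xc / Xc.shape[0]           # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_bits]
    W = eigvecs[:, order]                   # d x c matrix with orthonormal columns
    return W, Xc @ W

# Toy data (hypothetical): 500 points in 16 dimensions with unequal scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 16)) * np.linspace(3.0, 0.5, 16)
W, V = pca_projection(X, n_bits=4)
variances = V.var(axis=0)                   # variance along each kept direction
print(variances)
```

The printed variances decrease from the first direction to the last, which is the imbalance that motivates rotating the projected data before quantizing it.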

Binary Quantization

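Representing data with binary codes means quantizing the projected data into binary codes, and the smaller the quantization error, the better. The authors observe that the projected data V can be rotated by any orthogonal matrix R, even a random one, without changing the variance objective, so we look for the R that brings the rotated data V*R closest to its binary codes B = sgn(V*R).

So we have to minimize the quantization loss function Q(B, R) = ||B - V*R||_F^2.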

The minimization procedure iterates the following two steps:

(1) Fix R and update B

(2) Fix B and update R

Step (1) means we have to maximize tr(B*R^T*V^T), which is achieved by B = sgn(V*R).

To do step (2), we first compute the SVD of B^T*V as A*S*C^T, where S is the diagonal matrix of singular values (not the code matrix B), and let R = C*A^T.
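The two alternating steps can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `itq`, the random orthogonal initialization via QR, the default of 50 iterations, and the toy data are all assumptions made for the example.

```python
import numpy as np

def itq(V, n_iter=50, seed=0):
    """Alternate between updating the codes B and the rotation R.

    V is the n x c matrix of zero-centered, PCA-projected data.
    Returns the codes B (entries -1/+1), the rotation R, and the loss history.
    """
    rng = np.random.default_rng(seed)
    c = V.shape[1]
    # Assumed initialization: a random orthogonal matrix from a QR factorization.
    R, _ = np.linalg.qr(rng.normal(size=(c, c)))
    losses = []
    for _ in range(n_iter):
        # Step (1): fix R, update B = sgn(V R).
        B = np.where(V @ R >= 0, 1.0, -1.0)
        # Step (2): fix B, update R from the SVD of B^T V = A S C^T, R = C A^T.
        A, _, Ct = np.linalg.svd(B.T @ V)
        R = Ct.T @ A.T
        # Quantization loss ||B - V R||_F^2 after this iteration.
        losses.append(float(np.linalg.norm(B - V @ R) ** 2))
    B = np.where(V @ R >= 0, 1.0, -1.0)
    return B, R, losses

# Toy projected data (hypothetical): 200 points, 4 dimensions, unequal variances.
rng = np.random.default_rng(1)
V = rng.normal(size=(200, 4)) * np.array([3.0, 2.0, 1.0, 0.5])
V -= V.mean(axis=0)
B, R, losses = itq(V)
print(losses[0], losses[-1])
```

Because each step minimizes the loss with the other variable held fixed, the recorded loss never increases from one iteration to the next.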

Evaluation

In the accuracy plot, the curve for this work is the topmost one, which means it performs best.

More results

Wednesday, April 29, 2015

Efficient visual search of videos cast as text retrieval

Josef Sivic & Andrew Zisserman

Introduction

In this work, the authors propose a fast and efficient way to retrieve a particular object from video frames. Many earlier works have succeeded in identifying an object in a single image. The authors borrow techniques from text retrieval, with the goal of querying an object in a video as easily as searching with Google.

Feature Description

The descriptor must be unaffected by changes in viewpoint, scale, and illumination. Many earlier works provide solutions for this. The authors chose two different ways to find feature regions:
(1) Shape Adapted (SA)
(2) Maximally Stable (MS)
They then used SIFT as the descriptor for each region.

Visual Vocabulary

The authors used K-means clustering to quantize the descriptors into visual words, with Mahalanobis distance as the distance function.

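Mahalanobis distance: d(x1, x2) = sqrt((x1 - x2)^T * Sigma^-1 * (x1 - x2)), where Sigma is the covariance matrix of the descriptors.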
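A small sketch of the distance, assuming a single covariance matrix estimated from all descriptors. The toy data and the helper name `mahalanobis` are hypothetical. It also checks a standard fact: Mahalanobis distance equals Euclidean distance after whitening the data, so ordinary K-means can be run on whitened descriptors.

```python
import numpy as np

def mahalanobis(x1, x2, cov):
    """Mahalanobis distance between two vectors for a given covariance matrix."""
    diff = x1 - x2
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

# Toy descriptors (hypothetical): 300 correlated 5-dimensional vectors.
rng = np.random.default_rng(0)
D = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 5))
cov = np.cov(D, rowvar=False)

# Whitening with the Cholesky factor cov = L L^T: x -> L^-1 x.
L = np.linalg.cholesky(cov)
whitened = np.linalg.solve(L, D.T).T

d_direct = mahalanobis(D[0], D[1], cov)
d_whitened = float(np.linalg.norm(whitened[0] - whitened[1]))
print(d_direct, d_whitened)  # the two values agree up to rounding
```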

Frequency Computation

Term frequency-inverse document frequency (tf-idf) was used to compute the weight of each visual word.
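As a rough illustration, the sketch below uses the standard textbook form of tf-idf, weight = (n_id / n_d) * log(N / n_i), where n_id counts word i in frame d, n_d is the number of words in frame d, n_i is the number of frames containing word i, and N is the number of frames. The function name and the toy frames are assumptions, and the exact variant used in the paper may differ.

```python
import math
from collections import Counter

def tfidf_vectors(frames, vocab_size):
    """Compute one tf-idf vector per frame.

    frames is a list of frames, each given as a list of visual word ids.
    """
    n_frames = len(frames)
    doc_freq = Counter()
    for words in frames:
        doc_freq.update(set(words))      # count each word once per frame
    vectors = []
    for words in frames:
        counts = Counter(words)
        vec = [0.0] * vocab_size
        for w, n_id in counts.items():
            tf = n_id / len(words)                    # term frequency
            idf = math.log(n_frames / doc_freq[w])    # inverse document frequency
            vec[w] = tf * idf
        vectors.append(vec)
    return vectors

# Toy example (hypothetical): 3 frames over a vocabulary of 4 visual words.
frames = [[0, 0, 1, 3], [1, 2, 2, 3], [3, 3, 3, 1]]
vectors = tfidf_vectors(frames, vocab_size=4)
print(vectors)
```

Words 1 and 3 occur in every frame, so their idf is log(1) = 0 and they do not help to tell frames apart. Words 0 and 2 each occur in one frame only and receive positive weight.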

Experiments

The system is used as follows: the user first selects a region in some frame, which generally contains an object. The system then returns the frames that also contain the selected object.
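The query step can be pictured as ranking frames by the similarity between the weighted visual word vector of the selected region and that of each frame. The sketch below uses cosine similarity and made-up vectors. The function names and numbers are assumptions for illustration only, and the real system adds further machinery that is not covered here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def rank_frames(query_vec, frame_vecs):
    """Return frame indices sorted from most to least similar to the query."""
    scores = [(cosine(query_vec, v), i) for i, v in enumerate(frame_vecs)]
    return [i for _, i in sorted(scores, key=lambda t: (-t[0], t[1]))]

# Toy weighted vectors (hypothetical) for 3 frames and one query region.
frame_vecs = [[0.5, 0.0, 0.1], [0.0, 0.4, 0.0], [0.3, 0.1, 0.0]]
query_vec = [1.0, 0.0, 0.0]
ranking = rank_frames(query_vec, frame_vecs)
print(ranking)  # [0, 2, 1]
```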

We can see that the accuracy is not bad.
