2015年5月31日 星期日

Probabilistic Latent Semantic Indexing

Thomas Hofmann


Introduction

Latent Semantic Analysis (LSA) is a technique that is used to find the latent topic for some documents. Because the thing we can observe is just ''There are some documents, and they include some words', what topic is this document is unknown. So we need LSA to help finding these latent topic. However, LSA lacks some significant statistical features, this author proposed a new method called Probabilistic Latent Semantic Analysis (PLSA), that improves these shortages. 

Method

The author introduced a statistical model called 'Aspect model', which is the core of PLSA. For some class (or topic, unobservable)z 􏰉 Z 􏰞 fz􏰈 􏰝 􏰑 􏰑 􏰑 􏰝 zK g


, word 
, and document

 , we can form the following formulation

If we can find this decomposition for P(d,w), we can find the relationship between latent topic and word and document. The method is using Expectation Maximization (EM) algorithm. 
First, we need to define the similarity (or likelihood) between the objective function and the function what we found. 
Second, re-parameterizing the objective function
Finally, iteratively applying EM algorithm

E-step

M-step
The author then proposed a improved EM algorithm called tempered EM (TEM), which can help to avoid overfitting. 

Other

This is the geometry of the aspect model

PLSA v.s. LSA
LSA -> SVD(), L2 distance

PLSA -> U=P(d|z), Sigma=diag(P(Z)), V=P(w|z), likelihood function

Finally, the author proposed 2 improved PLSA.

(i) PLSA-U : more smoothen
P'(w|d)=n(d,w)/n(d), 
and Final P''(w|d)=lambda*P'(w|d)+(1-lambda)*P(w|d)

(ii) PLSA-Q : low-dimensional representation

Result


These is the 10 most probable words for the query 'flight' and 'love'.

performance


We can see the performance is quite good. The PR curves are on the top.












word

2015年5月5日 星期二

Online Dictionary Learning for Sparse Coding

Julien Mairal & Francis Bach & Jean Ponce & Guillermo Sapiro


Introduction

Sparse coding means modeling data vector as sparse linear combinations of basis elements. In this paper, the author proposed a method that can do online dictionary learning, means if there is a new data, we don't have to retrain the model from the beginning.

Classical Dictionary Learning Techniques

The model introduced above is called 'Dictionary', represented by D. We want to find the difference between the original data and the data recovered from D, then formulate a cost function.
The difference (or distance) function is
We use this difference function to formulate the empirical cost function
Because we don't want the length of alpha term and D term become too long, we added the regularization term and constrain

However, the difference function above is not convex, we adjusted it to 
which is convex.
Then the author said that we are usually not interested in a perfect minimization of the empirical cost function, but in the minimization of the expected cost function
means that we don't train the dictionary with all data points. We just sample some data point x to do the training. 
The process of training dictionary - proposed by other author - is as following:

Online Dictionary Learning

The algorithm is summarized as following:

To do step 7, we need algorithm 2

Result






2015年5月2日 星期六

To aggregate or not to aggregate: Selective match kernels for image search

Giorgos Tolias, Yannis Avrithis, Herve Jegou


Introduction

This paper is interested in improving visual recognition of objects, locations and scenes. To do this, we require some methods to compute the similarity of two different objects. 'Kernel' is the important thing during similarity computation. The author first introduced some basic kernels, then proposed their advanced methods, which added some new features called selectivity and aggregation.

Basic match kernels

(1) Bag-of-words (BOW)

(2) Hamming Embedding (HE)
h: Hamming distance

(3) Vector or Locally Aggregated Descriptor (VLAD)

Common kernel model

Non-aggregated and aggregated kernel have their individual form as following:

Non-aggregated kernels
:an arbitrary vector representation function
:a scalar selectivity function

Aggregated kernels
:another vector representation function
:aggregated vector representation of a set Xc of descriptors in a cell

Advanced match kernel

Following the forms above, the author chose their different representation function and proposed their own kernels

Selective match kernel (SMK)


Aggregated selective match kernel (ASMK)

Binarization

To save memory, the author used binary vector b instead of residual r(x).

SMK*

ASMK*

Experiments

Plot the mAP to different alpha


Plot the mAP to different tau


Plot the mAP to different k


The result that using aggregation to save memory usage


Compare with other methods


The result that using different feature