AMMAI 2015 Paper reading: 5月 2015

2015年5月31日星期日

Probabilistic Latent Semantic Indexing

Thomas Hofmann

Introduction

Latent Semantic Analysis (LSA) is a technique that is used to find the latent topic for some documents. Because the thing we can observe is just ''There are some documents, and they include some words', what topic is this document is unknown. So we need LSA to help finding these latent topic. However, LSA lacks some significant statistical features, this author proposed a new method called Probabilistic Latent Semantic Analysis (PLSA), that improves these shortages.

Method

The author introduced a statistical model called 'Aspect model', which is the core of PLSA. For some class (or topic, unobservable)z 􏰉 Z 􏰞 fz􏰈 􏰝 􏰑 􏰑 􏰑 􏰝 zK g

, word

, and document

, we can form the following formulation

If we can find this decomposition for P(d,w), we can find the relationship between latent topic and word and document. The method is using Expectation Maximization (EM) algorithm.

First, we need to define the similarity (or likelihood) between the objective function and the function what we found.

Second, re-parameterizing the objective function

Finally, iteratively applying EM algorithm

E-step

M-step

The author then proposed a improved EM algorithm called tempered EM (TEM), which can help to avoid overfitting.

Other

This is the geometry of the aspect model

PLSA v.s. LSA

LSA -> SVD(

), L2 distance

PLSA -> U=P(d|z), Sigma=diag(P(Z)), V=P(w|z), likelihood function

Finally, the author proposed 2 improved PLSA.

(i) PLSA-U : more smoothen

P'(w|d)=n(d,w)/n(d),

and Final P''(w|d)=lambda*P'(w|d)+(1-lambda)*P(w|d)

(ii) PLSA-Q : low-dimensional representation

Result

These is the 10 most probable words for the query 'flight' and 'love'.

performance

We can see the performance is quite good. The PR curves are on the top.

word

2015年5月5日星期二

Online Dictionary Learning for Sparse Coding

Julien Mairal & Francis Bach & Jean Ponce & Guillermo Sapiro

Introduction

Sparse coding means modeling data vector as sparse linear combinations of basis elements. In this paper, the author proposed a method that can do online dictionary learning, means if there is a new data, we don't have to retrain the model from the beginning.

Classical Dictionary Learning Techniques

The model introduced above is called 'Dictionary', represented by D. We want to find the difference between the original data and the data recovered from D, then formulate a cost function.

The difference (or distance) function is

We use this difference function to formulate the empirical cost function

Because we don't want the length of alpha term and D term become too long, we added the regularization term

and constrain

However, the difference function above is not convex, we adjusted it to

which is convex.

Then the author said that we are usually not interested in a perfect minimization of the empirical cost function, but in the minimization of the expected cost function

means that we don't train the dictionary with all data points. We just sample some data point x to do the training.

The process of training dictionary - proposed by other author - is as following:

Online Dictionary Learning

The algorithm is summarized as following:

To do step 7, we need algorithm 2

Result

2015年5月2日星期六

To aggregate or not to aggregate: Selective match kernels for image search

Giorgos Tolias, Yannis Avrithis, Herve Jegou

Introduction

This paper is interested in improving visual recognition of objects, locations and scenes. To do this, we require some methods to compute the similarity of two different objects. 'Kernel' is the important thing during similarity computation. The author first introduced some basic kernels, then proposed their advanced methods, which added some new features called selectivity and aggregation.