CMT Group Seminar | February 11, 10:00
Large-scale probabilistic modeling and machine learning
Modern data analysis requires computations on massive data. For example, consider the problem of automatically classifying articles in a huge digital archive containing millions of entries, or of recommending items to millions of users based on their purchase history. Bayesian probabilistic modeling allows us to make assumptions about hidden structure in the data that is not directly observable. In Bayesian inference, we fit a probability distribution that reveals this structure. Bayesian inference has drawn on ideas from theoretical physics for many years, a prominent example being Markov chain Monte Carlo algorithms. In a more recent approach called variational inference, Bayesian inference is mapped to an optimization problem. Here, we fit a parametrized 'mean-field' distribution by optimizing its variational parameters so as to maximize a lower bound on the statistical evidence of the data. Combined with stochastic optimization, this method scales to massive data sets; the resulting algorithm is termed stochastic variational inference (SVI). SVI obtains cheap but noisy gradient estimates by subsampling the underlying data set. It suffers from two major problems: gradient noise and the non-convexity of the objective. We introduce a scheme that reduces the noise by averaging over a window of past gradients. This averaging introduces a bias, and we discuss the resulting tradeoff between variance and bias. To encourage convergence to better local optima, we furthermore propose deterministic annealing for SVI: a temperature parameter deterministically deforms the objective and is lowered over the course of the optimization. We test both methods on Latent Dirichlet Allocation, a topic model, applied to three large text corpora.
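To give a rough feel for the gradient-averaging idea, the toy Python sketch below runs stochastic gradient ascent on a simple quadratic objective and optionally averages the last few noisy mini-batch gradients. It is a minimal illustration under assumed names and a made-up objective, not the speaker's implementation; the talk applies the idea to variational objectives for topic models.

```python
import numpy as np
from collections import deque

# Toy sketch (hypothetical, not the speaker's code): stochastic gradient
# ascent on f(theta) = -0.5 * E[(x - theta)^2], whose exact maximizer is
# the data mean. Each step uses a noisy gradient from a random mini-batch;
# setting window > 1 averages the last few gradients, which lowers the
# variance of the update but lags behind the current iterate (the bias).

def noisy_gradient(theta, data, batch_size, rng):
    """Unbiased mini-batch estimate of the gradient E[x] - theta."""
    batch = data[rng.choice(data.size, size=batch_size, replace=False)]
    return batch.mean() - theta

def stochastic_ascent(data, n_steps=5000, batch_size=10, window=1,
                      step_size=0.02, seed=0):
    rng = np.random.default_rng(seed)
    theta = 0.0
    recent = deque(maxlen=window)              # buffer of past noisy gradients
    for _ in range(n_steps):
        recent.append(noisy_gradient(theta, data, batch_size, rng))
        theta += step_size * np.mean(recent)   # smoothed (possibly biased) step
    return theta

if __name__ == "__main__":
    data = np.random.default_rng(1).normal(loc=3.0, scale=2.0, size=100_000)
    print("plain noisy gradients :", stochastic_ascent(data, window=1))
    print("averaged over 50 steps:", stochastic_ascent(data, window=50))
```

With a longer window the per-step fluctuations shrink, while the lag behind the current iterate is the source of the bias mentioned in the abstract. The annealing variant discussed in the talk instead deforms the objective itself with a temperature parameter that is gradually lowered during optimization.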
Stephan Mandt, Columbia University
Seminar room 0.01, ETP
Contact: Achim Rosch