Introduction to Sequential Monte Carlo Methods
These notes are based on the following article:
Doucet, Arnaud, Nando de Freitas, and Neil Gordon (2001): “An Introduction to Sequential Monte Carlo Methods,” in Sequential Monte Carlo Methods in Practice, ed. by A. Doucet, N. de Freitas, and N. Gordon. New York: Springer-Verlag.
Introduction
Real-world data analysis often requires estimating unknown quantities given only a sequence of observations on some related observable quantity. In a Bayesian framework, one typically has a priori knowledge of the model: a prior distribution of the unobservable quantity of interest and likelihood functions which relate the observables to the unobservables. The resulting posterior distribution of the unobservables can be calculated using Bayes’ theorem. This allows one to conduct inference about the unobserved quantities.
In some cases, it is natural to process observations sequentially. These cases are the focus of this introductory article and the rest of the book. For example, on-line applications such as radar tracking or estimating financial instruments where new data become available in real time, it is easier to update the previously formed posterior distribution than to recalculate it from scratch.
Sequential Monte Carlo methods are simulation-based methods for calculating approximations to posterior distributions. They avoid making linearity or normality assumptions required by related methods such as the Kalman filter.
Model
Let denote the sequence of unobserved states, with . The authors consider only the case when is a Markov process with initial distribution . The Markov assumption is not needed and is only for tractability. Let denote the transition equation. Let denote a sequence of observations with . Suppose that, given , the observations are conditionally independent. Let denote the marginal distribution, or the observation equation.
Thus, the model is defined by the following distributions:
Define and . Our goal is to estimate the posterior distribution recursively. We also may care about the marginal distribution or an expectation such as for some that is integrable with respect to .
Bayes’ theorem gives us the posterior distribution at any time :
There is an inherent recursive relationship for updating the posterior distribution. We can express in terms of : where the last equality uses the definition of the conditional density, the conditional independence assumption, and the Markov assumption. The expression in the denominator is constant with respect to .
Perfect Monte Carlo Sampling
In the ideal situation, we could draw a sample of size from , the posterior distribution of interest. Call this sample . We can form an estimate of using the empirical distribution of the sample: From here, we can estimate the integral by
Typically it is impossible to get such a sample since is multivariate, known only up to a constant of proportionality, and non-standard. Markov chain Monte Carlo methods may be used in similar situations, but they are not well-suited to recursive problems such as the ones considered here.
Importance Sampling
Since we can’t draw from directly, we can use importance sampling to draw from indirectly using an importance sampling density and a corresponding importance weight for each draw. The importance sampling density must be chosen so that its support includes the support of . We have where is the importance weight.
Thus, to approximate the expectation , we simply need to draw a sample from and calculate where are the normalized importance weights.
This procedure can be interpreted as a sampling method with an approximate posterior given by We can then approximate by
Although this solves the problem of being able to obtain draws, this approach is still not well-suited for our recursive problem.
Sequential Importance Sampling (SIS)
The goal of Sequential Importance Sampling (SIS) is to compute an estimate of using the past simulated values . We have the following recursive representation of the importance sampling distribution: This recursive relationship allows us to calculate the importance weights recursively: Hence, Notice that the term is constant for all and cancels out.
A special case occurs when so that and
SIS is usually inefficient for high-dimensional integrals because as , the importance weights for some particles quickly approaches zero.
The Bootstrap Filter
To prevent particle degeneracy, the bootstrap filter introduces a resampling step which eliminates particles with low importance weights. Instead of using the importance-weighted empirical distribution we use a uniformly-weighted distribution where is the number of offspring of the particle which is to be determined by some branching mechanism. The most common such mechanism involves resampling times from (Gordon et al., 1993). We require for all . If , the particle dies. We want to choose such that The surviving particles, those with , are approximately distributed according to .
Algorithm
Initialization ().
- For , draw .
- Set .
Importance Sampling Step
For , draw and set .
For , calculate the importance weights and normalize them.
Resampling Step
Take draws with replacement from the set with weights .
Set and go to step 2.