Auxiliary Particle Filter

These notes are based on the following articles:

Pitt, Michael K., and Neil Shephard (1999): “Filtering via Simulation: Auxiliary Particle Filters,” Journal of the American Statistical Association, 94, 590–599.
Pitt, Michael K., and Neil Shephard (2001): “Auxiliary Variable Based Particle Filters,” in Sequential Monte Carlo Methods in Practice, ed. by A. Doucet, N. de Freitas, and N. Gordon. New York: Springer-Verlag.

Introduction

Consider a time series $y_{t}$ for $t = 1, \dots, n$ whose observations are independent conditional on an unobserved state $α_{t}$ , which is assumed to be a Markov process. We wish to perform on-line filtering, that is, to learn about the unobserved state given the currently available information by estimating the density $f (α_{t} | y_{1}, \dots, y_{t}) = f (α_{t} | Y_{t})$ for $t = 1, \dots, n .$ The measurement density $f (y_{t} | α_{t})$ and transition density $f (α_{t + 1} | α_{t})$ implicitly depend on a finite vector of parameters. The initial distribution of the state is $f (α_{0})$ .

Suppose we know the filtering distribution $f (α_{t} | Y_{t})$ at time $t$ and we receive a new observation for period $t + 1$ . We can obtain the updated filtering density in two steps. First, we use the transition density to obtain $f (α_{t + 1} | Y_{t})$ from $f (α_{t} | Y_{t})$ as $f (α_{t + 1} | Y_{t}) = \int f (α_{t + 1} | α_{t}) dF (α_{t} | Y_{t}) .$ Then, we obtain the new filtering density $f (α_{t + 1} | Y_{t + 1})$ by using Bayes’ Theorem: $f (α_{t + 1} | Y_{t + 1}) = \frac{f (y_{t + 1} | α_{t + 1}) f (α_{t + 1} | Y_{t})}{\int f (y_{t + 1} | α_{t + 1}) dF (α_{t + 1} | Y_{t})} .$

Hence, filtering essentially involves applying the recursive relationship (eq:filtering) $f (α_{t + 1} | Y_{t + 1}) \propto f (y_{t + 1} | α_{t + 1}) \int f (α_{t + 1} | α_{t}) dF (α_{t} | Y_{t}) .$ If the support of $α_{t + 1} | α_{t}$ is known and finite, then the above integral is simply the weighted sum over the points in the support. In other cases, numerical methods might need to be used.
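For the finite-support case, the recursion reduces to a matrix-vector product followed by a normalization. The following is a minimal sketch of one such step; the function name `filter_step`, the two-state transition matrix, and the likelihood values are hypothetical and chosen only for illustration.

```python
import numpy as np

def filter_step(filt, P, lik):
    """One step of the filtering recursion on a finite state space.

    filt : (S,) array with filt[i] = f(alpha_t = i | Y_t)
    P    : (S, S) array with P[i, j] = f(alpha_{t+1} = j | alpha_t = i)
    lik  : (S,) array with lik[j] = f(y_{t+1} | alpha_{t+1} = j)
    """
    pred = filt @ P            # prediction step: f(alpha_{t+1} | Y_t)
    post = lik * pred          # Bayes' theorem, up to the normalizing constant
    return post / post.sum()   # normalize so the probabilities sum to one

# Hypothetical two-state example (all numbers are assumptions).
filt = np.array([0.5, 0.5])
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
lik = np.array([0.7, 0.1])
new_filt = filter_step(filt, P, lik)
```

Here the prediction step gives `[0.55, 0.45]`, and the update multiplies by the likelihood and renormalizes. Particle filters replace the exact sum with a Monte Carlo approximation when the state space is continuous.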

Particle Filters

Particle filters are a class of simulation-based filters that recursively approximate the distribution of $α_{t} | Y_{t}$ using a collection of particles $α_{t}^{1}, \dots, α_{t}^{M}$ with probability masses $π_{t}^{1}, \dots, π_{t}^{M}$ . The particles are thought of as a sample from $f (α_{t} | Y_{t})$ . In the article, the weights are taken to be equal: $π_{t}^{1} = \dots = π_{t}^{M} = 1 / M$ for all $t$ . The approximation should improve as $M \to ∞$ . Replacing the integral in the true filtering density (eq:filtering) by an average over the particles gives the empirical filtering density (eq:empirical_filtering) $\hat{f} (α_{t + 1} | Y_{t + 1}) \propto f (y_{t + 1} | α_{t + 1}) \sum_{j = 1}^{M} f (α_{t + 1} | α_{t}^{j}) .$ Then, a new sample of particles $α_{t + 1}^{1}, \dots, α_{t + 1}^{M}$ can be generated from this empirical density and the procedure can continue recursively. A particle filter is said to be fully adapted if it generates independent and identically distributed samples from (eq:empirical_filtering). It is useful to think of (eq:empirical_filtering) as a posterior density which is the product of a prior, $\sum_{j = 1}^{M} f (α_{t + 1} | α_{t}^{j})$ , and a likelihood, $f (y_{t + 1} | α_{t + 1})$ .

Sampling from the prior mixture is straightforward: obtain a draw $α_{t}^{j}$ with probability $1 / M$ and then draw from $f (α_{t + 1} | α_{t}^{j})$ . Assuming that we can also evaluate $f (y_{t + 1} | α_{t + 1})$ up to a constant of proportionality, these prior draws can be converted into a sample from (eq:empirical_filtering). The authors describe three of the possible methods for doing this. The most commonly used is the sampling/importance resampling (SIR) method of Rubin (1987). The first particle filter, independently proposed by several authors, was based on SIR. In particular, Gordon, Salmond, and Smith (1993) suggested it for non-Gaussian, nonlinear state space models and Kitagawa (1996) for time series models. The other two methods, acceptance sampling and MCMC methods, are discussed in the article but not in these notes.

Sampling/importance resampling (SIR)

Given a set of draws $α_{t}^{1}, \dots, α_{t}^{M}$ , the SIR method first takes $R$ draws $α_{t + 1}^{1}, \dots, α_{t + 1}^{R}$ from the prior mixture, each obtained by choosing a particle $α_{t}^{j}$ with probability $1 / M$ and drawing from $f (α_{t + 1} | α_{t}^{j})$ . It then assigns a weight $π_{t + 1}^{r}$ to each draw, where $π_{t + 1}^{r} = \frac{w_{r}}{\sum_{i = 1}^{R} w_{i}}$ and $w_{r} = f (y_{t + 1} | α_{t + 1}^{r})$ . This weighted sample converges to a nonrandom sample from the empirical filtering distribution as $R \to ∞$ . To generate a random sample of size $M$ , a resampling step is introduced where the draws $α_{t + 1}^{1}, \dots, α_{t + 1}^{R}$ are resampled with weights $π_{t + 1}^{1}, \dots, π_{t + 1}^{R}$ to produce a uniformly weighted sample.
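The SIR step described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the model (an AR(1) state observed with Gaussian noise), its parameter values, and the names `sir_step`, `sample_transition`, and `likelihood` are all hypothetical.

```python
import numpy as np

def sir_step(particles, y_next, sample_transition, likelihood, R, rng):
    """One SIR update: propose from the prior mixture, weight, resample."""
    M = particles.shape[0]
    # Prior mixture: choose a parent particle with probability 1/M, then propagate it.
    parents = rng.integers(0, M, size=R)
    proposals = sample_transition(particles[parents], rng)
    # Weight each proposal by the measurement density (up to a constant).
    w = likelihood(y_next, proposals)
    pi = w / w.sum()
    # Resample M draws to obtain a uniformly weighted sample.
    idx = rng.choice(R, size=M, replace=True, p=pi)
    return proposals[idx]

# Hypothetical model (an assumption for illustration only):
#   alpha_{t+1} = phi * alpha_t + sigma_eta * eta,   y_t = alpha_t + sigma_eps * eps
phi, sigma_eta, sigma_eps = 0.9, 0.5, 1.0

def sample_transition(alpha, rng):
    return phi * alpha + sigma_eta * rng.standard_normal(alpha.shape)

def likelihood(y, alpha):
    # Proportional to the N(alpha, sigma_eps^2) density evaluated at y.
    return np.exp(-0.5 * ((y - alpha) / sigma_eps) ** 2)

rng = np.random.default_rng(0)
particles = rng.standard_normal(500)   # stand-in for draws from f(alpha_t | Y_t)
new_particles = sir_step(particles, 1.2, sample_transition, likelihood, R=2000, rng=rng)
```

Note that the proposals are generated before `y_next` is used, which is exactly the lack of adaption discussed next.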

Adaption

The SIR particle filter above produces proposal draws of $α_{t + 1}$ without taking into account the new information, namely the value of $y_{t + 1}$ . A particle filter is said to be adapted if it makes proposal draws taking this new information into account. An adapted version of the algorithm proceeds as follows:

Draw $α_{t + 1}^{r} \sim g (α_{t + 1} | y_{t + 1})$ for $r = 1, \dots, R$ .
Evaluate the weights $w_{t + 1}^{r} = \frac{f (y_{t + 1} | α_{t + 1}^{r}) \sum_{j = 1}^{M} f (α_{t + 1}^{r} | α_{t}^{j})}{g (α_{t + 1}^{r} | y_{t + 1})} .$
Resample with weights proportional to $w_{t + 1}^{r}$ to obtain a sample of size $M$ .

This algorithm allows for proposals to come from a general density $g (α_{t + 1} | y_{t + 1})$ which depends on $y_{t + 1}$ , as opposed to the standard SIR particle filter where the proposal density does not depend on $y_{t + 1}$ . To understand how the importance weights above were derived, consider an importance sampler for $f (α_{t + 1} | Y_{t + 1})$ with the importance sampling density $g (α_{t + 1} | y_{t + 1})$ . We would first take draws from $g (α_{t + 1} | y_{t + 1})$ and then weight them by $f (α_{t + 1} | Y_{t + 1}) / g (α_{t + 1} | y_{t + 1})$ . But from Bayes’ Theorem, $f (α_{t + 1} | Y_{t + 1}) \propto f (y_{t + 1} | α_{t + 1}) \int f (α_{t + 1} | α_{t}) dF (α_{t} | Y_{t}) \approx f (y_{t + 1} | α_{t + 1}) \sum_{j = 1}^{M} f (α_{t + 1} | α_{t}^{j}) ,$ where the constant factor $1 / M$ is absorbed into the proportionality. Hence, after dividing by $g (α_{t + 1} | y_{t + 1})$ , we have the importance weights shown above.

This illustrates the difficulty of adapting the standard particle filter. To obtain a single new particle we must evaluate $M + 1$ densities: $f (y_{t + 1} | α_{t + 1})$ as well as $f (α_{t + 1} | α_{t}^{j})$ for each $j = 1, \dots, M$ .
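The cost is easy to see in code: the weights of the adapted algorithm require an $R \times M$ matrix of transition densities. The sketch below assumes a hypothetical AR(1) state with Gaussian measurement noise and a proposal $g$ that is normal and centered at $y_{t + 1}$ ; the names `normal_pdf` and `adapted_weights` and all parameter values are illustrative assumptions.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def adapted_weights(proposals, g_density, y_next, particles, phi, sigma_eta, sigma_eps):
    """Importance weights f(y | a) * sum_j f(a | a_t^j) / g(a | y) for each proposal a."""
    lik = normal_pdf(y_next, proposals, sigma_eps)                 # shape (R,)
    # (R, M) matrix: transition density of every proposal given every particle.
    trans = normal_pdf(proposals[:, None], phi * particles[None, :], sigma_eta)
    prior = trans.sum(axis=1)                                      # M evaluations per proposal
    return lik * prior / g_density

# Hypothetical model and proposal (assumptions for illustration only).
phi, sigma_eta, sigma_eps = 0.9, 0.5, 1.0
rng = np.random.default_rng(1)
particles = rng.standard_normal(200)     # stand-in for draws from f(alpha_t | Y_t)
y_next, R = 1.2, 400

# Proposal g(alpha_{t+1} | y_{t+1}): normal centered at the new observation.
proposals = y_next + sigma_eps * rng.standard_normal(R)
g_density = normal_pdf(proposals, y_next, sigma_eps)

w = adapted_weights(proposals, g_density, y_next, particles, phi, sigma_eta, sigma_eps)
idx = rng.choice(R, size=particles.shape[0], p=w / w.sum())
new_particles = proposals[idx]
```

With $R$ proposals the work is of order $R M$ density evaluations per time step, compared with order $R$ for the unadapted SIR filter.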

Auxiliary Particle Filters

The authors extend standard particle filtering methods by including an auxiliary variable which allows the particle filter to be adapted in a more efficient way. They introduce a variable, $k$ , which is an index to the mixture (eq:empirical_filtering) and filter in a higher dimension. This auxiliary variable is introduced only to aid in simulation. With this additional variable, the filtering density we wish to approximate becomes (eq:filter_density_aux) $f (α_{t + 1}, k | Y_{t + 1}) \propto f (y_{t + 1} | α_{t + 1}) f (α_{t + 1} | α_{t}^{k})$ for $k = 1, \dots, M$ . Now, if we can sample from $f (α_{t + 1}, k | Y_{t + 1})$ , then we can discard the sampled values of $k$ and be left with a sample from the original filtering density (eq:empirical_filtering).

To sample from (eq:filter_density_aux) using SIR, we make $R$ proposal draws $(α_{t + 1}^{j}, k^{j})$ from some proposal density $g (α_{t + 1}, k | Y_{t + 1})$ and calculate the weights $w_{j} = \frac{f (y_{t + 1} | α_{t + 1}^{j}) f (α_{t + 1}^{j} | α_{t}^{k^{j}})}{g (α_{t + 1}^{j}, k^{j} | Y_{t + 1})}$ for $j = 1, \dots, R$ .

The choice of $g$ is left completely to the researcher. The authors propose a generic choice of $g$ which can be applied in many situations and go on to provide more examples in specific models where the structure of the model informs the choice of $g$ . Here, I present only the generic $g$ in terms of the SIR algorithm. The density (eq:filter_density_aux) can be approximated by $g (α_{t + 1}, k | Y_{t + 1}) \propto f (y_{t + 1} | μ_{t + 1}^{k}) f (α_{t + 1} | α_{t}^{k})$ where $μ_{t + 1}^{k}$ is some value with a high probability of occurrence, for example, the mean or mode of the distribution of $α_{t + 1} | α_{t}^{k}$ . This choice is made for convenience since $g (k | Y_{t + 1}) \propto \int f (y_{t + 1} | μ_{t + 1}^{k}) dF (α_{t + 1} | α_{t}^{k}) = f (y_{t + 1} | μ_{t + 1}^{k}) .$ Hence, we can draw from $g (α_{t + 1}, k | Y_{t + 1})$ by first drawing values of $k$ with probabilities $λ_{k} \propto g (k | Y_{t + 1})$ and then drawing from the transition density $f (α_{t + 1} | α_{t}^{k})$ . The weights $λ_{k}$ are called first stage weights. Then, after sampling $R$ times from $g (α_{t + 1}, k | Y_{t + 1})$ , we form the second stage weights (eq:aux_weights) $w_{r} = \frac{f (y_{t + 1} | α_{t + 1}^{r})}{f (y_{t + 1} | μ_{t + 1}^{k^{r}})}$ for $r = 1, \dots, R$ . These follow from the general SIR weights for (eq:filter_density_aux) by substituting the generic $g$ : the transition density $f (α_{t + 1}^{r} | α_{t}^{k^{r}})$ appears in both the numerator and the denominator and cancels. If a uniformly weighted sample is needed, we can resample $M$ times from the $R$ draws with probabilities proportional to $w_{r}$ .

Auxiliary Particle Filter Algorithm

The following algorithm is based on the generic choice of $g$ from the discussion above. Other choices are possible, and may be more efficient for some model specifications.

Initialize the algorithm with a uniformly weighted sample $α_{0}^{1}, \dots, α_{0}^{M}$ from the distribution $f (α_{0})$ .
Given draws $α_{t}^{1}, \dots, α_{t}^{M}$ from $f (α_{t} | Y_{t})$ , determine $μ_{t + 1}^{k}$ and the first stage weights $λ_{k} \propto f (y_{t + 1} | μ_{t + 1}^{k})$ for each $k = 1, \dots, M$ .
For $r = 1, \dots, R$ , draw $k^{r}$ from the indices $k = 1, \dots, M$ with weights $λ_{k}$ and then draw $α_{t + 1}^{r}$ from the transition density $f (α_{t + 1} | α_{t}^{k^{r}}) .$
Form the weights $w_{r}$ according to (eq:aux_weights).
Resample $M$ times from these $R$ draws with weights $w_{r}$ if desired.
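The steps above can be sketched as a single update function. This is a minimal illustration of the generic choice of $g$ , not a reference implementation: the AR(1) model, its parameter values, and the names `apf_step`, `transition_mean`, `sample_transition`, and `likelihood` are hypothetical, and $μ_{t + 1}^{k}$ is taken to be the conditional mean of the transition.

```python
import numpy as np

def apf_step(particles, y_next, transition_mean, sample_transition, likelihood,
             R, rng, resample=True):
    """One auxiliary particle filter update with the generic proposal g."""
    M = particles.shape[0]
    # First stage: mu_{t+1}^k and weights lambda_k proportional to f(y_{t+1} | mu_{t+1}^k).
    mu = transition_mean(particles)
    lam = likelihood(y_next, mu)
    lam = lam / lam.sum()
    # Draw the auxiliary indices k^r, then propagate the selected particles.
    k = rng.choice(M, size=R, p=lam)
    proposals = sample_transition(particles[k], rng)
    # Second stage weights: f(y | alpha^r) / f(y | mu^{k^r}), then normalize.
    w = likelihood(y_next, proposals) / likelihood(y_next, mu[k])
    pi = w / w.sum()
    if not resample:
        return proposals, pi
    # Optional resampling step to return a uniformly weighted sample of size M.
    idx = rng.choice(R, size=M, p=pi)
    return proposals[idx], np.full(M, 1.0 / M)

# Hypothetical model (an assumption for illustration only):
#   alpha_{t+1} = phi * alpha_t + sigma_eta * eta,   y_t = alpha_t + sigma_eps * eps
phi, sigma_eta, sigma_eps = 0.9, 0.5, 1.0

def transition_mean(alpha):
    return phi * alpha

def sample_transition(alpha, rng):
    return phi * alpha + sigma_eta * rng.standard_normal(alpha.shape)

def likelihood(y, alpha):
    # Proportional to the N(alpha, sigma_eps^2) density evaluated at y.
    return np.exp(-0.5 * ((y - alpha) / sigma_eps) ** 2)

rng = np.random.default_rng(2)
particles = rng.standard_normal(500)   # stand-in for draws from f(alpha_t | Y_t)
new_particles, weights = apf_step(particles, 1.2, transition_mean, sample_transition,
                                  likelihood, R=2000, rng=rng)
```

Because the measurement density only enters through ratios and normalized weights, it needs to be known only up to a constant of proportionality. Unlike the adapted SIR weights, each proposal here costs a fixed number of density evaluations regardless of $M$ .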