# Auxiliary Particle Filter

These notes are based on the following articles:

- Pitt, Michael K., and Neil Shephard (1999): “Filtering via Simulation: Auxiliary Particle Filters,” *Journal of the American Statistical Association*, 94, 590–599.
- Pitt, Michael K., and Neil Shephard (2001): “Auxiliary Variable Based Particle Filters,” in *Sequential Monte Carlo Methods in Practice*, ed. by A. Doucet, N. de Freitas, and N. Gordon. New York: Springer-Verlag.

## Introduction

Consider a time series ${y}_{t}$ for $t=1,\dots ,n$ whose observations are independent
conditional on an unobserved state ${\alpha}_{t}$, which is assumed to be a Markov process. We
wish to perform *on-line* filtering to learn about the unobserved state given
the currently available information by estimating the density
$f({\alpha}_{t}|{y}_{1},\dots ,{y}_{t})=f({\alpha}_{t}|{Y}_{t})$
for $t=1,\dots ,n.$ The *measurement density* $f({y}_{t}|{\alpha}_{t})$
and *transition density* $f({\alpha}_{t+1}|{\alpha}_{t})$ implicitly depend on
a finite vector of parameters. The initial distribution of the state is
$f({\alpha}_{0})$.

Suppose we know the filtering distribution $f({\alpha}_{t}|{Y}_{t})$ at time $t$ and we receive a new observation for period $t+1$. We can obtain the updated filtering density in two steps. First, we use the transition density to obtain $f({\alpha}_{t+1}|{Y}_{t})$ from $f({\alpha}_{t}|{Y}_{t})$ as $$f({\alpha}_{t+1}|{Y}_{t})=\int f({\alpha}_{t+1}|{\alpha}_{t})\,dF({\alpha}_{t}|{Y}_{t}).$$ Then, we obtain the new filtering density $f({\alpha}_{t+1}|{Y}_{t+1})$ by using Bayes’ Theorem: $$f({\alpha}_{t+1}|{Y}_{t+1})=\frac{f({y}_{t+1}|{\alpha}_{t+1})f({\alpha}_{t+1}|{Y}_{t})}{\int f({y}_{t+1}|{\alpha}_{t+1})\,dF({\alpha}_{t+1}|{Y}_{t})}.$$

Hence, filtering essentially involves applying the recursive relationship
$$f({\alpha}_{t+1}|{Y}_{t+1})\propto f({y}_{t+1}|{\alpha}_{t+1})\int f({\alpha}_{t+1}|{\alpha}_{t})\,dF({\alpha}_{t}|{Y}_{t}). \tag{1}$$

If the support of ${\alpha}_{t+1}|{\alpha}_{t}$ is known and finite, then the above integral is simply a weighted sum over the points in the support. In other cases, numerical methods might need to be used.
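For the finite-support case, the recursion reduces to a few array operations. Below is a minimal sketch; the two-state chain, its transition matrix, and the Gaussian measurement density are illustrative assumptions, not specifics from the notes:

```python
import numpy as np

# Hypothetical two-state example: alpha_t in {0, 1} with transition
# matrix P[i, j] = f(alpha_{t+1} = j | alpha_t = i) and measurement
# density y_t | alpha_t ~ N(mu[alpha_t], 1).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
mu = np.array([-1.0, 1.0])

def gaussian_density(y, mean, sd=1.0):
    return np.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

def filter_step(pi_t, y_next):
    """One step of the exact recursion on a finite support.

    pi_t[i] = f(alpha_t = i | Y_t).  The predictive density is a
    weighted sum over the support; Bayes' Theorem then reweights it
    by the measurement density of the new observation.
    """
    predictive = pi_t @ P                        # f(alpha_{t+1} | Y_t)
    likelihood = gaussian_density(y_next, mu)    # f(y_{t+1} | alpha_{t+1})
    unnormalized = likelihood * predictive
    return unnormalized / unnormalized.sum()     # f(alpha_{t+1} | Y_{t+1})

pi = np.array([0.5, 0.5])        # initial distribution f(alpha_0)
for y in [0.8, 1.1, -0.9]:       # a short, made-up observation record
    pi = filter_step(pi, y)
```

The same two lines (predict, then reweight and normalize) implement both displayed equations above; only the integral becomes a matrix product.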

## Particle Filters

Particle filters are a class of simulation-based filters that recursively approximate the distribution of ${\alpha}_{t}|{Y}_{t}$ using a collection of particles ${\alpha}_{t}^{1},\dots ,{\alpha}_{t}^{M}$ with probability masses ${\pi}_{t}^{1},\dots ,{\pi}_{t}^{M}$. The particles are thought of as a sample from $f({\alpha}_{t}|{Y}_{t})$. In this article, the weights are taken to be equal: ${\pi}_{t}^{1}=\cdots ={\pi}_{t}^{M}=1/M$ for all $t$. As $M\to \mathrm{\infty}$, the approximation should improve. Thus, we can approximate the true filtering density (1) by an empirical one:
$$\widehat{f}({\alpha}_{t+1}|{Y}_{t+1})\propto f({y}_{t+1}|{\alpha}_{t+1})\sum_{j=1}^{M}f({\alpha}_{t+1}|{\alpha}_{t}^{j}). \tag{2}$$

Then, a new sample of particles ${\alpha}_{t+1}^{1},\dots ,{\alpha}_{t+1}^{M}$ can be
generated from this empirical density and the procedure can continue
recursively. A particle filter is said to be *fully adapted* if it generates
independent and identically distributed samples from (2).
It is useful to think of (2) as a posterior density which
is the product of a prior, ${\sum}_{j=1}^{M}f({\alpha}_{t+1}|{\alpha}_{t}^{j})$, and
a likelihood $f({y}_{t+1}|{\alpha}_{t+1})$.

Assuming that we can evaluate $f({y}_{t+1}|{\alpha}_{t+1})$ up to a constant of proportionality, we can sample from (2) by first obtaining a draw ${\alpha}_{t}^{j}$ with probability $1/M$ and then drawing from $f({\alpha}_{t+1}|{\alpha}_{t}^{j})$. The authors describe three of the possible methods for doing this. The most commonly used is the sampling/importance resampling (SIR) method of Rubin (1987). The first particle filter, independently proposed by several authors, was based on SIR. In particular, Gordon, Salmond, and Smith (1993) suggested it for non-Gaussian, nonlinear state space models and Kitagawa (1996) for time series models. The other two methods, acceptance sampling and MCMC methods, are discussed in the article but not in these notes.

### Sampling/importance resampling (SIR)

Given a set of draws ${\alpha}_{t}^{1},\dots ,{\alpha}_{t}^{M}$, the SIR method first takes draws ${\alpha}_{t+1}^{1},\dots ,{\alpha}_{t+1}^{R}$ from the transition densities $f({\alpha}_{t+1}|{\alpha}_{t}^{j})$ and assigns a weight ${\pi}_{t+1}^{j}$ to each draw, where
$$\pi_{t+1}^{j}=\frac{w_{j}}{\sum_{i=1}^{R}w_{i}}, \qquad {w}_{j}=f({y}_{t+1}|{\alpha}_{t+1}^{j}).$$
This weighted sample converges to a *nonrandom* sample from
the empirical filtering distribution as $R\to \mathrm{\infty}$. To generate a *random*
sample of size $M$, a resampling step is introduced where the draws
${\alpha}_{t+1}^{1},\dots ,{\alpha}_{t+1}^{R}$ are resampled with weights
${\pi}_{t+1}^{1},\dots ,{\pi}_{t+1}^{R}$ to produce a uniformly weighted sample.
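One SIR update can be sketched in a few lines. The scalar AR(1) state with Gaussian transition and Gaussian measurement error below is an illustrative assumption, not a model from the notes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model: alpha_{t+1} = phi * alpha_t + sigma * eta with
# eta ~ N(0, 1), and measurement y_t | alpha_t ~ N(alpha_t, 1).
phi, sigma = 0.9, 0.5

def measurement_density(y, alpha):
    return np.exp(-0.5 * (y - alpha) ** 2) / np.sqrt(2 * np.pi)

def sir_step(particles, y_next, R=None):
    """One SIR update from a sample of f(alpha_t|Y_t) to f(alpha_{t+1}|Y_{t+1})."""
    M = len(particles)
    R = R or M
    # Propose: pick a parent particle uniformly, then draw from the
    # transition density (the proposal ignores y_next entirely).
    parents = rng.integers(0, M, size=R)
    proposals = phi * particles[parents] + sigma * rng.standard_normal(R)
    # Weight each proposal by the measurement density of y_next.
    w = measurement_density(y_next, proposals)
    w /= w.sum()
    # Resample to return a uniformly weighted sample of size M.
    return rng.choice(proposals, size=M, p=w)

particles = rng.standard_normal(1000)   # crude stand-in for draws from f(alpha_0)
particles = sir_step(particles, y_next=0.7)
```

Note that the proposal stage is "blind" to the new observation, which is exactly the limitation the adaptation discussion below addresses.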

### Adaptation

Basically, the SIR particle filter above produces proposal draws of
${\alpha}_{t+1}$ without taking into account the new information, the value of
${y}_{t+1}$. A particle filter is said to be *adapted* if it makes proposal
draws taking into account this new information. An adapted version of the
algorithm would look something like

- Draw ${\alpha}_{t+1}^{r}\sim g({\alpha}_{t+1}|{y}_{t+1})$ for $r=1,\dots ,R$.

- Evaluate the weights
  $$w_{t+1}^{r}\propto \frac{f({y}_{t+1}|{\alpha}_{t+1}^{r})\sum_{j=1}^{M}f({\alpha}_{t+1}^{r}|{\alpha}_{t}^{j})}{g({\alpha}_{t+1}^{r}|{y}_{t+1})}.$$

- Resample with weights proportional to ${w}_{t+1}^{r}$ to obtain a sample of size $M$.

This algorithm allows for proposals to come from a general density $g({\alpha}_{t+1}|{y}_{t+1})$ which depends on ${y}_{t+1}$, as opposed to the standard SIR particle filter, where the proposal density does not depend on ${y}_{t+1}$. To understand how the importance weights above were derived, consider an importance sampler for the empirical filtering density (2) with importance density $g({\alpha}_{t+1}|{y}_{t+1})$. We would first take draws from $g({\alpha}_{t+1}|{y}_{t+1})$ and then weight each draw by the ratio of (2) to $g({\alpha}_{t+1}|{y}_{t+1})$. But from Bayes’ Theorem,
$$\widehat{f}({\alpha}_{t+1}|{Y}_{t+1})\propto f({y}_{t+1}|{\alpha}_{t+1})\sum_{j=1}^{M}f({\alpha}_{t+1}|{\alpha}_{t}^{j}).$$

Hence, after dividing by $g({\alpha}_{t+1}|{y}_{t+1})$, we have the importance weights shown above.

This illustrates the difficulty of adapting the standard particle filter. To obtain a single new particle we must evaluate $M+1$ densities: $f({y}_{t+1}|{\alpha}_{t+1})$ as well as $f({\alpha}_{t+1}|{\alpha}_{t}^{j})$ for each $j=1,\dots ,M$.
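The $M+1$ evaluations per particle can be made concrete. The sketch below computes the weight for a single adapted proposal; the AR(1)/Gaussian model and the proposal $g(\alpha_{t+1}|y_{t+1}) = N(y_{t+1}, 1)$ are assumed for illustration only:

```python
import numpy as np

rng = np.random.default_rng(2)
phi, sigma = 0.9, 0.5   # hypothetical AR(1)/Gaussian model parameters

def normal_density(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

def adapted_weight(alpha_prop, y_next, particles):
    """Importance weight for one adapted proposal draw.

    Computing it requires the measurement density once plus the
    transition density at every old particle: M + 1 evaluations.
    """
    likelihood = normal_density(y_next, alpha_prop, 1.0)              # f(y|alpha)
    prior_mixture = normal_density(alpha_prop, phi * particles, sigma).sum()
    proposal = normal_density(alpha_prop, y_next, 1.0)  # assumed g(alpha|y)
    return likelihood * prior_mixture / proposal

particles = rng.standard_normal(1000)      # M = 1000 old particles
alpha_prop = rng.normal(0.7, 1.0)          # one draw from the assumed g, y = 0.7
w = adapted_weight(alpha_prop, 0.7, particles)
```

The `prior_mixture` sum is the expensive term: its cost grows linearly in $M$, so producing $R$ proposals costs $O(RM)$ density evaluations.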

## Auxiliary Particle Filters

The authors extend standard particle filtering methods by including an auxiliary variable which allows the particle filter to be adapted in a more efficient way. They introduce a variable, $k$, which is an index into the mixture (2), and filter in this higher dimension. The auxiliary variable is introduced only to aid in simulation. With this additional variable, the filtering density we wish to approximate becomes
$$\widehat{f}({\alpha}_{t+1},k|{Y}_{t+1})\propto f({y}_{t+1}|{\alpha}_{t+1})f({\alpha}_{t+1}|{\alpha}_{t}^{k}) \tag{6}$$
for $k=1,\dots ,M$. Now, if we can sample from $\widehat{f}({\alpha}_{t+1},k|{Y}_{t+1})$, then we can discard the sampled values of $k$ and be left with a sample from the original filtering density (2).

To sample from (6) using SIR, we make $R$ proposal draws $({\alpha}_{t+1}^{j},{k}^{j})$ from some proposal density $g({\alpha}_{t+1},k|{Y}_{t+1})$ and calculate the weights
$$w_{j}\propto \frac{f({y}_{t+1}|{\alpha}_{t+1}^{j})f({\alpha}_{t+1}^{j}|{\alpha}_{t}^{{k}^{j}})}{g({\alpha}_{t+1}^{j},{k}^{j}|{Y}_{t+1})}$$
for $j=1,\dots ,R$.

The choice of $g$ is left completely to the researcher. The authors propose a generic choice of $g$ which can be applied in many situations and go on to provide more examples in specific models where the structure of the model informs the choice of $g$. Here, I present only the generic $g$ in terms of the SIR algorithm. The density (6) can be approximated by
$$g({\alpha}_{t+1},k|{Y}_{t+1})\propto f({y}_{t+1}|{\mu}_{t+1}^{k})f({\alpha}_{t+1}|{\alpha}_{t}^{k}),$$
where ${\mu}_{t+1}^{k}$ is some value with a high probability of occurrence, for example, the mean or mode of the distribution of ${\alpha}_{t+1}|{\alpha}_{t}^{k}$. This choice is made for convenience since
$$g(k|{Y}_{t+1})\propto \int f({y}_{t+1}|{\mu}_{t+1}^{k})\,dF({\alpha}_{t+1}|{\alpha}_{t}^{k})=f({y}_{t+1}|{\mu}_{t+1}^{k}).$$
Hence, we can draw from $g({\alpha}_{t+1},k|{Y}_{t+1})$ by first drawing
values of $k$ with probabilities ${\lambda}_{k}\propto g(k|{Y}_{t+1})$ and
then drawing from the transition probabilities $f({\alpha}_{t+1}|{\alpha}_{t}^{k})$. The weights ${\lambda}_{k}$ are called *first stage weights*. Then,
after sampling $R$ times from $g({\alpha}_{t+1},k|{Y}_{t+1})$ we form the
weights
$$w_{r}\propto \frac{f({y}_{t+1}|{\alpha}_{t+1}^{r})}{f({y}_{t+1}|{\mu}_{t+1}^{{k}^{r}})} \tag{9}$$
for $r=1,\dots ,R$. We could also resample $M$ times from this distribution to recover a uniformly weighted sample.

## Auxiliary Particle Filter Algorithm

The following algorithm is based on the generic choice of $g$ from the discussion above. Other choices are possible, and may be more efficient for some model specifications.

1. Initialize the algorithm with a uniformly weighted sample ${\alpha}_{0}^{1},\dots ,{\alpha}_{0}^{M}$ from the distribution $f({\alpha}_{0})$.

2. Given draws ${\alpha}_{t}^{1},\dots ,{\alpha}_{t}^{M}$ from $f({\alpha}_{t}|{Y}_{t})$, determine ${\mu}_{t+1}^{k}$ and the *first stage weights* ${\lambda}_{k}\propto f({y}_{t+1}|{\mu}_{t+1}^{k})$ for each $k=1,\dots ,M$.

3. For $r=1,\dots ,R$, draw ${k}^{r}$ from the indices $k=1,\dots ,M$ with weights ${\lambda}_{k}$ and then draw ${\alpha}_{t+1}^{r}$ from the transition density $f({\alpha}_{t+1}|{\alpha}_{t}^{{k}^{r}})$.

4. Form the weights ${w}_{r}$ according to (9).

5. Resample $M$ times from these $R$ draws with weights ${w}_{r}$ if desired.
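The steps above can be sketched as one update function. The scalar AR(1) state with Gaussian measurement error, the parameter values, and the choice of ${\mu}_{t+1}^{k}$ as the conditional mean are all illustrative assumptions, not specifics from the articles:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical model: alpha_{t+1} = phi * alpha_t + sigma * eta with
# eta ~ N(0, 1), and measurement y_t | alpha_t ~ N(alpha_t, 1).
phi, sigma = 0.9, 0.5

def measurement_density(y, alpha):
    return np.exp(-0.5 * (y - alpha) ** 2) / np.sqrt(2 * np.pi)

def apf_step(particles, y_next, R=None):
    """One auxiliary-particle-filter update using the generic proposal."""
    M = len(particles)
    R = R or M
    # Step 2: take mu_{t+1}^k as the mean of alpha_{t+1} | alpha_t^k and
    # form first-stage weights lambda_k proportional to f(y_{t+1} | mu_{t+1}^k).
    mu = phi * particles
    lam = measurement_density(y_next, mu)
    lam /= lam.sum()
    # Step 3: draw indices k^r with weights lambda_k, then draw from
    # the transition density f(alpha_{t+1} | alpha_t^{k^r}).
    k = rng.choice(M, size=R, p=lam)
    proposals = phi * particles[k] + sigma * rng.standard_normal(R)
    # Step 4: second-stage weights w_r prop. to f(y|alpha^r) / f(y|mu^{k^r}).
    w = measurement_density(y_next, proposals) / measurement_density(y_next, mu[k])
    w /= w.sum()
    # Step 5: resample M times to return a uniformly weighted sample.
    return rng.choice(proposals, size=M, p=w)

particles = rng.standard_normal(1000)   # stand-in for an initial sample from f(alpha_0)
particles = apf_step(particles, y_next=0.7)
```

Compared with the SIR step, the only extra work is the first-stage reweighting by $f(y_{t+1}|\mu_{t+1}^{k})$, which steers simulation effort toward parent particles likely to produce children compatible with the new observation.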