# Crisan (2001)

These notes are based on the article:

Crisan, Dan (2001): “Particle Filters–A Theoretical Perspective,”
in *Sequential Monte Carlo Methods in Practice*,
ed. by A. Doucet, N. de Freitas, and N. Gordon. New York:
Springer-Verlag.

## Introduction

This article provides a formal mathematical treatment of the convergence of particle filters. It presents several theorems which provide necessary and sufficient conditions for each of two types of convergence of the particle filter to the posterior distribution of the signal.

## Notation

This chapter essentially deals with random measures on ${\mathbb{R}}^{d}$. Let $\mathcal{B}({\mathbb{R}}^{d})$ denote the $\sigma $-algebra of Borel subsets of ${\mathbb{R}}^{d}$, let $B({\mathbb{R}}^{d})$ denote the set of bounded measurable functions on ${\mathbb{R}}^{d}$, and let ${\mathcal{C}}_{b}({\mathbb{R}}^{d})$ denote the set of bounded continuous functions on ${\mathbb{R}}^{d}$. We employ the sup norm on ${\mathcal{C}}_{b}({\mathbb{R}}^{d})$,

$$\|f\| = \sup_{x \in {\mathbb{R}}^{d}} |f(x)|,$$

for any $f\in {\mathcal{C}}_{b}({\mathbb{R}}^{d})$.

Let $\mathcal{P}({\mathbb{R}}^{d})$ denote the set of probability measures over $\mathcal{B}({\mathbb{R}}^{d})$. For any such measure $\mu \in \mathcal{P}({\mathbb{R}}^{d})$ and any measurable function $f$, we denote the integral of $f$ with respect to $\mu $ as

$$\mu f = \int_{{\mathbb{R}}^{d}} f(x)\,\mu(dx).$$
We endow $\mathcal{P}({\mathbb{R}}^{d})$ with the weak topology. Then for any sequence $\{{\mu}_{n}\}$ in $\mathcal{P}({\mathbb{R}}^{d})$, we say that ${\mu}_{n}$ converges weakly to $\mu \in \mathcal{P}({\mathbb{R}}^{d})$ if

$$\lim_{n\to \infty} {\mu}_{n} f = \mu f \quad \text{for every } f \in {\mathcal{C}}_{b}({\mathbb{R}}^{d}).$$
In this case we write ${\mathrm{lim}}_{n\to \mathrm{\infty}}{\mu}_{n}=\mu $ or simply ${\mu}_{n}\to \mu $.
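As a toy illustration (my example, not the chapter's), take ${\mu}_{n} = {\delta}_{1/n}$ and $\mu = {\delta}_{0}$: then ${\mu}_{n} f = f(1/n) \to f(0) = \mu f$ for every bounded continuous $f$, so ${\mu}_{n} \to \mu$ weakly even though ${\mu}_{n}(\{0\}) = 0$ for all $n$:

```python
import math

# Weak convergence of mu_n = delta_{1/n} to mu = delta_0: for a bounded
# continuous test function f, mu_n f = f(1/n) -> f(0) = mu f.
f = lambda x: math.cos(x)              # a bounded continuous test function
mu_n_f = [f(1.0 / n) for n in (1, 10, 100, 1000)]
mu_f = f(0.0)                          # the limit value mu f = 1.0
```

Here `mu_n_f` approaches `mu_f` as $n$ grows, which is exactly the definition above applied to one test function.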

## Markov chains

Let $(\Omega ,\mathcal{F},P)$ be a probability space and let
$X=\{{X}_{t},t\in \mathbb{N}\}$ be a
stochastic process on this space. Let ${\mathcal{F}}_{t}^{X} = \sigma ({X}_{0},\dots ,{X}_{t})$ denote
the $\sigma $-algebra generated by the process up to time $t$. We say that $X$
is a **Markov chain** if for every $t$ and $A\in \mathcal{B}({\mathbb{R}}^{{n}_{x}})$
we have

$$P({X}_{t+1} \in A \mid {\mathcal{F}}_{t}^{X}) = P({X}_{t+1} \in A \mid {X}_{t}).$$
Let ${K}_{t}:{\mathbb{R}}^{{n}_{x}}\times \mathcal{B}({\mathbb{R}}^{{n}_{x}})\to [0,1]$ denote the transition kernel of $X$, where

$${K}_{t}(x, A) = P({X}_{t+1} \in A \mid {X}_{t} = x)$$

gives the probability of ${X}_{t+1}$ arriving in $A$ given that ${X}_{t}=x$.
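In practice, all the particle filter below will require of the signal is the ability to draw samples from ${K}_{t}(x,\cdot )$. A minimal Python sketch, assuming a hypothetical Gaussian AR(1) kernel (my example, not from the chapter):

```python
import random

def kernel_sample(t, x):
    # Hypothetical transition kernel K_t(x, dy): a Gaussian AR(1) step,
    # X_{t+1} = 0.9 * X_t + noise. Only the ability to *sample* from
    # K_t(x, .) is needed, not the kernel in closed form.
    return 0.9 * x + random.gauss(0.0, 1.0)

def simulate_chain(T, x0=0.0, seed=0):
    # Simulate a realization X_0, ..., X_T by repeatedly drawing
    # from the transition kernel.
    random.seed(seed)
    path = [x0]
    for t in range(T):
        path.append(kernel_sample(t, path[-1]))
    return path

path = simulate_chain(10)  # one realization of X_0, ..., X_10
```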

## The Filtering Problem

Let $X=\{{X}_{t},t\in \mathbb{N}\}$ denote the *signal*, a Markov process
on ${\mathbb{R}}^{{n}_{x}}$ with kernel
${K}_{t}(x,\mathrm{dy})$, and let $Y=\{{Y}_{t},t\in \mathbb{N}\}$ denote
the *observation* process, a stochastic process on ${\mathbb{R}}^{{n}_{y}}$ which
evolves according to

$${Y}_{t} = h(t, {X}_{t}) + {W}_{t}, \qquad t \geq 1,$$

with ${Y}_{0}=0$. Here, $h:\mathbb{N}\times {\mathbb{R}}^{{n}_{x}}\to {\mathbb{R}}^{{n}_{y}}$ is a measurable function that is continuous on ${\mathbb{R}}^{{n}_{x}}$ for all $t\in \mathbb{N}$, and the ${W}_{t}:\Omega \to {\mathbb{R}}^{{n}_{y}}$ are independent random vectors. Let $g(t,\cdot )$ denote the density of ${W}_{t}$ with respect to Lebesgue measure and suppose that $g$ is bounded and continuous.
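The observation model can be sketched in the same style. The choices of $h$ and $g$ below are illustrative assumptions (direct observation with standard normal noise), not the chapter's:

```python
import math
import random

def h(t, x):
    # Hypothetical observation function: observe the signal directly.
    return x

def g(t, w):
    # Density of W_t with respect to Lebesgue measure; here W_t ~ N(0, 1),
    # which is bounded and continuous as the setup requires.
    return math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi)

def simulate_observations(xs, seed=1):
    # Given a signal path x_0, ..., x_T, set Y_0 = 0 and
    # Y_t = h(t, X_t) + W_t for t >= 1.
    random.seed(seed)
    ys = [0.0]
    for t in range(1, len(xs)):
        ys.append(h(t, xs[t]) + random.gauss(0.0, 1.0))
    return ys

ys = simulate_observations([0.0, 0.5, 1.0, 0.7])
```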

The filtering problem is to compute the conditional distribution of the signal ${X}_{t}$ given a sequence of observations ${y}_{0:t}=\{{y}_{0},\dots ,{y}_{t}\}$ on the observation process $Y$ up to time $t$. We denote this distribution of interest as

$${\pi}_{t}^{{y}_{0:t}}(A) = P({X}_{t} \in A \mid {Y}_{0} = {y}_{0}, \dots, {Y}_{t} = {y}_{t})$$

for all $A\in \mathcal{B}({\mathbb{R}}^{{n}_{x}})$. We will also use the predicted conditional distribution given ${y}_{0:t-1}$,

$${p}_{t}^{{y}_{0:t-1}}(A) = P({X}_{t} \in A \mid {Y}_{0} = {y}_{0}, \dots, {Y}_{t-1} = {y}_{t-1}).$$
## Convergence of random measures

The particle filter essentially constructs a sequence of random measures, or measure-valued random variables, which approximate the true conditional distribution ${\pi}_{t}^{{y}_{0:t}}$. This sequence is designed to converge to the true distribution, but we must be more specific about what we mean by convergence here. Crisan outlines two modes of convergence in the chapter, but I will only cover the second mode here: almost sure convergence in the weak topology.

**Definition.** Let $\{{\mu}^{n}\}$ be a sequence of random probability measures,
${\mu}^{n}:\Omega \to \mathcal{P}({\mathbb{R}}^{d})$, and let $\mu \in \mathcal{P}({\mathbb{R}}^{d})$
be a deterministic probability measure. We say that the sequence $\{{\mu}^{n}\}$
converges weakly almost surely to $\mu $ if there exists a set $E$ with $P(E) = 0$
such that for each $\omega \in \Omega \setminus E$, we have

$$\lim_{n\to \infty} {\mu}^{n}(\omega) f = \mu f \quad \text{for every } f \in {\mathcal{C}}_{b}({\mathbb{R}}^{d}).$$

We write this as ${\mu}^{n}\to \mu $ $P$-a.s.

## Particle Filters

The class of particle filters described below involves a collection of $n$ particles which evolve according to a specified Markov transition kernel each period and are resampled with replacement according to their importance weights. At each point in time $t$, the resulting empirical measure ${\pi}_{t}^{n}$ is shown to converge to the true conditional distribution ${\pi}_{t}^{{y}_{0:t}}$ as the number of particles approaches infinity.

Crisan discusses a much more general class of algorithms than the one I describe below, but generally speaking most particle filters use an algorithm similar to the following.

**Initialization.** The particle filter is initialized with an empirical measure
${\pi}_{0}^{n}$ which is built from a random sample of size $n$ from ${\pi}_{0}$:

$${\pi}_{0}^{n} = \frac{1}{n}\sum_{i=1}^{n} {\delta}_{{x}_{0}^{(i)}},$$

where ${x}_{0}^{(i)}\sim {\pi}_{0}$ and ${\delta}_{x}$ denotes the Dirac delta mass located at $x$.

Note that ${\pi}_{0}^{n}$ is a random measure because the particles were drawn randomly from ${\pi}_{0}$. The resulting measure is conditional upon the particular sequence of random deviates used to construct these draws. It is useful to think of $\omega $ in this case as a vector of Uniform(0,1) deviates that could be used to generate draws from ${\pi}_{0}$. In subsequent steps, it will also be necessary to generate random draws to transition the particles.
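The initialization step can be sketched as follows; the standard normal prior is an assumption for illustration. Integrating a function against the empirical measure reduces to a sample average:

```python
import random

def initialize(n, sample_pi0):
    # Draw n i.i.d. particles from pi_0; the empirical measure pi_0^n
    # places mass 1/n on each of them.
    return [sample_pi0() for _ in range(n)]

def empirical_integral(particles, f):
    # (pi^n) f = (1/n) * sum_i f(x^(i)): integration against the
    # empirical measure is just a sample average.
    return sum(f(x) for x in particles) / len(particles)

random.seed(0)
# Assume, for illustration, a standard normal prior pi_0 = N(0, 1).
particles = initialize(10_000, lambda: random.gauss(0.0, 1.0))
mean_est = empirical_integral(particles, lambda x: x)  # close to 0 for large n
```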

**Iteration.** This step constructs ${\pi}_{t}^{n}$ given ${\pi}_{t-1}^{n}$. We are given

$${\pi}_{t-1}^{n} = \frac{1}{n}\sum_{i=1}^{n} {\delta}_{{x}_{t-1}^{(i)}}.$$
We pass each ${x}_{t-1}^{(i)}$ through the transition kernel to generate particles ${\overline{x}}_{t}^{(i)}$ such that

$${\overline{x}}_{t}^{(i)} \sim {K}_{t-1}({x}_{t-1}^{(i)}, \cdot).$$
These particles move independently of each other. We can now build an approximation to the predicted conditional distribution ${p}_{t}$:

$${p}_{t}^{n} = \frac{1}{n}\sum_{i=1}^{n} {\delta}_{{\overline{x}}_{t}^{(i)}}.$$
Next, for each particle ${\overline{x}}_{t}^{(i)}$ we compute the importance weight

$${w}_{t}^{(i)} = \frac{g(t, {y}_{t} - h(t, {\overline{x}}_{t}^{(i)}))}{\sum_{j=1}^{n} g(t, {y}_{t} - h(t, {\overline{x}}_{t}^{(j)}))}.$$
Now, we resample $n$ particles with replacement in proportion to the importance weights to obtain a sample $\{{x}_{t}^{(1)},\dots ,{x}_{t}^{(n)}\}$, from which we construct the desired approximation

$${\pi}_{t}^{n} = \frac{1}{n}\sum_{i=1}^{n} {\delta}_{{x}_{t}^{(i)}}.$$
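The full iteration (propagate, weight, resample) can be sketched in a few lines of Python. The model ingredients below (AR(1) kernel, identity observation map, Gaussian noise density, made-up observations) are illustrative assumptions, not the chapter's example; note the weights need only be known up to proportionality, since resampling normalizes them:

```python
import math
import random

def pf_step(t, particles, y_t, kernel_sample, h, g):
    # One particle filter iteration: propagate, weight, resample.
    n = len(particles)
    # Prediction: draw xbar_t^(i) from K_{t-1}(x_{t-1}^(i), .).
    predicted = [kernel_sample(t - 1, x) for x in particles]
    # Importance weights proportional to g(t, y_t - h(t, xbar));
    # random.choices normalizes them, matching w_t^(i).
    weights = [g(t, y_t - h(t, xb)) for xb in predicted]
    if sum(weights) == 0.0:
        weights = [1.0] * n  # guard against numerically all-zero weights
    # Resample n particles with replacement to form pi_t^n.
    return random.choices(predicted, weights=weights, k=n)

# Illustrative model (assumed, not from the chapter):
random.seed(0)
kernel = lambda t, x: 0.9 * x + random.gauss(0.0, 0.5)  # signal dynamics
obs = lambda t, x: x                                    # observation map h
dens = lambda t, w: math.exp(-0.5 * w * w)              # N(0,1) density up to a constant
particles = [random.gauss(0.0, 1.0) for _ in range(1000)]
for t, y in enumerate([0.8, 1.1, 0.9], start=1):        # made-up observations
    particles = pf_step(t, particles, y, kernel, obs, dens)
```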
## Convergence Theorems

Crisan shows that if the random measures ${\pi}_{t}^{n}$ are constructed according to the algorithm described above, then for all $t$, we have ${\pi}_{t}^{n}\to {\pi}_{t}^{{y}_{0:t}}$ $P$-a.s. Again, Crisan’s results are much more general, but this simple example illustrates the general idea of the results.

### Notes

The results presented in this chapter concern only convergence. Rates of convergence are not discussed, but are discussed in Crisan, Del Moral, and Lyons (1999).