Crisan (2001)

Particle Filters—A Theoretical Perspective.

These notes are based on the article:

Crisan, Dan (2001): “Particle Filters–A Theoretical Perspective,” in Sequential Monte Carlo Methods in Practice, ed. by A. Doucet, N. de Freitas, and N. Gordon. New York: Springer-Verlag.

Introduction

This article provides a formal mathematical treatment of the convergence of particle filters. It presents several theorems which provide necessary and sufficient conditions for each of two types of convergence of the particle filter to the posterior distribution of the signal.

Notation

This chapter essentially deals with random measures on $\mathbb{R}^d$. Let $\mathcal{B}(\mathbb{R}^d)$ denote the σ-algebra of Borel subsets of $\mathbb{R}^d$, let $B(\mathbb{R}^d)$ denote the set of bounded measurable functions on $\mathbb{R}^d$, and let $\mathcal{C}_b(\mathbb{R}^d)$ denote the set of bounded continuous functions on $\mathbb{R}^d$. We employ the sup norm on $\mathcal{C}_b(\mathbb{R}^d)$, defined by $\|f\| \equiv \sup_{x \in \mathbb{R}^d} |f(x)|$ for any $f \in \mathcal{C}_b(\mathbb{R}^d)$.

Let $\mathcal{P}(\mathbb{R}^d)$ denote the set of probability measures over $\mathcal{B}(\mathbb{R}^d)$. For any such measure $\mu \in \mathcal{P}(\mathbb{R}^d)$ and any measurable function $f$, we denote the integral of $f$ with respect to $\mu$ as $\mu f \equiv \int f \, d\mu$.

We endow $\mathcal{P}(\mathbb{R}^d)$ with the weak topology. Then for any sequence $\{\mu_n\}$ in $\mathcal{P}(\mathbb{R}^d)$, we say that $\mu_n$ converges weakly to $\mu \in \mathcal{P}(\mathbb{R}^d)$ if
$$\lim_{n \to \infty} \mu_n f = \mu f \quad \text{for all } f \in \mathcal{C}_b(\mathbb{R}^d).$$
In this case we write $\lim_{n \to \infty} \mu_n = \mu$ or simply $\mu_n \to \mu$.

Markov chains

Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $X = \{X_t, t \in \mathbb{N}\}$ be a stochastic process on this space. Let $\mathcal{F}_t^X \equiv \sigma(X_0, \ldots, X_t)$ denote the σ-algebra generated by the process up to time $t$. We say that $X$ is a Markov chain if for every $t \in \mathbb{N}$ and $A \in \mathcal{B}(\mathbb{R}^{n_x})$ we have $P(X_{t+1} \in A \mid \mathcal{F}_t^X) = P(X_{t+1} \in A \mid X_t)$. Let $K_t : \mathbb{R}^{n_x} \times \mathcal{B}(\mathbb{R}^{n_x}) \to [0,1]$ denote the transition kernel of $X$, where $K_t(x, A) = P(X_{t+1} \in A \mid X_t = x)$ gives the probability of $X_{t+1}$ arriving in $A$ given that $X_t = x$.
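As a concrete illustration (mine, not from the chapter), here is a minimal Python sketch of sampling from a transition kernel. The scalar kernel $K(x, \cdot) = N(ax, q^2)$ and the parameter values are illustrative assumptions.

```python
import numpy as np

# Hypothetical time-homogeneous kernel on R: K(x, .) = N(a * x, q^2).
# Sampling X_{t+1} given X_t = x amounts to drawing from this distribution.
a, q = 0.9, 0.5                      # assumed parameters, for illustration

def sample_kernel(x, rng):
    """Draw one sample from K(x, .)."""
    return a * x + q * rng.normal()

rng = np.random.default_rng(0)
x, path = 0.0, [0.0]
for _ in range(10):                  # simulate a short Markov chain path
    x = sample_kernel(x, rng)
    path.append(x)
```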

The Filtering Problem

Let $X = \{X_t, t \in \mathbb{N}\}$ denote the signal, a Markov process on $\mathbb{R}^{n_x}$ with kernel $K_t(x, dy)$, and let $Y = \{Y_t, t \in \mathbb{N}\}$ denote the observation process, a stochastic process on $\mathbb{R}^{n_y}$ which evolves according to
$$Y_t = h(t, X_t) + W_t, \quad t > 0,$$
with $Y_0 = 0$. Here, $h : \mathbb{N} \times \mathbb{R}^{n_x} \to \mathbb{R}^{n_y}$ is a measurable function that is continuous on $\mathbb{R}^{n_x}$ for all $t$, and the $W_t : \Omega \to \mathbb{R}^{n_y}$ are independent random vectors. Let $g(t, \cdot)$ denote the density of $W_t$ with respect to Lebesgue measure and suppose that $g$ is bounded and continuous.
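To make the setup concrete, the following sketch simulates one path of the signal and observation processes, assuming the scalar random-walk kernel from the previous snippet, $h(t, x) = x$, and Gaussian noise $W_t \sim N(0, r^2)$, so that $g(t, \cdot)$ is the $N(0, r^2)$ density; all parameter values are illustrative assumptions.

```python
import numpy as np

# Simulate the state-space model under assumed choices: the signal follows
# X_t = a * X_{t-1} + V_t with V_t ~ N(0, q^2), and Y_t = X_t + W_t with
# W_t ~ N(0, r^2), so g(t, .) is the (bounded, continuous) N(0, r^2) density.
a, q, r, T = 0.9, 0.5, 1.0, 50       # illustrative parameters
rng = np.random.default_rng(0)

x = np.empty(T + 1)
y = np.empty(T + 1)
x[0] = rng.normal()                  # X_0 ~ pi_0, taken here to be N(0, 1)
y[0] = 0.0                           # Y_0 = 0 by convention
for t in range(1, T + 1):
    x[t] = a * x[t - 1] + q * rng.normal()   # X_t ~ K(x_{t-1}, .)
    y[t] = x[t] + r * rng.normal()           # Y_t = h(t, X_t) + W_t
```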

The filtering problem is to compute the conditional distribution of the signal $X_t$ given a sequence of observations $y_{0:t} = \{y_0, \ldots, y_t\}$ on the observation process $Y$ up to time $t$. We denote this distribution of interest as $\pi_t^{y_{0:t}}(A) \equiv P(X_t \in A \mid Y_{0:t} = y_{0:t})$ for all $A \in \mathcal{B}(\mathbb{R}^{n_x})$. We will also use the predicted conditional distribution given $y_{0:t-1}$, $p_t^{y_{0:t-1}}(A) \equiv P(X_t \in A \mid Y_{0:t-1} = y_{0:t-1})$.

Convergence of random measures

The particle filter essentially constructs a sequence of random measures, or measure-valued random variables, which approximate the true conditional distribution $\pi_t^{y_{0:t}}$. This sequence is designed to converge to the true distribution, but we must be more specific about what we mean by convergence here. Crisan outlines two modes of convergence in the chapter, but I will only cover the second mode here: almost sure convergence in the weak topology.

Definition. Let $\{\mu_n\}$ be a sequence of random probability measures, $\mu_n : \Omega \to \mathcal{P}(\mathbb{R}^d)$, and let $\mu \in \mathcal{P}(\mathbb{R}^d)$ be a deterministic probability measure. We say that the sequence $\{\mu_n\}$ converges weakly almost surely to $\mu$ if there exists a set $E \in \mathcal{F}$ with $P(E) = 0$ such that for each $\omega \notin E$ we have
$$\lim_{n \to \infty} \mu_n(\omega) f = \mu f \quad \text{for all } f \in \mathcal{C}_b(\mathbb{R}^d).$$
We write this as $\mu_n \to \mu$ $P$-a.s.
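A quick numerical illustration of this mode of convergence (my own, not from the chapter): the empirical measures of growing i.i.d. samples from $N(0,1)$ converge weakly almost surely to $N(0,1)$ by the strong law of large numbers, so $\mu_n f$ should approach $\mu f$ for any bounded continuous $f$.

```python
import numpy as np

# For f(x) = cos(x) and mu = N(0, 1), mu f = exp(-1/2) exactly (the real
# part of the characteristic function at 1). The empirical averages mu_n f
# converge to this value almost surely as n grows.
rng = np.random.default_rng(1)
exact = np.exp(-0.5)
for n in (10**2, 10**4, 10**6):
    mu_n_f = np.cos(rng.normal(size=n)).mean()
    print(f"n = {n:>8d}   mu_n f = {mu_n_f:.5f}   error = {abs(mu_n_f - exact):.5f}")
```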

Particle Filters

The class of particle filters described below involves a collection of $n$ particles which evolve according to a specified Markov transition kernel each period and are resampled with replacement according to their importance weights. At each point in time $t$, the resulting empirical measure $\pi_t^n$ is shown to converge to the true conditional distribution $\pi_t^{y_{0:t}}$ as the number of particles approaches infinity.

Crisan discusses a much more general class of algorithms than the one I describe below, but generally speaking most particle filters use an algorithm similar to the following.

Initialization. The particle filter is initialized with an empirical measure $\pi_0^n$ which is built from a random sample of size $n$ from $\pi_0$:
$$\pi_0^n = \frac{1}{n} \sum_{i=1}^n \delta_{x_0^{(i)}},$$
where $x_0^{(i)} \sim \pi_0$ and $\delta_x$ denotes the Dirac delta mass located at $x$.
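In code, initialization is just i.i.d. sampling. The sketch below assumes $\pi_0 = N(0, 1)$ for illustration and shows how integrating $f$ against $\pi_0^n$ reduces to a sample average.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000

# Draw x_0^(i) ~ pi_0 (assumed here to be N(0, 1)); pi_0^n places
# mass 1/n on each particle.
particles = rng.normal(size=n)

# Integrating f against pi_0^n is an average over the particles:
# pi_0^n f = (1/n) * sum_i f(x_0^(i)).
def empirical(f, particles):
    return f(particles).mean()

print(empirical(np.cos, particles))  # approximates pi_0 f = exp(-1/2)
```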

Note that $\pi_0^n$ is a random measure because the particles were drawn randomly from $\pi_0$. The resulting measure is conditional on the particular sequence of random deviates used to construct these draws. It is useful to think of $\omega$ in this case as a vector of Uniform(0,1) deviates that could be used to generate the draws from $\pi_0$. In subsequent steps, it will also be necessary to generate random draws to transition the particles.

Iteration. This step constructs $\pi_t^n$ given $\pi_{t-1}^n$. We are given
$$\pi_{t-1}^n = \frac{1}{n} \sum_{i=1}^n \delta_{x_{t-1}^{(i)}}.$$
We pass each $x_{t-1}^{(i)}$ through the transition kernel to generate particles $\bar{x}_t^{(i)}$ such that $\bar{x}_t^{(i)} \sim K_{t-1}(x_{t-1}^{(i)}, \cdot)$. These particles move independently of each other. We can now build an approximation to the predicted conditional distribution $p_t$:
$$p_t^n = \frac{1}{n} \sum_{i=1}^n \delta_{\bar{x}_t^{(i)}}.$$
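Continuing the illustrative Gaussian example, the prediction step simply moves each particle independently through the assumed kernel; a sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
a, q, n = 0.9, 0.5, 1000             # same assumed kernel as above
particles = rng.normal(size=n)       # stand-in for the x_{t-1}^(i)

# Each particle is propagated independently through K(x, .) = N(a*x, q^2);
# the empirical measure of the resulting particles approximates p_t.
predicted = a * particles + q * rng.normal(size=n)   # the x-bar_t^(i)
```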

Next, for each particle $\bar{x}_t^{(i)}$ we compute the importance weight
$$w_t^{(i)} = \frac{g(t, y_t - h(t, \bar{x}_t^{(i)}))}{\sum_{j=1}^n g(t, y_t - h(t, \bar{x}_t^{(j)}))},$$
the normalized likelihood of the observation $y_t$ at the particle's location. Now, we resample $n$ particles with replacement in proportion to the importance weights to obtain a sample $\{x_t^{(1)}, \ldots, x_t^{(n)}\}$ in order to construct the desired approximation
$$\pi_t^n = \frac{1}{n} \sum_{i=1}^n \delta_{x_t^{(i)}}.$$
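Under the same assumed model ($h(t, x) = x$, $W_t \sim N(0, r^2)$), the weighting and multinomial resampling steps look like this; the normalizing constant of the Gaussian density cancels in the weights, so it can be dropped.

```python
import numpy as np

rng = np.random.default_rng(4)
n, r = 1000, 1.0
y_t = 0.3                            # current observation (illustrative)
predicted = rng.normal(size=n)       # stand-in for the x-bar_t^(i)

# Unnormalized weights proportional to g(t, y_t - h(t, x-bar)); the
# 1/sqrt(2*pi*r^2) factor cancels after normalization.
raw = np.exp(-0.5 * ((y_t - predicted) / r) ** 2)
w = raw / raw.sum()

# Multinomial resampling: draw n indices with replacement, each index i
# chosen with probability w_t^(i).
idx = rng.choice(n, size=n, replace=True, p=w)
particles = predicted[idx]           # the x_t^(i) underlying pi_t^n
```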

Convergence Theorems

Crisan shows that if the random measures $\pi_t^n$ are constructed according to the algorithm described above, then for all $t$, we have $\pi_t^n \to \pi_t^{y_{0:t}}$ $P$-a.s. Again, Crisan’s results are much more general, but this simple example illustrates the general idea of the results.
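To see this at work numerically, the sketch below assembles the steps above into a minimal bootstrap filter for the assumed Gaussian model and runs it with increasing $n$ on one fixed observation record; the filtered mean $\pi_T^n f$ with $f(x) = x$ settles down as $n$ grows, consistent with $\pi_t^n \to \pi_t^{y_{0:t}}$ almost surely. Everything here (the model, the parameters, the function names) is my own illustrative construction, not Crisan's more general algorithm.

```python
import numpy as np

a, q, r, T = 0.9, 0.5, 1.0, 50       # same assumed model as above
rng = np.random.default_rng(0)

# Simulate one fixed observation record y_{0:T}.
x_true = np.empty(T + 1)
y = np.empty(T + 1)
x_true[0], y[0] = rng.normal(), 0.0
for t in range(1, T + 1):
    x_true[t] = a * x_true[t - 1] + q * rng.normal()
    y[t] = x_true[t] + r * rng.normal()

def bootstrap_filter(n, seed):
    """Return pi_T^n(f) for f(x) = x, using n particles."""
    rng = np.random.default_rng(seed)
    parts = rng.normal(size=n)                        # initialization: pi_0 = N(0, 1)
    for t in range(1, T + 1):
        parts = a * parts + q * rng.normal(size=n)    # prediction step
        w = np.exp(-0.5 * ((y[t] - parts) / r) ** 2)  # importance weights
        w /= w.sum()
        parts = parts[rng.choice(n, size=n, p=w)]     # multinomial resampling
    return parts.mean()

for n in (10**2, 10**3, 10**4, 10**5):
    print(f"n = {n:>7d}   filtered mean = {bootstrap_filter(n, seed=1):.4f}")
```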

Notes

The results presented in this chapter concern only convergence itself. Rates of convergence are not discussed, but are treated in Crisan, Del Moral, and Lyons (1999).