Crisan (2001)
These notes are based on the article:
Crisan, Dan (2001): “Particle Filters–A Theoretical Perspective,” in Sequential Monte Carlo Methods in Practice, ed. by A. Doucet, N. de Freitas, and N. Gordon. New York: Springer-Verlag.
Introduction
This article provides a formal mathematical treatment of the convergence of particle filters. It presents several theorems giving necessary and sufficient conditions for each of two types of convergence of the particle filter to the posterior distribution of the signal.
Notation
This chapter essentially deals with random measures on $\mathbb{R}^d$. Let $\mathcal{B}(\mathbb{R}^d)$ denote the $\sigma$-algebra of Borel subsets of $\mathbb{R}^d$, let $B(\mathbb{R}^d)$ denote the set of bounded measurable functions on $\mathbb{R}^d$, and let $C_b(\mathbb{R}^d)$ denote the set of bounded continuous functions on $\mathbb{R}^d$. We employ the sup norm on $B(\mathbb{R}^d)$ such that $\|f\| = \sup_{x \in \mathbb{R}^d} |f(x)|$ for any $f \in B(\mathbb{R}^d)$.
Let $\mathcal{P}(\mathbb{R}^d)$ denote the set of probability measures over $(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$. For any such measure $\mu$ and any measurable function $f$, we denote the integral of $f$ with respect to $\mu$ as
$$\mu f \equiv \int_{\mathbb{R}^d} f(x) \, \mu(dx).$$
We endow $\mathcal{P}(\mathbb{R}^d)$ with the weak topology. Then for any sequence $(\mu_n)_{n \geq 1}$ in $\mathcal{P}(\mathbb{R}^d)$, we say that $\mu_n$ converges weakly to $\mu \in \mathcal{P}(\mathbb{R}^d)$ if
$$\lim_{n \to \infty} \mu_n f = \mu f \quad \text{for all } f \in C_b(\mathbb{R}^d).$$
In this case we write $\lim_{n \to \infty} \mu_n = \mu$ or simply $\mu_n \Rightarrow \mu$.
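To see the $\mu_n f$ notation in action, here is a small Python check of my own (not from the chapter): the empirical measure of $n$ i.i.d. $N(0,1)$ draws converges weakly to $N(0,1)$, and for the bounded continuous test function $f = \cos$ the limit is $\mu f = e^{-1/2}$.

```python
import numpy as np

rng = np.random.default_rng(0)

def mu_f(samples, f):
    """Integral of f against the empirical measure (1/n) * sum_i delta_{x_i}."""
    return float(np.mean(f(samples)))

f = np.cos              # a bounded continuous test function
exact = np.exp(-0.5)    # mu f = E[cos(Z)] when mu = N(0, 1)

for n in [100, 10_000, 1_000_000]:
    approx = mu_f(rng.standard_normal(n), f)
    print(f"n = {n:>9d}   mu_n f = {approx:.4f}   error = {abs(approx - exact):.4f}")
```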
Markov chains
Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $X = \{X_t,\ t \in \mathbb{N}\}$ be a stochastic process on this space taking values in $\mathbb{R}^d$. Let $\mathcal{F}_t^X = \sigma(X_0, \dots, X_t)$ denote the $\sigma$-algebra generated by the process up to time $t$. We say that $X$ is a Markov chain if for every $t \in \mathbb{N}$ and $A \in \mathcal{B}(\mathbb{R}^d)$ we have
$$P\left(X_{t+1} \in A \mid \mathcal{F}_t^X\right) = P\left(X_{t+1} \in A \mid X_t\right).$$
Let $K_t$ denote the transition kernel of $X$, where $K_t(x, A) = P(X_t \in A \mid X_{t-1} = x)$ gives the probability of arriving in $A$ at time $t$ given that $X_{t-1} = x$.
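For concreteness, consider a simple example that is not from the chapter: a Gaussian random walk on $\mathbb{R}$, $X_t = X_{t-1} + V_t$ with $V_t \sim N(0, 1)$ independent across $t$, has transition kernel
$$K_t(x, A) = \int_A \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{(y - x)^2}{2}\right) dy, \qquad A \in \mathcal{B}(\mathbb{R}).$$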
The Filtering Problem
Let $X = \{X_t,\ t \in \mathbb{N}\}$ denote the signal, a Markov process on $\mathbb{R}^d$ with kernel $K_t$, and let $Y = \{Y_t,\ t \in \mathbb{N}\}$ denote the observation process, a stochastic process on $\mathbb{R}^m$ which evolves according to
$$Y_t = h(t, X_t) + W_t$$
with $Y_0 = 0$. Here, $h : \mathbb{N} \times \mathbb{R}^d \to \mathbb{R}^m$ is a measurable function that is continuous in $x$ on $\mathbb{R}^d$ for all $t$, and the $W_t$ are independent random vectors. Let $g$ denote the density of $W_t$ with respect to Lebesgue measure and suppose that $g$ is bounded and continuous.
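To fix ideas, here is a minimal Python sketch of such a state-space model. The specific choices below (a scalar AR(1) signal, $h(t, x) = x$, and Gaussian noise) are my own illustrative assumptions, not part of Crisan's setup.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative one-dimensional model (assumed, not from the chapter):
#   signal:      X_t = 0.9 X_{t-1} + V_t,   V_t ~ N(0, 0.5^2)   (this defines K_t)
#   observation: Y_t = h(t, X_t) + W_t,     h(t, x) = x,  W_t ~ N(0, 1)
A, SIG_V, SIG_W = 0.9, 0.5, 1.0

def sample_kernel(x_prev):
    """Draw from K_t(x_{t-1}, .): one step of the signal given its previous value."""
    return A * x_prev + SIG_V * rng.standard_normal(np.shape(x_prev))

def h(t, x):
    """Observation function; measurable and continuous in x."""
    return x

def g(w):
    """Bounded continuous density of W_t with respect to Lebesgue measure (N(0, SIG_W^2))."""
    return np.exp(-0.5 * (w / SIG_W) ** 2) / (SIG_W * np.sqrt(2.0 * np.pi))

def simulate(T, x0=0.0):
    """Generate a signal path X_0, ..., X_T and observations Y_0 = 0, Y_1, ..., Y_T."""
    xs, ys = [x0], [0.0]
    for t in range(1, T + 1):
        xs.append(sample_kernel(xs[-1]))
        ys.append(h(t, xs[-1]) + SIG_W * rng.standard_normal())
    return np.array(xs), np.array(ys)
```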
The filtering problem is to compute the conditional distribution of the signal given the observations up to time $t$. We denote this distribution of interest as
$$\pi_t(A) = P\left(X_t \in A \mid Y_1, \dots, Y_t\right)$$
for all $A \in \mathcal{B}(\mathbb{R}^d)$. We will also use the predicted conditional distribution given $Y_1, \dots, Y_{t-1}$,
$$p_t(A) = P\left(X_t \in A \mid Y_1, \dots, Y_{t-1}\right).$$
Convergence of random measures
The particle filter essentially constructs a sequence of random measures, or measure-valued random variables, which approximate the true conditional distribution $\pi_t$. This sequence is designed to converge to the true distribution, but we must be more specific about what we mean by convergence here. Crisan outlines two modes of convergence in the chapter, but I will only cover the second mode here: almost sure convergence in the weak topology.
Definition. Let $(\mu^n)_{n \geq 1}$ be a sequence of random probability measures, and let $\mu$ be a deterministic probability measure. We say that the sequence converges weakly almost surely to $\mu$ if there exists some set $N \in \mathcal{F}$ with measure zero such that for each $\omega \notin N$, we have
$$\lim_{n \to \infty} \mu^n(\omega) f = \mu f \quad \text{for all } f \in C_b(\mathbb{R}^d).$$
We write this as $\lim_{n \to \infty} \mu^n = \mu$ $P$-a.s.
Particle Filters
The class of particle filters described below involves a collection of $n$ particles which evolve according to a specified Markov transition kernel each period and are resampled with replacement according to their importance weights. At each point in time $t$, the resulting empirical measure $\pi_t^n$ is shown to converge to the true conditional distribution $\pi_t$ as the number of particles $n$ approaches infinity.
Crisan discusses a much more general class of algorithms than the one I describe below, but, generally speaking, most particle filters use an algorithm similar to the following.
Initialization. The particle filter is initialized with an empirical measure $\pi_0^n$ which is built from a random sample of size $n$ from $\pi_0$:
$$\pi_0^n = \frac{1}{n} \sum_{i=1}^n \delta_{x_0^{(i)}},$$
where $x_0^{(i)} \sim \pi_0$ for $i = 1, \dots, n$ and $\delta_x$ denotes the Dirac delta mass located at $x$.
Note that $\pi_0^n$ is a random measure because the particles $x_0^{(i)}$ were drawn randomly from $\pi_0$. The resulting measure is conditional upon the particular sequence of random deviates used to construct these draws. It is useful to think of $\omega$ in this case as a vector of Uniform(0,1) deviates that could be used to generate draws from the distribution of $X_0$. In subsequent steps, it will also be necessary to generate random draws to transition the particles.
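As a sketch of this initialization step, the snippet below fixes a vector of Uniform(0,1) deviates (playing the role of $\omega$) and transforms them into draws from $\pi_0$ by the inverse CDF. The choice $\pi_0 = N(0,1)$ and the use of scipy's `norm.ppf` are assumptions made only for illustration.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n = 1_000

# "omega": a fixed vector of Uniform(0,1) deviates that drives the initial sample.
omega = rng.uniform(size=n)

# Transform the uniforms into draws x_0^(i) from pi_0 via the inverse CDF.
# pi_0 = N(0, 1) is an assumption made only for this sketch.
particles = norm.ppf(omega)

# pi_0^n is the empirical measure (1/n) * sum_i delta_{x_0^(i)}; integrating a
# test function f against it is just the average of f over the particles.
print(np.mean(particles ** 2))   # pi_0^n f for f(x) = x^2, roughly 1
```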
Iteration. This step constructs $\pi_t^n$ given $\pi_{t-1}^n$. We are given
$$\pi_{t-1}^n = \frac{1}{n} \sum_{i=1}^n \delta_{x_{t-1}^{(i)}}.$$
We pass each $x_{t-1}^{(i)}$ through the transition kernel $K_t$ to generate particles $\bar{x}_t^{(i)}$ such that
$$\bar{x}_t^{(i)} \sim K_t\left(x_{t-1}^{(i)}, \cdot\right).$$
These particles move independently of each other. We can now build an approximation to the predicted conditional distribution $p_t$:
$$p_t^n = \frac{1}{n} \sum_{i=1}^n \delta_{\bar{x}_t^{(i)}}.$$
Next, for each particle $\bar{x}_t^{(i)}$ we compute the importance weight
$$w_t^{(i)} = \frac{g\left(Y_t - h(t, \bar{x}_t^{(i)})\right)}{\sum_{j=1}^n g\left(Y_t - h(t, \bar{x}_t^{(j)})\right)}.$$
Now, we resample $n$ particles with replacement in proportion to the importance weights to obtain a sample $x_t^{(1)}, \dots, x_t^{(n)}$ in order to construct the desired approximation
$$\pi_t^n = \frac{1}{n} \sum_{i=1}^n \delta_{x_t^{(i)}}.$$
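Putting the iteration together, here is a sketch of one step (kernel transition, importance weighting, multinomial resampling) under the same illustrative Gaussian model assumed earlier; none of the numerical choices come from the chapter.

```python
import numpy as np

rng = np.random.default_rng(3)

# Same illustrative model as before (an assumption, not Crisan's setup):
#   X_t = 0.9 X_{t-1} + V_t with V_t ~ N(0, 0.5^2),  Y_t = X_t + W_t with W_t ~ N(0, 1),
#   so h(t, x) = x and g is the N(0, 1) density.
A, SIG_V, SIG_W = 0.9, 0.5, 1.0

def g(w):
    """Bounded continuous density of the observation noise W_t."""
    return np.exp(-0.5 * (w / SIG_W) ** 2) / (SIG_W * np.sqrt(2.0 * np.pi))

def particle_filter_step(particles, y_t):
    """One iteration of the filter: kernel transition, importance weights, resampling."""
    n = particles.size
    # 1. Propagate each particle through K_t; the resulting empirical measure
    #    approximates the predicted conditional distribution p_t.
    predicted = A * particles + SIG_V * rng.standard_normal(n)
    # 2. Importance weights proportional to g(Y_t - h(t, xbar_t^(i))); here h(t, x) = x.
    weights = g(y_t - predicted)
    weights /= weights.sum()
    # 3. Resample n particles with replacement in proportion to the weights;
    #    the equally weighted resampled particles represent pi_t^n.
    idx = rng.choice(n, size=n, replace=True, p=weights)
    return predicted[idx]

# One step starting from an assumed particle approximation of pi_{t-1}
# (n draws from N(0, 1)) and an assumed observation y_t = 1.2.
particles = rng.standard_normal(10_000)
particles = particle_filter_step(particles, y_t=1.2)
print(particles.mean())   # pi_t^n f for f(x) = x
```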
Convergence Theorems
Crisan shows that if the random measures $\pi_t^n$ are constructed according to the algorithm described above, then for all $t \in \mathbb{N}$, we have $\lim_{n \to \infty} \pi_t^n = \pi_t$ $P$-a.s. Again, Crisan's results are much more general, but this simple example illustrates the general idea of the results.
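The theorem concerns an $n \to \infty$ limit, but a quick numerical check can illustrate it. The sketch below is my own: it assumes a linear Gaussian model so that $\pi_t$ is available exactly from a Kalman filter, and it compares the particle filter's posterior-mean estimate with the exact posterior mean as $n$ grows. The posterior mean corresponds to the unbounded test function $f(x) = x$, so this is an informal illustration rather than an instance of the theorem.

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumed linear Gaussian model (chosen so that pi_t is known exactly):
#   X_t = 0.9 X_{t-1} + V_t,  V_t ~ N(0, 0.5^2),  X_0 ~ N(0, 1)
#   Y_t = X_t + W_t,          W_t ~ N(0, 1)
A, Q, R = 0.9, 0.5 ** 2, 1.0 ** 2
T = 20

# Simulate one path of the signal and the corresponding observations.
x = rng.standard_normal()
ys = []
for _ in range(T):
    x = A * x + np.sqrt(Q) * rng.standard_normal()
    ys.append(x + np.sqrt(R) * rng.standard_normal())

def kalman_mean(ys):
    """Exact posterior mean E[X_T | Y_1, ..., Y_T] from the Kalman filter."""
    m, P = 0.0, 1.0
    for y in ys:
        m, P = A * m, A * A * P + Q              # predict
        k = P / (P + R)                          # Kalman gain
        m, P = m + k * (y - m), (1.0 - k) * P    # update
    return m

def particle_mean(ys, n):
    """Particle filter estimate of the same posterior mean using n particles."""
    particles = rng.standard_normal(n)           # draws from pi_0 = N(0, 1)
    for y in ys:
        particles = A * particles + np.sqrt(Q) * rng.standard_normal(n)   # kernel K_t
        w = np.exp(-0.5 * (y - particles) ** 2 / R)                       # g(Y_t - h(t, x))
        w /= w.sum()
        particles = particles[rng.choice(n, size=n, replace=True, p=w)]   # resample
    return particles.mean()

exact = kalman_mean(ys)
for n in [100, 1_000, 10_000, 100_000]:
    print(f"n = {n:>7d}   |pi_T^n f - pi_T f| = {abs(particle_mean(ys, n) - exact):.4f}")
```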
Notes
The results presented in this chapter concern only convergence; rates of convergence are not discussed here, but are treated in Crisan, Del Moral, and Lyons (1999).