The Kalman Filter

These notes are based on

Meinhold R.J., Singpurwalla N.D., (1983): Understanding the Kalman Filter. The American Statistician, 37, 123–127.

Model

The Kalman Filter provides an efficient recursive estimator for the unobserved state of a linear discrete-time dynamical system in the presence of measurement error. Kalman (1960) first introduced the method in the engineering literature, but it can also be understood in the context of Bayesian inference.

Let $y_{t}$ denote a vector of observed variables at time $t$ and let $s_{t}$ denote the unobserved state variables of the system at time $t$. We wish to conduct inference about the state variables given only the observed data $\{y_{t}\}$ and the structure of a linear model consisting of a measurement equation and a transition equation.

The evolution of the observed variable depends on the state variables through a linear measurement equation

(1)

y_{t} = F s_{t} + ε_{t}, ε_{t} \sim N (0, Ω_{ε}) .

The variable $y_{t}$ is observed with measurement error $ε_{t}$, which follows the Normal distribution with mean zero and covariance matrix $Ω_{ε}$. The matrices $F$ and $Ω_{ε}$ are taken as known.

The state vector $s_{t}$ obeys the transition equation

(2)

s_{t} = G s_{t - 1} + η_{t}, η_{t} \sim N (0, Ω_{η})

where $G$ and $Ω_{η}$ are known matrices and $η_{t}$ captures the influence of effects that are outside the model on the state transition process. The noise terms $ε_{t}$ and $η_{t}$ are independent. In general $G$ and $F$ can be time-dependent but for the sake of simplicity the time subscripts are omitted here.
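
To make the model concrete, the following is a minimal sketch that simulates equations (1) and (2) in the scalar special case, where $F$, $G$, $Ω_{ε}$, and $Ω_{η}$ are plain numbers. The function name `simulate`, its arguments, and the parameter values in the usage line are illustrative assumptions, not part of the original notes.

```python
import random


def simulate(F, G, omega_eps, omega_eta, s0, T, seed=0):
    """Simulate T periods of the scalar state-space model (1)-(2).

    omega_eps and omega_eta are variances, so the standard deviations
    passed to random.gauss are their square roots.
    """
    rng = random.Random(seed)
    states, observations = [], []
    s = s0
    for _ in range(T):
        # Transition equation (2): s_t = G s_{t-1} + eta_t
        s = G * s + rng.gauss(0.0, omega_eta ** 0.5)
        # Measurement equation (1): y_t = F s_t + eps_t
        y = F * s + rng.gauss(0.0, omega_eps ** 0.5)
        states.append(s)
        observations.append(y)
    return states, observations


# Assumed parameter values, chosen only for illustration.
states, observations = simulate(F=1.0, G=0.9, omega_eps=0.5,
                                omega_eta=0.1, s0=0.0, T=100)
```

Only `observations` would be available to an analyst; `states` is what the filter tries to recover.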

The Kalman Filter is similar in nature to the standard linear regression model, with the state of the process $s_{t}$ playing the role of the regression coefficients. Unlike regression coefficients, however, the state is not constant over time, which is why the transition equation is needed.

Bayesian Interpretation

Let $Y_{t} = (y_{t}, y_{t - 1}, \dots, y_{1})$ denote the complete history of observed data at time $t$ . Our goal is to obtain the posterior distribution of $s_{t}$ given $Y_{t}$ . We know from Bayes’ Theorem that

(3)

\begin{aligned} \Pr (s_{t} | Y_{t}) & = \frac{\Pr (y_{t} | s_{t}, Y_{t - 1}) \Pr (s_{t} | Y_{t - 1})}{\Pr (y_{t} | Y_{t - 1})} \\ & \propto \Pr (y_{t} | s_{t}, Y_{t - 1}) \Pr (s_{t} | Y_{t - 1}) . \end{aligned}

The left-hand side is the posterior distribution of $s_{t}$ . On the second line, the first term is the likelihood of $s_{t}$ and the second term is the prior distribution of $s_{t}$ . This equation defines a recursive Bayesian updating relationship.

At time $t - 1$, our knowledge of the system is summarized by the posterior distribution

(4)

s_{t - 1} | Y_{t - 1} \sim N ({\hat{s}}_{t - 1}, Σ_{t - 1})

where ${\hat{s}}_{t - 1}$ is our previous estimate about the mean of $s_{t - 1}$ . This process is initialized at time $0$ by specifying ${\hat{s}}_{0}$ and $Σ_{0}$ .

Before observing $y_{t}$, everything we know about $s_{t}$ comes from (2), namely $s_{t} = G s_{t - 1} + η_{t}$. Since $s_{t - 1}$ is itself unobserved, we combine this with (4) to obtain

(5)

s_{t} | Y_{t - 1} \sim N (G {\hat{s}}_{t - 1}, R_{t}),

where

(6)

R_{t} \equiv G Σ_{t - 1} G^{⊤} + Ω_{η} .

This follows directly from the properties of the multivariate Normal distribution.
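
To spell out that step (assuming, as is standard, that $η_{t}$ is independent of $s_{t - 1}$ and $Y_{t - 1}$), take conditional moments of the transition equation (2):

\begin{aligned} E [s_{t} | Y_{t - 1}] & = G E [s_{t - 1} | Y_{t - 1}] + E [η_{t}] = G {\hat{s}}_{t - 1}, \\ \operatorname{Var} [s_{t} | Y_{t - 1}] & = G \operatorname{Var} [s_{t - 1} | Y_{t - 1}] G^{⊤} + \operatorname{Var} [η_{t}] = G Σ_{t - 1} G^{⊤} + Ω_{η} . \end{aligned}

Because $s_{t}$ is a linear function of jointly Normal variables, it is again Normal, which gives (5).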

After observing $y_{t}$ , we can update our knowledge about $s_{t}$ using the likelihood $\Pr (y_{t} | s_{t}, Y_{t - 1})$ . Let $e_{t}$ denote the error in predicting $y_{t}$ ,

(7)

e_{t} \equiv y_{t} - {\hat{y}}_{t} = y_{t} - F G {\hat{s}}_{t - 1} .

Observing $e_{t}$ is equivalent to observing $y_{t}$ since $F$, $G$, and ${\hat{s}}_{t - 1}$ are all known given $Y_{t - 1}$. Thus, (3) becomes

(8)

\Pr (s_{t} | y_{t}, Y_{t - 1}) = \Pr (s_{t} | e_{t}, Y_{t - 1}) \propto \Pr (e_{t} | s_{t}, Y_{t - 1}) \Pr (s_{t} | Y_{t - 1}) .

Now, using the measurement equation (1), we can write $e_{t} = F (s_{t} - G {\hat{s}}_{t - 1}) + ε_{t}$ , and therefore

(9)

e_{t} | s_{t}, Y_{t - 1} \sim N (F (s_{t} - G {\hat{s}}_{t - 1}), Ω_{ε}) .

Now, from Bayes’ Theorem the posterior distribution of $s_{t}$ satisfies

(10)

\Pr (s_{t} | y_{t}, Y_{t - 1}) = \frac{\Pr (e_{t} | s_{t}, Y_{t - 1}) \Pr (s_{t} | Y_{t - 1})}{\int \Pr (e_{t}, s_{t} | Y_{t - 1}) \, d s_{t}} .

Once this probability is computed, we can perform another iteration of the recursion by going back to (3).

Calculating the Posterior Distribution

We can calculate the posterior distribution (10) directly by appealing to the properties of the Normal Distribution. Note that

(11)

\begin{pmatrix} s_{t} \\ e_{t} \end{pmatrix} | Y_{t - 1} \sim N \left[ \begin{pmatrix} G {\hat{s}}_{t - 1} \\ 0 \end{pmatrix}, \begin{pmatrix} R_{t} & R_{t} F^{⊤} \\ F R_{t} & Ω_{ε} + F R_{t} F^{⊤} \end{pmatrix} \right],

where $R_{t}$ is given by (6). Conditional on $e_{t}$ , the distribution of $s_{t}$ is

(12)

s_{t} | e_{t}, Y_{t - 1} \sim N [G {\hat{s}}_{t - 1} + R_{t} F^{⊤} (Ω_{ε} + F R_{t} F^{⊤})^{- 1} e_{t}, R_{t} - R_{t} F^{⊤} (Ω_{ε} + F R_{t} F^{⊤})^{- 1} F R_{t}] .
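
The step from (11) to (12) relies on the standard conditioning result for jointly Normal vectors, stated here in generic notation (the symbols $x_{1}$, $x_{2}$, $μ_{i}$, and $V_{ij}$ are placeholders, not part of the model above). If

\begin{pmatrix} x_{1} \\ x_{2} \end{pmatrix} \sim N \left[ \begin{pmatrix} μ_{1} \\ μ_{2} \end{pmatrix}, \begin{pmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{pmatrix} \right],

then

x_{1} | x_{2} \sim N [μ_{1} + V_{12} V_{22}^{- 1} (x_{2} - μ_{2}), V_{11} - V_{12} V_{22}^{- 1} V_{21}] .

Setting $x_{1} = s_{t}$, $x_{2} = e_{t}$, $μ_{1} = G {\hat{s}}_{t - 1}$, $μ_{2} = 0$, $V_{11} = R_{t}$, $V_{12} = R_{t} F^{⊤}$, and $V_{22} = Ω_{ε} + F R_{t} F^{⊤}$ yields the posterior mean $G {\hat{s}}_{t - 1} + R_{t} F^{⊤} (Ω_{ε} + F R_{t} F^{⊤})^{- 1} e_{t}$ and the posterior variance in (12). The blocks themselves come from writing $e_{t} = F (s_{t} - G {\hat{s}}_{t - 1}) + ε_{t}$, so that $\operatorname{Cov} (s_{t}, e_{t} | Y_{t - 1}) = R_{t} F^{⊤}$ and $\operatorname{Var} (e_{t} | Y_{t - 1}) = F R_{t} F^{⊤} + Ω_{ε}$.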

To summarize, the posterior distribution of $s_{t}$ can be calculated recursively by first choosing initial values ${\hat{s}}_{0}$ and $Σ_{0}$. Then at each period $t$, given the posterior distribution of $s_{t - 1}$, with mean ${\hat{s}}_{t - 1}$ and covariance matrix $Σ_{t - 1}$ as in (4), we form a prior for $s_{t}$ with mean $G {\hat{s}}_{t - 1}$ and variance $R_{t} = G Σ_{t - 1} G^{⊤} + Ω_{η}$ as in (5), evaluate the likelihood in (9) given $e_{t} = y_{t} - F G {\hat{s}}_{t - 1}$, and then arrive at the posterior at time $t$ given by (12).
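
As a quick arithmetic check of one pass through this recursion, here is a minimal sketch for the scalar case, where transposes disappear and the matrix inverse becomes a division. The function name `kalman_step_scalar` and the numbers in the usage line are assumptions chosen so the result can be verified by hand.

```python
def kalman_step_scalar(s_prev, sigma_prev, y, F, G, omega_eps, omega_eta):
    """One scalar update: posterior at t-1 -> posterior at t."""
    r = G * sigma_prev * G + omega_eta       # prior variance R_t, eq. (6)
    e = y - F * G * s_prev                   # prediction error e_t, eq. (7)
    gain = r * F / (omega_eps + F * r * F)   # R_t F' (Omega_eps + F R_t F')^{-1}
    s_new = G * s_prev + gain * e            # posterior mean, eq. (12)
    sigma_new = r - gain * F * r             # posterior variance, eq. (12)
    return s_new, sigma_new


# Assumed inputs: s_hat_0 = 0, Sigma_0 = 1, F = G = 1,
# Omega_eps = 2, Omega_eta = 1, and a first observation y_1 = 3.
s1, sigma1 = kalman_step_scalar(s_prev=0.0, sigma_prev=1.0, y=3.0,
                                F=1.0, G=1.0, omega_eps=2.0, omega_eta=1.0)
print(s1, sigma1)  # 1.5 1.0
```

By hand: $R_{1} = 1 + 1 = 2$, $e_{1} = 3$, the weight on $e_{1}$ is $2 / (2 + 2) = 0.5$, so ${\hat{s}}_{1} = 1.5$ and $Σ_{1} = 2 - 0.5 \cdot 2 = 1$. The posterior mean lands halfway between the prior mean and the observation because the prior variance and the measurement variance are equal here.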

Algorithm

Using the theoretical derivation as a guide, we can implement the Kalman Filter as a recursive algorithm. Given initialization values ${\hat{s}}_{0}$ and $Σ_{0}$, at each time $t$:

The posterior distribution at time $t - 1$ is Normal with mean ${\hat{s}}_{t - 1}$ and covariance matrix $Σ_{t - 1}$;
Form the covariance matrix of the prior distribution, $R_{t} = G Σ_{t - 1} G^{⊤} + Ω_{η}$;
Compute the prediction error, $e_{t} = y_{t} - F G {\hat{s}}_{t - 1}$;
Calculate the mean of the posterior, ${\hat{s}}_{t} = G {\hat{s}}_{t - 1} + R_{t} F^{⊤} (Ω_{ε} + F R_{t} F^{⊤})^{- 1} e_{t}$;
Calculate the variance of the posterior, $Σ_{t} = R_{t} - R_{t} F^{⊤} (Ω_{ε} + F R_{t} F^{⊤})^{- 1} F R_{t}$.
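
The steps above can be sketched in matrix form with NumPy. This is one possible implementation under stated assumptions rather than a reference one: the function name `kalman_filter`, the argument layout, and the two-state example values are all hypothetical, and $F$ and $G$ are held fixed over time as in the notes.

```python
import numpy as np


def kalman_filter(ys, F, G, omega_eps, omega_eta, s0, sigma0):
    """Run the recursion over the rows of ys, which has shape (T, n_obs).

    Returns posterior means with shape (T, n_state) and posterior
    covariances with shape (T, n_state, n_state), for t = 1, ..., T.
    """
    F, G = np.atleast_2d(F), np.atleast_2d(G)
    omega_eps, omega_eta = np.atleast_2d(omega_eps), np.atleast_2d(omega_eta)
    s_hat = np.atleast_1d(np.asarray(s0, dtype=float))
    sigma = np.atleast_2d(np.asarray(sigma0, dtype=float))
    means, covs = [], []
    for y in np.asarray(ys, dtype=float):
        # Prior covariance R_t = G Sigma_{t-1} G' + Omega_eta
        R = G @ sigma @ G.T + omega_eta
        # Prediction error e_t = y_t - F G s_hat_{t-1}
        e = y - F @ G @ s_hat
        # K = R_t F' (Omega_eps + F R_t F')^{-1}, computed with a linear
        # solve instead of an explicit inverse; the transpose trick
        # relies on R and S being symmetric.
        S = omega_eps + F @ R @ F.T
        K = np.linalg.solve(S, F @ R).T
        # Posterior mean and covariance
        s_hat = G @ s_hat + K @ e
        sigma = R - K @ F @ R
        means.append(s_hat)
        covs.append(sigma)
    return np.array(means), np.array(covs)


# Illustrative two-state example (a level and its drift), with only the
# level observed. All numbers are assumptions.
means, covs = kalman_filter(
    ys=[[1.1], [2.0], [2.9]],
    F=[[1.0, 0.0]],
    G=[[1.0, 1.0], [0.0, 1.0]],
    omega_eps=[[0.5]],
    omega_eta=0.01 * np.eye(2),
    s0=[0.0, 0.0],
    sigma0=np.eye(2),
)
print(means[-1])  # posterior mean of the state after the third observation
```

Storing every $Σ_{t}$ is convenient for inspection but not required by the recursion, which only needs the most recent mean and covariance.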