The Metropolis-Hastings Algorithm
The Metropolis-Hastings algorithm, developed by Metropolis, Rosenbluth, Rosenbluth, Teller, and Teller (1953) and generalized by Hastings (1970), is a Markov chain Monte Carlo method which allows for sampling from a distribution when traditional sampling methods such as transformation or inversion fail. It requires only being able to evaluate the density function. The normalizing constant need not be known. This algorithm is very general and gives rise to the Gibbs sampler as a special case.
Markov Chain Monte Carlo Simulation
Consider a Markov transition kernel $P(x, A)$ for $x \in \mathbb{R}^d$ and $A \in \mathcal{B}$, where $\mathcal{B}$ is the Borel $\sigma$-algebra on $\mathbb{R}^d$. Two major concerns in Markov chain theory are whether there exists an invariant distribution and whether iterating the transition kernel converges to the invariant distribution. The invariant distribution $\pi^*$ satisfies
$$\pi^*(dy) = \int_{\mathbb{R}^d} P(x, dy)\,\pi(x)\,dx,$$
where $\pi$ is the density of $\pi^*$ with respect to Lebesgue measure. Let $P^{(n)}(x, A)$ denote the $n$-th iteration of the transition kernel. We also want $P^{(n)}(x, \cdot)$ to converge to $\pi^*$ as $n \to \infty$. That is, the distribution of the draws generated by iterating the kernel is approximately $\pi^*$ for large $n$.
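Both properties are easy to see in a discrete-state setting, where the kernel is a transition matrix and the invariant distribution is a probability vector. The following is a minimal sketch using a small, hypothetical three-state chain; the matrix entries are illustrative and not taken from the text.

```python
import numpy as np

# Hypothetical 3-state transition matrix (rows sum to one).
P = np.array([[0.5, 0.4, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])

# Invariant distribution: left eigenvector of P with eigenvalue 1, normalized.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
pi = pi / pi.sum()

print("invariant pi:", pi)
print("pi @ P      :", pi @ P)                            # equals pi (invariance)
print("row of P^50 :", np.linalg.matrix_power(P, 50)[0])   # close to pi (convergence)
```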
Markov chain Monte Carlo methods look at the problem from the opposite perspective. The invariant distribution is known: it is the target distribution $\pi$. The problem is how to generate an appropriate transition kernel with the aforementioned convergence property. Suppose that the transition kernel can be expressed as
$$P(x, dy) = p(x, y)\,dy + r(x)\,\delta_x(dy),$$
where $p(x, x) = 0$, $\delta_x(dy) = 1$ if $x \in dy$ and $0$ otherwise, and define
$$r(x) = 1 - \int_{\mathbb{R}^d} p(x, y)\,dy.$$
The term $r(x)$ represents the probability of staying at $x$, which may be nonzero. If this kernel satisfies the reversibility condition
$$\pi(x)\,p(x, y) = \pi(y)\,p(y, x),$$
then $\pi$ is the invariant distribution of $P(x, \cdot)$. The Metropolis-Hastings algorithm generates a $p(x, y)$ with this property.
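The role of the reversibility condition can also be checked numerically in the discrete case. The sketch below builds a kernel whose off-diagonal entries satisfy detailed balance with respect to a hypothetical $\pi$ by construction (the numbers are illustrative, and the construction anticipates the Metropolis rule derived below), then confirms that $\pi$ is invariant.

```python
import numpy as np

# Hypothetical target distribution over 3 states.
pi = np.array([0.2, 0.3, 0.5])
n = len(pi)

# Off-diagonal entries chosen so that pi[i]*p[i,j] = pi[j]*p[j,i] by design;
# the diagonal absorbs the remaining probability of staying put.
p = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            p[i, j] = (1.0 / n) * min(1.0, pi[j] / pi[i])
    p[i, i] = 1.0 - p[i].sum()

# Detailed balance holds entrywise ...
assert np.allclose(pi[:, None] * p, (pi[:, None] * p).T)
# ... and therefore pi is invariant under p.
assert np.allclose(pi @ p, pi)
```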
The Metropolis-Hastings Algorithm
Let $q(x, y)$ denote the candidate-generating density, where $\int q(x, y)\,dy = 1$. This is essentially the conditional density of $y$ given $x$. This density could potentially be used as the $p(x, y)$ term in the transition kernel, but it may not satisfy the reversibility condition. If, for example, we have
$$\pi(x)\,q(x, y) > \pi(y)\,q(y, x),$$
then we can adjust $q$ by using a probability of move $\alpha(x, y) \le 1$. Transitions will be made using
$$p_{MH}(x, y) = q(x, y)\,\alpha(x, y)$$
for $y \neq x$.
The choice of $\alpha(x, y)$ follows from the following logic. If the inequality above holds, then moves from $x$ to $y$ are happening too often under $q$. We should thus choose $\alpha(y, x) = 1$, so that moves from $y$ back to $x$ are made as often as possible. But then, in order to satisfy the reversibility condition, we must have
$$\pi(x)\,q(x, y)\,\alpha(x, y) = \pi(y)\,q(y, x)\,\alpha(y, x) = \pi(y)\,q(y, x).$$
This implies that
$$\alpha(x, y) = \frac{\pi(y)\,q(y, x)}{\pi(x)\,q(x, y)}.$$
Conversely, we can consider the case in which the inequality is reversed to derive $\alpha(y, x)$. Thus, to summarize, in order to satisfy reversibility, we set
$$\alpha(x, y) = \begin{cases} \min\left[\dfrac{\pi(y)\,q(y, x)}{\pi(x)\,q(x, y)},\, 1\right] & \text{if } \pi(x)\,q(x, y) > 0, \\ 1 & \text{otherwise.} \end{cases}$$
Hence, the desired transition kernel is
$$P_{MH}(x, dy) = q(x, y)\,\alpha(x, y)\,dy + \left[1 - \int_{\mathbb{R}^d} q(x, y)\,\alpha(x, y)\,dy\right]\delta_x(dy).$$
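In code, the probability of move is a single comparison of the two weighted densities. The sketch below is a minimal Python version, assuming `pi_tilde` is the (possibly unnormalized) target density and `q(a, b)` evaluates the candidate-generating density for a move from `a` to `b`; both names are illustrative placeholders.

```python
def alpha(x, y, pi_tilde, q):
    """Probability of move from x to y (a sketch; pi_tilde may be unnormalized)."""
    num = pi_tilde(y) * q(y, x)
    den = pi_tilde(x) * q(x, y)
    if den > 0:
        return min(num / den, 1.0)
    return 1.0  # move with probability one when pi(x) q(x, y) = 0
```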
Thus, the Metropolis-Hastings algorithm is defined by the candidate-generating density $q(x, y)$. Note that $\alpha(x, y)$ does not require knowledge of the normalizing constant of $\pi$ because it drops out of the ratio $\pi(y)/\pi(x)$. A special case arises when the candidate-generating density is symmetric, $q(x, y) = q(y, x)$, since the probability of move then reduces to $\min[\pi(y)/\pi(x),\, 1]$. This special case forms the basis for optimization algorithms such as simulated annealing.
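Both points can be verified with the `alpha` sketch above: with a symmetric proposal the $q$ terms cancel, and rescaling the unnormalized target leaves the probability of move unchanged. The target and proposal below are hypothetical choices for illustration.

```python
import math

pi_tilde = lambda x: math.exp(-0.5 * x**2)   # standard normal kernel, constant omitted
q_sym = lambda x, y: 1.0                     # symmetric: q(x, y) = q(y, x)

print(alpha(0.0, 1.5, pi_tilde, q_sym))                     # min(pi(1.5)/pi(0), 1)
print(alpha(0.0, 1.5, lambda x: 7.0 * pi_tilde(x), q_sym))  # identical value
```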
The algorithm proceeds as follows: Given an initial value $x^{(0)}$, for each $j = 0, 1, \ldots, N-1$:
- Draw $y$ from $q(x^{(j)}, \cdot)$ and $u$ from $\mathcal{U}(0, 1)$.
- If $u \le \alpha(x^{(j)}, y)$, set $x^{(j+1)} = y$.
- Otherwise, set $x^{(j+1)} = x^{(j)}$.
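Putting the pieces together, the following is a minimal Python sketch of these steps for a scalar target, assuming a random-walk proposal $y = x^{(j)} + \text{scale}\cdot\varepsilon$ with $\varepsilon \sim \mathcal{N}(0, 1)$ (symmetric, so $q$ cancels in $\alpha$) and a user-supplied unnormalized log density. The function name, tuning values, and example target are illustrative assumptions, not part of the text.

```python
import numpy as np

def metropolis_hastings(log_pi_tilde, x0, n_draws=10_000, scale=1.0, seed=0):
    """Random-walk Metropolis-Hastings sketch for a scalar, unnormalized target."""
    rng = np.random.default_rng(seed)
    draws = np.empty(n_draws)
    x = x0
    for j in range(n_draws):
        y = x + scale * rng.standard_normal()   # draw y from q(x^(j), .)
        u = rng.uniform()                       # draw u from U(0, 1)
        # Accept with probability alpha(x, y); the comparison is done in
        # log space, which is equivalent and numerically more stable.
        if np.log(u) <= log_pi_tilde(y) - log_pi_tilde(x):
            x = y                               # x^(j+1) = y
        # otherwise x^(j+1) = x^(j)
        draws[j] = x
    return draws

# Usage: sample from a standard normal known only up to its normalizing constant.
samples = metropolis_hastings(lambda x: -0.5 * x**2, x0=0.0)
print(samples.mean(), samples.std())   # roughly 0 and 1
```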
References
Chib, S., and E. Greenberg (1995): “Understanding the Metropolis-Hastings Algorithm,” The American Statistician, 49, 327–335.
Hastings, W.K. (1970): “Monte Carlo Sampling Methods Using Markov Chains and Their Applications,” Biometrika, 57, 97–109.
Metropolis, N., A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, and E. Teller (1953): “Equation of State Calculations by Fast Computing Machines,” Journal of Chemical Physics, 21, 1087–1092.
Robert, C.P., and G. Casella (2004): Monte Carlo Statistical Methods, Second Edition. New York: Springer.