# Maximum Simulated Likelihood

Given a sample of observations $\left\{{y}_{i}:i=1,\dots ,N\right\}$, the log-likelihood function for an unknown parameter $\theta$ is

(1)${l}_{N}\left(\theta \right)\equiv \sum _{i=1}^{N}\mathrm{ln}f\left(\theta |{y}_{i}\right).$

Let $\stackrel{˜}{f}\left(\theta |{y}_{i},\omega \right)$ be an unbiased simulator such that

(2)${E}_{\omega }\left[\stackrel{˜}{f}\left(\theta |y,\omega \right)|y\right]=f\left(\theta |y\right),$

where $\omega$ is a vector of $R$ simulated random variates. Then, the maximum simulated likelihood (MSL) estimator for $\theta$ is

(3)${\stackrel{˜}{\theta }}_{\text{MSL}}\equiv \mathrm{arg}\underset{\theta }{\mathrm{max}}{\stackrel{˜}{l}}_{N}\left(\theta \right)$

where ${\stackrel{˜}{l}}_{N}\left(\theta \right)\equiv {\sum }_{i=1}^{N}\mathrm{ln}\stackrel{˜}{f}\left(\theta |{y}_{i},{\omega }_{i}\right)$ for some sequence of simulations $\left\{{\omega }_{i}\right\}$, with one vector ${\omega }_{i}$ of $R$ draws per observation.
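As a minimal sketch of definition (3), consider a toy random-intercept probit in which ${y}_{i}=1$ when $\theta +{u}_{i}+{\epsilon }_{i}>0$, with ${u}_{i}$ and ${\epsilon }_{i}$ independent standard normals. The model, the function names, and the golden-section search are illustrative assumptions, not part of the text above. Averaging $\Phi \left({q}_{i}\left(\theta +{u}_{ir}\right)\right)$ with ${q}_{i}=2{y}_{i}-1$ over $R$ draws of $u$ gives an unbiased simulator of $f$. Because the exact likelihood in this toy model is $\Phi \left({q}_{i}\theta /\sqrt{2}\right)$, the simulated estimate can be compared with the closed-form MLE.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (cdf, inv_cdf)


def simulated_loglik(theta, y, draws):
    """Sum over i of ln[(1/R) * sum_r Phi(q_i * (theta + u_ir))]."""
    total = 0.0
    for y_i, u_i in zip(y, draws):
        q_i = 2 * y_i - 1  # +1 when y_i == 1, -1 when y_i == 0
        f_tilde = sum(STD_NORMAL.cdf(q_i * (theta + u)) for u in u_i) / len(u_i)
        total += math.log(f_tilde)
    return total


def golden_section_max(fn, lo, hi, tol=1e-5):
    """Maximize a scalar function on [lo, hi], assuming it is unimodal there."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = fn(c), fn(d)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = fn(c)
        else:  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = fn(d)
    return (a + b) / 2


rng = random.Random(0)
theta0, N, R = 0.5, 400, 50  # hypothetical true value and sizes

# Simulate the data: y_i = 1 when theta0 + u_i + eps_i > 0.
y = [int(theta0 + rng.gauss(0, 1) + rng.gauss(0, 1) > 0) for _ in range(N)]

# Draw omega_i once per observation and hold it fixed across values of theta.
draws = [[rng.gauss(0, 1) for _ in range(R)] for _ in range(N)]

theta_msl = golden_section_max(lambda t: simulated_loglik(t, y, draws), -3.0, 3.0)

# Closed-form MLE for this toy model: Phi(theta / sqrt(2)) = mean(y).
theta_mle = math.sqrt(2) * STD_NORMAL.inv_cdf(sum(y) / N)

print(theta_msl, theta_mle)
```

The draws are generated once and reused at every trial value of $\theta$, so the simulated objective is a fixed function of $\theta$. Rerunning with a different seed for the draws changes the objective and therefore the estimate.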

Two points deserve special attention. First, the estimator is conditional on the particular sequence of simulations $\left\{{\omega }_{i}\right\}$ used: each such sequence yields a different estimate. Second, even though the simulator of $f$ is unbiased, the resulting MSL estimate is generally biased. That is, even though we have

(4)${E}_{\omega }\left[\stackrel{˜}{f}\left(\theta |y,\omega \right)|y\right]=f\left(\theta |y\right),$

this does not imply

(5)$E\left[\mathrm{arg}\underset{\theta }{\mathrm{max}}\stackrel{˜}{l}\left(\theta \right)\right]=\mathrm{arg}\underset{\theta }{\mathrm{max}}l\left(\theta \right).$

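Unbiased simulation of the log-likelihood function is generally infeasible, even though the likelihood function itself can usually be simulated without bias, because the natural log is a nonlinear transformation. By Jensen's inequality, ${E}_{\omega }\left[\mathrm{ln}\stackrel{˜}{f}\right]\le \mathrm{ln}{E}_{\omega }\left[\stackrel{˜}{f}\right]=\mathrm{ln}f$, so the simulated log-likelihood is biased downward for any finite $R$.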
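The bias introduced by taking logs can be seen in a small Monte Carlo sketch. It uses a toy quantity rather than a real likelihood (an assumption made purely for illustration): take $f=E\left[{e}^{u}\right]={e}^{1/2}$ for $u\sim N\left(0,1\right)$ and simulate it by averaging ${e}^{{u}_{r}}$ over $R$ draws. The simulator is unbiased for $f$, but by Jensen's inequality the mean of its logarithm falls below $\mathrm{ln}f=1/2$, and the gap narrows as $R$ grows.

```python
import math
import random


def mean_log_simulator(R, reps, rng):
    """Monte Carlo average of ln(f_tilde), where f_tilde = (1/R) * sum_r exp(u_r)."""
    total = 0.0
    for _ in range(reps):
        f_tilde = sum(math.exp(rng.gauss(0.0, 1.0)) for _ in range(R)) / R
        total += math.log(f_tilde)
    return total / reps


rng = random.Random(12345)
log_f = 0.5  # ln E[exp(u)] for u ~ N(0, 1)

# Estimated bias E[ln f_tilde] - ln f for a small and a larger number of draws.
bias_by_R = {R: mean_log_simulator(R, 20000, rng) - log_f for R in (5, 50)}

for R, bias in bias_by_R.items():
    print(R, bias)
```

Both estimated biases should come out negative, with the one for the larger $R$ closer to zero.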

## Consistency

All is not lost: even though the estimate is biased for finite $R$, we can still obtain an estimator whose probability limit is the same as that of the MLE. This requires that the sample average of the simulated log-likelihood converge to the sample average log-likelihood, which can be accomplished by increasing the number of simulations, and thus decreasing the simulation error, at a sufficiently fast rate relative to the sample size. We have the following lemma (see Newey and McFadden, 1994):

Lemma. Suppose the following:

1. $\theta \in \Theta \subset {ℝ}^{K}$ and $\Theta$ is compact,
2. ${Q}_{0}\left(\theta \right)$ and ${Q}_{N}\left(\theta \right)$ are continuous in $\theta$,
3. ${\theta }_{0}\equiv \mathrm{arg}{\mathrm{max}}_{\theta \in \Theta }{Q}_{0}\left(\theta \right)$ is unique,
4. ${\stackrel{^}{\theta }}_{N}\equiv \mathrm{arg}{\mathrm{max}}_{\theta \in \Theta }{Q}_{N}\left(\theta \right)$, and
5. ${Q}_{N}\left(\theta \right)\to {Q}_{0}\left(\theta \right)$ in probability uniformly in $\theta$ as $N\to \infty$.

Then, ${\stackrel{^}{\theta }}_{N}\to {\theta }_{0}$ in probability.

Now, suppose that $f$ satisfies the conditions of this lemma. In particular, suppose that the observations ${y}_{i}$ are iid, that $\theta$ is identified, and that $f\left(\theta |y\right)$ is continuous in $\theta$ over some compact set $\Theta$. Finally, assume that $E\left[{\mathrm{sup}}_{\theta \in \Theta }|\mathrm{ln}f\left(\theta |y\right)|\right]$ is finite.

Now, given a sequence of simulations ${\omega }_{ir}$, iid across $r$, with ${\omega }_{i}\equiv \left({\omega }_{i1},\dots ,{\omega }_{iR}\right)$, the MSL estimator defined as

(6)${\stackrel{˜}{\theta }}_{\text{MSL}}\equiv \mathrm{arg}\underset{\theta }{\mathrm{max}}\frac{1}{N}\sum _{i=1}^{N}\mathrm{ln}\stackrel{˜}{f}\left(\theta |{y}_{i},{\omega }_{i}\right)$

is consistent if $R\to \infty$ as $N\to \infty$. For a proof refer to Hajivassiliou and Ruud (1994, p. 2417).

## Asymptotic Normality

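Suppose that $\stackrel{˜}{f}$ is twice differentiable in $\theta$, and let ${\Delta }_{\theta }$ and ${\Delta }_{\theta }^{2}$ denote the gradient and Hessian with respect to $\theta$. Writing ${\stackrel{^}{\theta }}_{\text{MSL}}$ for the MSL estimator defined in (6), we can form a mean value expansion of the simulated score ${\Delta }_{\theta }\stackrel{˜}{l}\left(\theta \right)$ around ${\theta }_{0}$: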

(7)${\Delta }_{\theta }\stackrel{˜}{l}\left({\stackrel{^}{\theta }}_{\text{MSL}}\right)={\Delta }_{\theta }\stackrel{˜}{l}\left({\theta }_{0}\right)+{\Delta }_{\theta }^{2}\stackrel{˜}{l}\left(\overline{\theta }\right)\left({\stackrel{^}{\theta }}_{\text{MSL}}-{\theta }_{0}\right)$

for some $\overline{\theta }$ lying on the line segment between ${\stackrel{^}{\theta }}_{\text{MSL}}$ and ${\theta }_{0}$. By the first-order condition for an interior maximum, the left-hand side equals zero. Multiplying by $\sqrt{N}$ and rearranging, we find

(8)$\sqrt{N}\left({\stackrel{^}{\theta }}_{\text{MSL}}-{\theta }_{0}\right)=-{\left[\frac{1}{N}{\Delta }_{\theta }^{2}\stackrel{˜}{l}\left(\overline{\theta }\right)\right]}^{-1}\frac{1}{\sqrt{N}}{\Delta }_{\theta }\stackrel{˜}{l}\left({\theta }_{0}\right).$

Now, the consistency of ${\stackrel{^}{\theta }}_{\text{MSL}}$ implies consistency of $\overline{\theta }$ and so

(9)$\frac{1}{N}{\Delta }_{\theta }^{2}\stackrel{˜}{l}\left(\overline{\theta }\right)\stackrel{p}{\to }E\left[{\Delta }_{\theta }^{2}\mathrm{ln}f\left({\theta }_{0}|y\right)\right].$

As for the gradient term, we have

(10)$\frac{1}{\sqrt{N}}{\Delta }_{\theta }\stackrel{˜}{l}\left({\theta }_{0}\right)=\frac{1}{\sqrt{N}}\sum _{i=1}^{N}\frac{{\Delta }_{\theta }\stackrel{˜}{f}\left({\theta }_{0}|{y}_{i},{\omega }_{i}\right)}{\stackrel{˜}{f}\left({\theta }_{0}|{y}_{i},{\omega }_{i}\right)}.$
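The ratio form of the simulated score in (10) can be checked numerically. The sketch below assumes a toy probit-type simulator $\stackrel{˜}{f}\left(\theta |{y}_{i},{\omega }_{i}\right)=\frac{1}{R}{\sum }_{r}\Phi \left({q}_{i}\left(\theta +{u}_{ir}\right)\right)$ with ${q}_{i}=2{y}_{i}-1$. The simulator and all function names are hypothetical choices for illustration. It compares the analytic sum of ${\Delta }_{\theta }\stackrel{˜}{f}/\stackrel{˜}{f}$ with a central finite difference of the simulated log-likelihood. The $1/\sqrt{N}$ scaling is omitted in the code.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def f_tilde(theta, y_i, u_i):
    """Simulated likelihood contribution: average of Phi(q_i * (theta + u))."""
    q_i = 2 * y_i - 1
    return sum(STD_NORMAL.cdf(q_i * (theta + u)) for u in u_i) / len(u_i)


def grad_f_tilde(theta, y_i, u_i):
    """Derivative of f_tilde in theta: average of q_i * phi(q_i * (theta + u))."""
    q_i = 2 * y_i - 1
    return sum(q_i * STD_NORMAL.pdf(q_i * (theta + u)) for u in u_i) / len(u_i)


def simulated_loglik(theta, y, draws):
    return sum(math.log(f_tilde(theta, y_i, u_i)) for y_i, u_i in zip(y, draws))


def simulated_score(theta, y, draws):
    """Unscaled version of (10): sum over i of grad f_tilde / f_tilde."""
    return sum(
        grad_f_tilde(theta, y_i, u_i) / f_tilde(theta, y_i, u_i)
        for y_i, u_i in zip(y, draws)
    )


rng = random.Random(1)
N, R, theta = 50, 20, 0.3  # arbitrary sizes and evaluation point
y = [rng.randint(0, 1) for _ in range(N)]
draws = [[rng.gauss(0.0, 1.0) for _ in range(R)] for _ in range(N)]

h = 1e-5
numeric = (
    simulated_loglik(theta + h, y, draws) - simulated_loglik(theta - h, y, draws)
) / (2 * h)
analytic = simulated_score(theta, y, draws)

print(analytic, numeric)
```

The two values should agree up to finite-difference error, since the draws are held fixed while $\theta$ is perturbed.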

Ideally, to prove asymptotic normality we would like the scaled score in (10) to converge to some mean-zero normal distribution. However, the expectations of the individual terms in this summation are nonzero, so we cannot apply a central limit theorem directly. We can rewrite this term as follows:

(11)$\frac{1}{\sqrt{N}}{\Delta }_{\theta }\stackrel{˜}{l}\left({\theta }_{0}\right)=\frac{1}{\sqrt{N}}{\Delta }_{\theta }l\left({\theta }_{0}\right)+{A}_{N}+{B}_{N}$

with

(12)${A}_{N}=\frac{1}{\sqrt{N}}\sum _{i=1}^{N}\left\{{\Delta }_{\theta }\mathrm{ln}\stackrel{˜}{f}-{E}_{\omega }\left[{\Delta }_{\theta }\mathrm{ln}\stackrel{˜}{f}\right]\right\}$

and

(13)${B}_{N}=\frac{1}{\sqrt{N}}\sum _{i=1}^{N}\left\{{E}_{\omega }\left[{\Delta }_{\theta }\mathrm{ln}\stackrel{˜}{f}\right]-{\Delta }_{\theta }\mathrm{ln}f\right\}.$

The term ${A}_{N}$ represents the pure simulation noise and has expectation zero. The ${B}_{N}$ term represents the simulation bias. Proposition 4 of Hajivassiliou and Ruud (1994, p. 2418) shows that if $R$ grows fast enough relative to $N$, specifically if $R/\sqrt{N}\to \infty$, then the simulation bias is harmless. Finally, Proposition 5 (p. 2419) shows that ${\stackrel{^}{\theta }}_{\text{MSL}}$ is in fact asymptotically efficient.
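A heuristic sketch of where the rate condition comes from, under the added assumptions (not stated above) that $\stackrel{˜}{f}$ is an average of $R$ iid unbiased draws, each with finite conditional variance ${\sigma }_{i}^{2}\left(\theta \right)$ given ${y}_{i}$, and that the draws are independent across $i$. A second-order expansion of the logarithm around $f$ gives

${E}_{\omega }\left[\mathrm{ln}\stackrel{˜}{f}\left(\theta |{y}_{i},{\omega }_{i}\right)\right]-\mathrm{ln}f\left(\theta |{y}_{i}\right)\approx -\frac{{\mathrm{Var}}_{\omega }\left[\stackrel{˜}{f}\left(\theta |{y}_{i},{\omega }_{i}\right)\right]}{2f{\left(\theta |{y}_{i}\right)}^{2}}=-\frac{{\sigma }_{i}^{2}\left(\theta \right)}{2Rf{\left(\theta |{y}_{i}\right)}^{2}},$

so the per-observation bias is of order $1/R$. If the same order carries over to the gradient, then

${B}_{N}=\frac{1}{\sqrt{N}}\sum _{i=1}^{N}O\left(1/R\right)=O\left(\sqrt{N}/R\right),$

which vanishes when $R/\sqrt{N}\to \infty$. The noise term ${A}_{N}$, by contrast, is a scaled sum of independent mean-zero terms with variance of order $1/R$, so its variance is of order $1/R$ and it vanishes whenever $R\to \infty$.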