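Given a sample of observations $\{y_n\}_{n=1}^N$, the log-likelihood function for an unknown parameter $\theta$ is

$$
l_N(\theta) = \sum_{n=1}^N \ln f(\theta; y_n).
$$

Let $\tilde f(\theta; y, \omega)$ be an unbiased simulator such that

$$
E_\omega\left[ \tilde f(\theta; y, \omega) \mid y \right] = f(\theta; y),
$$

where $\omega$ is a vector of $R$ simulated random variates. Then, the maximum simulated likelihood (MSL) estimator for $\theta$ is

$$
\hat\theta_{MSL} = \arg\max_{\theta} \tilde l_N(\theta),
$$

where $\tilde l_N(\theta) = \sum_{n=1}^N \ln \tilde f(\theta; y_n, \omega_n)$ for some sequence of simulations $\{\omega_n\}_{n=1}^N$.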
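There are two points which deserve special attention. First, the estimator is conditional upon the particular sequence of simulations $\{\omega_n\}$ used. That is to say, one will obtain a different estimate for each such sequence used. Second, even though the simulator of $f$ is unbiased, the resulting MSL estimate will be biased. That is, even though we have

$$
E_\omega\left[ \tilde f(\theta; y, \omega) \mid y \right] = f(\theta; y),
$$

this does not imply

$$
E_\omega\left[ \ln \tilde f(\theta; y, \omega) \mid y \right] = \ln f(\theta; y).
$$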
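The likelihood function itself can usually be simulated without bias, but the natural log is a nonlinear transformation, so unbiased simulation of the log-likelihood function is generally infeasible. Because the logarithm is concave, Jensen's inequality gives $E_\omega[\ln \tilde f \mid y] \le \ln E_\omega[\tilde f \mid y] = \ln f$, so the simulated log-likelihood is biased downward.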
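The bias introduced by the log transformation can be seen in a small Monte Carlo experiment. The sketch below is only an illustration: the simulator `simulate_f` (the true value times the mean of `R` standard exponential draws) and the value `f = 0.2` are hypothetical choices made so that the simulator is unbiased by construction; they are not taken from the references.

```python
import math
import random


def simulate_f(f, R, rng):
    """Hypothetical unbiased simulator: f times the mean of R Exp(1) draws.

    The mean of Exp(1) draws has expectation 1, so E[f_tilde] = f.
    """
    return f * sum(rng.expovariate(1.0) for _ in range(R)) / R


def simulation_bias(f, R, replications=20000, seed=0):
    """Monte Carlo estimates of (E[f_tilde] - f, E[ln f_tilde] - ln f)."""
    rng = random.Random(seed)
    level_sum = 0.0
    log_sum = 0.0
    for _ in range(replications):
        f_tilde = simulate_f(f, R, rng)
        level_sum += f_tilde
        log_sum += math.log(f_tilde)
    level_bias = level_sum / replications - f
    log_bias = log_sum / replications - math.log(f)
    return level_bias, log_bias


if __name__ == "__main__":
    for R in (1, 5, 50):
        level_bias, log_bias = simulation_bias(0.2, R)
        print(f"R={R:3d}  level bias={level_bias:+.4f}  log bias={log_bias:+.4f}")
```

In this setup the level bias should be indistinguishable from zero for every `R`, while the log bias should be negative and shrink toward zero as `R` grows.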
All is not lost because, even though our estimate is biased, we can still obtain an estimator whose probability limit is the same as the MLE. This requires that the sample average of the simulated log-likelihood converges to the sample average log-likelihood. This can be accomplished by increasing the number of simulations, and thus decreasing the simulation error, at a sufficiently fast rate relative to the sample size. We have the following lemma (see Newey and McFadden, 1994):
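Lemma. Suppose the following:
- $\theta \in \Theta \subset \mathbb{R}^K$ and $\Theta$ is compact,
- $Q_0(\theta)$ and $Q_N(\theta)$ are continuous in $\theta$,
- $\theta_0 = \arg\max_{\theta \in \Theta} Q_0(\theta)$ is unique,
- $\hat\theta_N = \arg\max_{\theta \in \Theta} Q_N(\theta)$, and
- $Q_N(\theta) \to Q_0(\theta)$ in probability uniformly in $\theta \in \Theta$ as $N \to \infty$.
Then, $\hat\theta_N \to \theta_0$ in probability.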
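Now, suppose that $Q_N(\theta) = l_N(\theta) / N$ satisfies the conditions of this lemma. In particular, suppose that the observations $y_n$ are iid, that $\theta_0$ is identified, and that $f(\theta; y)$ is continuous in $\theta$ over some compact set $\Theta$. Finally, assume that $E\left[ \sup_{\theta \in \Theta} \lvert \ln f(\theta; y) \rvert \right]$ is finite.
Now, given a sequence of simulators $\tilde f(\theta; y_n, \omega_n)$, iid across $n$, the MSL estimator defined as

$$
\hat\theta_{MSL} = \arg\max_{\theta \in \Theta} \frac{1}{N} \sum_{n=1}^N \ln \tilde f(\theta; y_n, \omega_n)
$$

is consistent if $R \to \infty$ as $N \to \infty$. For a proof refer to Hajivassiliou and Ruud (1994, p. 2417).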
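To make the mechanics concrete, here is a minimal sketch of an MSL estimator for a toy model. Everything in it is an assumption made for illustration: the model $y = \theta + u + e$ with $u, e \sim N(0, 1)$, the frequency simulator that averages the $N(\theta + u_r, 1)$ density over `R` draws of $u$, and the helper names. In this model $y \sim N(\theta, 2)$, so the exact MLE is the sample mean and gives a benchmark. The optimizer is a plain golden-section search, which assumes the objective is unimodal on the bracket; that is plausible here but not guaranteed in general.

```python
import math
import random


def simulate_data(theta0, N, seed):
    """Draw y_n = theta0 + u_n + e_n with u_n, e_n ~ N(0, 1) (toy model)."""
    rng = random.Random(seed)
    return [theta0 + rng.gauss(0.0, 1.0) + rng.gauss(0.0, 1.0) for _ in range(N)]


def simulated_loglik(theta, ys, draws):
    """Sum over n of ln f_tilde(theta; y_n, omega_n).

    f_tilde averages the N(theta + u, 1) density at y_n over the draws in omega_n.
    """
    total = 0.0
    for y, omega in zip(ys, draws):
        kernel = sum(math.exp(-0.5 * (y - theta - u) ** 2) for u in omega)
        total += math.log(kernel / (len(omega) * math.sqrt(2.0 * math.pi)))
    return total


def golden_section_max(func, lo, hi, tol=1e-6):
    """Maximize a function assumed unimodal on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = func(c), func(d)
    while b - a > tol:
        if fc > fd:
            # The maximum lies in [a, d]; reuse c as the new upper probe.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = func(c)
        else:
            # The maximum lies in [c, b]; reuse d as the new lower probe.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = func(d)
    return 0.5 * (a + b)


def msl_estimate(ys, R, seed, lo=-10.0, hi=10.0):
    """MSL estimate of theta, conditional on one simulation sequence."""
    rng = random.Random(seed)
    # Draw omega_n once and hold it fixed while theta varies, so the
    # simulated objective is a deterministic function of theta.
    draws = [[rng.gauss(0.0, 1.0) for _ in range(R)] for _ in ys]
    return golden_section_max(lambda t: simulated_loglik(t, ys, draws), lo, hi)


if __name__ == "__main__":
    ys = simulate_data(theta0=1.0, N=400, seed=0)
    print("MLE (sample mean):", sum(ys) / len(ys))
    for seed in (1, 2):
        print(f"MSL, simulation seed {seed}:", msl_estimate(ys, R=30, seed=seed))
```

Changing the simulation seed changes the estimate, which illustrates that the estimator is conditional on the simulation sequence. Holding the draws fixed across evaluations of $\theta$ keeps the objective smooth enough to optimize.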
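Suppose that $\tilde l_N(\theta)$ is twice differentiable in $\theta$. Then we can form a Taylor expansion approximation of $\nabla_\theta \tilde l_N(\hat\theta_{MSL})$ around $\theta_0$:

$$
\nabla_\theta \tilde l_N(\hat\theta_{MSL}) = \nabla_\theta \tilde l_N(\theta_0) + \nabla_\theta^2 \tilde l_N(\bar\theta) \, (\hat\theta_{MSL} - \theta_0)
$$

for some $\bar\theta$ lying on the line segment between $\hat\theta_{MSL}$ and $\theta_0$. By definition of $\hat\theta_{MSL}$ as an interior maximizer, the left hand side equals zero and after multiplying by $N^{-1/2}$ and rearranging we find

$$
\sqrt{N} \, (\hat\theta_{MSL} - \theta_0) = - \left[ \frac{1}{N} \nabla_\theta^2 \tilde l_N(\bar\theta) \right]^{-1} \frac{1}{\sqrt{N}} \nabla_\theta \tilde l_N(\theta_0).
$$

Now, the consistency of $\hat\theta_{MSL}$ implies consistency of $\bar\theta$ and so

$$
\frac{1}{N} \nabla_\theta^2 \tilde l_N(\bar\theta) \xrightarrow{p} E\left[ \nabla_\theta^2 \ln f(\theta_0; y) \right].
$$

As for the gradient term, we have

$$
\frac{1}{\sqrt{N}} \nabla_\theta \tilde l_N(\theta_0) = \frac{1}{\sqrt{N}} \sum_{n=1}^N \nabla_\theta \ln \tilde f(\theta_0; y_n, \omega_n).
$$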
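Ideally, to prove asymptotic normality we would like this to converge to some mean zero normal distribution. However, the expectations of the individual terms in this summation are nonzero, so we cannot apply a central limit theorem directly. We can rewrite this term as follows:

$$
\frac{1}{\sqrt{N}} \nabla_\theta \tilde l_N(\theta_0) = Z_N + A_N + B_N,
$$

where

$$
\begin{aligned}
Z_N &= \frac{1}{\sqrt{N}} \sum_{n=1}^N \nabla_\theta \ln f(\theta_0; y_n), \\
A_N &= \frac{1}{\sqrt{N}} \sum_{n=1}^N \left\{ \nabla_\theta \ln \tilde f(\theta_0; y_n, \omega_n) - E_\omega\left[ \nabla_\theta \ln \tilde f(\theta_0; y_n, \omega) \mid y_n \right] \right\}, \\
B_N &= \frac{1}{\sqrt{N}} \sum_{n=1}^N \left\{ E_\omega\left[ \nabla_\theta \ln \tilde f(\theta_0; y_n, \omega) \mid y_n \right] - \nabla_\theta \ln f(\theta_0; y_n) \right\}.
\end{aligned}
$$

The term $Z_N$ is the normalized score of the true log-likelihood, to which a central limit theorem applies. The term $A_N$ represents the pure simulation noise and has expectation zero. The term $B_N$ represents the simulation bias. Proposition 4 of Hajivassiliou and Ruud (1994, p. 2418) shows that if $R$ grows fast enough relative to $N$, specifically if $R / \sqrt{N} \to \infty$, then the simulation bias is asymptotically negligible. Finally, Proposition 5 (p. 2419) shows that $\hat\theta_{MSL}$ is in fact asymptotically efficient.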
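A heuristic order-of-magnitude argument suggests where a rate of this form comes from. The sketch below is not the proof in the cited propositions. It works with the log-likelihood rather than its gradient, and it assumes (beyond what is stated here) that the simulator is an average of $R$ iid unbiased draws, $\tilde f = R^{-1} \sum_{r=1}^R \tilde f_r$, each with finite conditional variance $\sigma^2(\theta; y)$.

```latex
% Heuristic sketch only. Assumptions not in the source: \tilde f is the mean of
% R iid unbiased draws \tilde f_r with Var_\omega(\tilde f_r | y) = \sigma^2(\theta; y).
\begin{align*}
\ln \tilde f
  &\approx \ln f + \frac{\tilde f - f}{f} - \frac{(\tilde f - f)^2}{2 f^2}, \\
E_\omega\left[ \ln \tilde f \mid y \right] - \ln f
  &\approx - \frac{\sigma^2(\theta; y)}{2 R \, f(\theta; y)^2} = O(R^{-1}), \\
\frac{1}{\sqrt{N}} \sum_{n=1}^N O(R^{-1})
  &= O\!\left( \frac{\sqrt{N}}{R} \right).
\end{align*}
```

Under these assumptions the bias per observation is of order $1/R$, and the normalized sum of $N$ such biases is of order $\sqrt{N}/R$, which vanishes when $R / \sqrt{N} \to \infty$. The rigorous version of this argument, applied to the gradient, is in the cited propositions.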
Hajivassiliou, V. A. and P. A. Ruud (1994). Classical Estimation Methods for LDV Models Using Simulation, in R. F. Engle and D. L. McFadden, eds., Handbook of Econometrics, volume 4. Amsterdam: Elsevier.
Lee, L.-F. (1992). On efficiency of methods of simulated moments and maximum simulated likelihood estimation of discrete response models. Econometric Theory 8, 518–552.
Gouriéroux, C. and A. Monfort (1991). Simulation Based Inference in Models with Heterogeneity. Annales d’Économie et de Statistique 20/21, 69–107.
Newey, W. K. and D. McFadden (1994). Large Sample Estimation and Hypothesis Testing, in R. F. Engle and D. L. McFadden, eds., Handbook of Econometrics, volume 4. Amsterdam: Elsevier.