# Pesendorfer and Schmidt-Dengler (2008)

Asymptotic Least Squares Estimators for Dynamic Games

These notes are based on the following article:

Pesendorfer, Martin and Philipp Schmidt-Dengler (2008). Asymptotic Least Squares Estimators for Dynamic Games. Review of Economic Studies 75, 901–928.

Presentation by Jason Blevins, Duke University Applied Microeconomics Reading Group, June 11, 2008.

## Outline

• Considers the class of asymptotic least squares estimators for dynamic games.
• Bases estimation on the equilibrium conditions.
• Discusses identification and provides sufficient conditions for exact identification.
• Characterizes the efficient asymptotic least squares estimator.
• Shows that several well-known estimators are members of this class.
• Reports Monte Carlo experiments.

## Framework

• Dynamic games in discrete time with $t=1,\dots ,\infty$.
• $N$ players, $K+1$ actions, $L$ states per player, common discount factor $\beta$.
• States:
• ${s}_{i,t}\in {S}_{i}=\left\{1,\dots ,L\right\}$
• ${\epsilon }_{i,t}\sim F\left(\epsilon \mid {s}_{i,t},{s}_{-i,t}\right)$ on ${ℝ}^{K}$.
• Let $S={S}_{1}×\dots ×{S}_{N}$.
• The payoff shocks ${\epsilon }_{i,t}$ are private information, independent across players and time, and independent of the actions of other players.
• Actions ${a}_{i,t}\in {A}_{i}=\left\{0,1,\dots ,K\right\}$ are made simultaneously. Let $A={A}_{1}×\dots ×{A}_{N}$.
• State transitions follow a probability function $g(a_t,s_t,s_{t+1})$. Let $G$ denote the $m_a m_s\times m_s$ matrix of these probabilities, where $m_s=\#S=L^N$ and $m_a=\#A=(K+1)^N$.
• Period payoffs are given by ${\pi }_{i}\left({a}_{t},{s}_{t}\right)+\sum _{k=1}^{K}{\epsilon }_{i,t,k}1\left\{{a}_{i,t}=k\right\}$

## Equilibrium Characterization

The continuation value net of payoff shocks under ${a}_{i}$ with beliefs ${\sigma }_{i}$ is ${u}_{i}\left({a}_{i};{\sigma }_{i},\theta \right)=\sum _{{a}_{-i}}{\sigma }_{i}\left({a}_{-i}\mid s\right)\left[{\pi }_{i}\left({a}_{-i},{a}_{i},s\right)+\beta \sum _{s\prime }g\left({a}_{-i},{a}_{i},s,s\prime \right){V}_{i}\left(s\prime ;{\sigma }_{i}\right)\right].$ It is optimal to choose ${a}_{i}$ under the beliefs ${\sigma }_{i}$ if ${u}_{i}\left({a}_{i};{\sigma }_{i},\theta \right)+{\epsilon }_{i,{a}_{i}}\ge {u}_{i}\left({a}_{i}\prime ;{\sigma }_{i},\theta \right)+{\epsilon }_{i,{a}_{i}\prime }\phantom{\rule{1em}{0ex}}\forall {a}_{i}\prime \in {A}_{i}.$

Ex ante, in expectation we have $p\left({a}_{i}\mid s,{\sigma }_{i}\right)={\Psi }_{i}\left({a}_{i},s,{\sigma }_{i};\theta \right)=\int 1\left\{{u}_{i}\left({a}_{i};{\sigma }_{i},\theta \right)-{u}_{i}\left(k;{\sigma }_{i},\theta \right)\ge {\epsilon }_{i,k}-{\epsilon }_{i,{a}_{i}},k\ne {a}_{i}\right\}\phantom{\rule{thinmathspace}{0ex}}\mathrm{dF}.$ In matrix notation we have a $\left(N\cdot K\cdot {m}_{s}\right)×1$ system $p=\Psi \left(\sigma ;\theta \right).$
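The integral defining $\Psi_i$ has a closed form for particular choices of $F$. A minimal sketch, assuming the shocks are i.i.d. type I extreme value across all $K+1$ actions (an illustrative special case; only shock differences enter $\Psi_i$, so this fits the framework above), in which case $\Psi_i$ is the multinomial logit formula. The function name is hypothetical.

```python
import numpy as np

def logit_choice_probs(u):
    """Choice probabilities for a_i = 0..K in one state under logit shocks.

    u: length-(K+1) array of values u_i(a_i; sigma_i, theta) in a given state s.
    Assumption (illustrative): shocks are i.i.d. type I extreme value on every
    action, so the integral over F reduces to the multinomial logit formula.
    """
    u = np.asarray(u, dtype=float)
    z = np.exp(u - u.max())  # shift by the max to avoid overflow
    return z / z.sum()

probs = logit_choice_probs([0.0, 1.0, -0.5])  # K = 2 actions besides action 0
psi_entries = probs[1:]  # entries of Psi for actions 1..K; action 0 is implied
```

Action 0 is dropped from $\Psi$ because its probability is one minus the sum of the others, which is why the system has $N\cdot K\cdot m_s$ rather than $N\cdot(K+1)\cdot m_s$ equations.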

## Equilibrium Properties

In equilibrium, beliefs are consistent and we have the fixed point problem
$$p=\Psi(p;\theta). \label{fixed_point}$$
Thus, finding an equilibrium is a fixed point problem on $[0,1]^{N\cdot K\cdot m_s}$.

Proposition: In any Markov perfect equilibrium, the probability vector $p$ satisfies \eqref{fixed_point}. Conversely, any $p$ that satisfies \eqref{fixed_point} can be extended to a Markov perfect equilibrium.

Theorem: A Markov perfect equilibrium exists.

The same results (existence, and the fixed point characterization as a necessary and sufficient condition) also hold for symmetric equilibria. Symmetry reduces the number of equations in \eqref{fixed_point} and thus the computational complexity.
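To see the fixed point structure in the simplest possible case, the sketch below iterates $p \leftarrow \Psi(p;\theta)$ in a hypothetical static special case: $\beta=0$, a single state, two players, binary actions, and logit shocks. Player $i$'s payoff from action 1 is `alpha + delta * a_{-i}` and action 0 pays zero; `alpha` and `delta` are illustrative parameters, not taken from the paper. Plain iteration is used only because this $\Psi$ is a contraction (the logistic slope is at most 1/4, so $|\delta|/4<1$ suffices); in general, iteration need not converge and a game can have several equilibria.

```python
import numpy as np

def psi(p, alpha=1.0, delta=-1.0):
    """Psi(p; theta) for a hypothetical static two-player entry game.

    p[i] is the probability that player i chooses action 1. Payoff of action 1
    is alpha + delta * a_{-i}, action 0 pays 0, shocks are logit (assumptions).
    """
    u1 = alpha + delta * p[::-1]  # expected payoff of action 1 given the rival's p
    return 1.0 / (1.0 + np.exp(-u1))

def solve_fixed_point(p0, tol=1e-12, max_iter=10_000):
    """Successive approximation p <- Psi(p); converges here because Psi is a contraction."""
    p = np.asarray(p0, dtype=float)
    for _ in range(max_iter):
        p_new = psi(p)
        if np.max(np.abs(p_new - p)) < tol:
            return p_new
        p = p_new
    raise RuntimeError("fixed point iteration did not converge")

p_star = solve_fixed_point([0.5, 0.5])  # a symmetric equilibrium of this toy game
```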

## Identification

The model is identified if any particular set of choice and state transition probabilities is generated by a unique set of model primitives $\left({\Pi }_{1},\dots ,{\Pi }_{N},F,\beta ,g\right)$.

• Time series data $\left\{{a}_{t},{s}_{t}{\right\}}_{t=1}^{T}$.
• Suppose the data allow us to characterize $p\left(a\mid s\right)$ and $g\left(a,s,s\prime \right)$.
• Fix $\beta$ and $F$.
• There are ${m}_{a}\cdot {m}_{s}\cdot N$ remaining unknowns in $\left({\Pi }_{1},\dots ,{\Pi }_{N}\right)$.

Proposition: Suppose $F$ and $\beta$ are given. Then at most $K\cdot {m}_{s}\cdot N$ parameters can be identified.

There are only $K\cdot {m}_{s}\cdot N$ equations in the equilibrium conditions but ${m}_{a}\cdot {m}_{s}\cdot N$ parameters. We need at least $\left({m}_{a}\cdot {m}_{s}-K\cdot {m}_{s}\right)\cdot N$ restrictions in order to identify all parameters.
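The counting argument is simple bookkeeping. A hypothetical helper makes it concrete for given $N$, $K$, and $L$:

```python
def identification_counts(N, K, L):
    """Order-condition bookkeeping for the payoff parameters (F and beta fixed)."""
    m_s = L ** N                    # number of joint states
    m_a = (K + 1) ** N              # number of joint action profiles
    equations = K * m_s * N         # equilibrium conditions
    unknowns = m_a * m_s * N        # entries of (Pi_1, ..., Pi_N)
    restrictions_needed = (m_a * m_s - K * m_s) * N
    return equations, unknowns, restrictions_needed

counts = identification_counts(2, 1, 2)  # two players, binary actions, two states each
```

In that case there are 8 equations but 32 unknowns, so 24 restrictions are needed before the order condition can hold.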

## Identification: A Linear Representation

There is some ${\overline{\epsilon }}_{i}^{{a}_{i}}\left(s\right)$ that makes player $i$ indifferent between actions ${a}_{i}$ and $0$: $\begin{array}{rl}& \sum _{{a}_{-i}\in {A}_{-i}}p\left({a}_{-i}\mid s\right)\left[{\pi }_{i}\left({a}_{-i},{a}_{i},s\right)+\beta \sum _{s\prime \in S}g\left({a}_{-i},{a}_{i},s,s\prime \right){V}_{i}\left(s\prime ;p\right)\right]+{\overline{\epsilon }}_{i}^{{a}_{i}}\left(s\right)\\ & =\sum _{{a}_{-i}\in {A}_{-i}}p\left({a}_{-i}\mid s\right)\left[{\pi }_{i}\left({a}_{-i},0,s\right)+\beta \sum _{s\prime \in S}g\left({a}_{-i},0,s,s\prime \right){V}_{i}\left(s\prime ;p\right)\right]\end{array}$

Recall that player $i$'s value function under beliefs ${\sigma }_{i}$ is ${V}_{i}\left({\sigma }_{i}\right)=\left[I-\beta {\sigma }_{i}G{\right]}^{-1}\left[{\sigma }_{i}{\Pi }_{i}+{D}_{i}\left({\sigma }_{i}\right)\right]$, where ${D}_{i}\left({\sigma }_{i}\right)$ collects the expected payoff shocks in each state. Substituting this into the indifference conditions gives a linear system of equations for player $i$: ${X}_{i}\left(p,g,\beta \right){\Pi }_{i}+{Y}_{i}\left(p,g,\beta \right)=0$, where ${X}_{i}$ is a $\left(K\cdot {m}_{s}\right)×\left({m}_{a}\cdot {m}_{s}\right)$ matrix and ${Y}_{i}$ is a $\left(K\cdot {m}_{s}\right)×1$ vector, both of which depend on the choice probabilities, the transition probabilities, and $\beta$.
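A minimal numpy sketch of the value function formula. The data layout is an assumption of the sketch, not taken from the paper: joint action probabilities are given as an $m_s\times m_a$ array, $\sigma$ is built from it as an $m_s\times(m_a\cdot m_s)$ block matrix, and the rows of $G$ and entries of $\Pi_i$ are ordered by state first, then action. The function name is hypothetical.

```python
import numpy as np

def value_function(P, G, Pi, D, beta):
    """V = [I - beta * sigma G]^{-1} [sigma Pi + D].

    P:  (m_s, m_a) array, P[s, a] = probability of joint action profile a in state s.
    G:  (m_a * m_s, m_s) transition matrix; row s * m_a + a holds g(a, s, .).
    Pi: (m_a * m_s,) payoffs pi_i(a, s) in the same row ordering.
    D:  (m_s,) expected payoff shocks in each state.
    """
    m_s, m_a = P.shape
    sigma = np.zeros((m_s, m_a * m_s))
    for s in range(m_s):
        sigma[s, s * m_a:(s + 1) * m_a] = P[s]  # block layout of sigma
    A = np.eye(m_s) - beta * sigma @ G          # sigma @ G is the state transition matrix
    return np.linalg.solve(A, sigma @ Pi + D)

rng = np.random.default_rng(0)
m_s, m_a, beta = 2, 2, 0.9
P = rng.dirichlet(np.ones(m_a), size=m_s)        # rows sum to one
G = rng.dirichlet(np.ones(m_s), size=m_a * m_s)  # rows sum to one
Pi = np.ones(m_a * m_s)                          # constant flow payoff of 1
D = np.zeros(m_s)
V = value_function(P, G, Pi, D, beta)
```

With a constant flow payoff of 1 and no shocks, every state is worth $1/(1-\beta)=10$, which is a quick sanity check on the layout.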

## Identification: Linear Restrictions

Consider player $i$. Let ${R}_{i}$ be a $\left({m}_{a}\cdot {m}_{s}-K\cdot {m}_{s}\right)×\left({m}_{a}\cdot {m}_{s}\right)$ matrix of restrictions and let ${r}_{i}$ be a $\left({m}_{a}\cdot {m}_{s}-K\cdot {m}_{s}\right)×1$-dimensional vector such that ${R}_{i}{\Pi }_{i}={r}_{i}$.

We can now form an augmented linear system of ${m}_{a}\cdot {m}_{s}$ equations in ${m}_{a}\cdot {m}_{s}$ unknowns (hence the order condition is satisfied): $\left[\begin{array}{c}{X}_{i}\\ {R}_{i}\end{array}\right]{\Pi }_{i}+\left[\begin{array}{c}{Y}_{i}\\ -{r}_{i}\end{array}\right]={\overline{X}}_{i}{\Pi }_{i}+{\overline{Y}}_{i}=0.$

Proposition: Consider any player $i$ and suppose that $F$ and $\beta$ are given. If $rank\left({\overline{X}}_{i}\right)={m}_{a}\cdot {m}_{s}$, then ${\Pi }_{i}$ is exactly identified.
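The rank condition can be checked numerically. A minimal sketch, assuming $X_i$, $Y_i$, $R_i$, and $r_i$ are already available as arrays (here random placeholders built from a known $\Pi_i$): the restrictions $R_i\Pi_i=r_i$ are rewritten as $R_i\Pi_i-r_i=0$, stacked under the equilibrium conditions, and $\Pi_i$ is solved for when the stacked matrix has full rank $m_a\cdot m_s$. `solve_payoffs` is a hypothetical helper.

```python
import numpy as np

def solve_payoffs(X, Y, R, r):
    """Stack X Pi + Y = 0 with R Pi - r = 0 and solve for Pi if the rank condition holds."""
    X_bar = np.vstack([X, R])
    Y_bar = np.concatenate([Y, -r])
    n = X_bar.shape[1]
    if X_bar.shape[0] != n or np.linalg.matrix_rank(X_bar) < n:
        raise ValueError("rank condition fails: Pi_i is not exactly identified")
    return np.linalg.solve(X_bar, -Y_bar)

rng = np.random.default_rng(1)
Pi_true = rng.normal(size=5)    # m_a * m_s = 5 payoff entries (toy size)
X = rng.normal(size=(2, 5))     # stands in for X_i(p, g, beta); K * m_s = 2 rows
R = rng.normal(size=(3, 5))     # 3 = m_a * m_s - K * m_s restrictions
Y, r = -X @ Pi_true, R @ Pi_true
Pi_hat = solve_payoffs(X, Y, R, r)
```

Because the placeholders are generated from `Pi_true`, the solve recovers it exactly; redundant restrictions (for example, repeated rows of $R_i$) trigger the rank failure instead.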

Example: Consider the following restrictions: $\begin{array}{rlrl}{\pi }_{i}\left({a}_{i},{a}_{-i},{s}_{i},{s}_{-i}\right)& ={\pi }_{i}\left({a}_{i},{a}_{-i},{s}_{i},{s}_{-i}\prime \right)& & \phantom{\rule{1em}{0ex}}\forall a\in A,\left({s}_{i},{s}_{-i}\right)\in S,\left({s}_{i},{s}_{-i}\prime \right)\in S\\ {\pi }_{i}\left(0,{a}_{-i},{s}_{i}\right)& ={r}_{i}\left({a}_{-i},{s}_{i}\right)& & \phantom{\rule{1em}{0ex}}\forall {a}_{-i}\in {A}_{-i},{s}_{i}\in {S}_{i}\end{array}$ The first is an exclusion restriction while the second is an exogeneity restriction (e.g., payoffs for inactive firms are known to be zero). If $L\ge K+1$, then these restrictions ensure identification (provided that the rank condition holds).
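A count of unknowns and equations for player $i$ shows where $L\ge K+1$ comes from. Under the exclusion restriction $\pi_i$ depends only on $(a,s_i)$, and the exogeneity restriction fixes all payoffs with $a_i=0$. This checks only the order condition; the rank condition must still hold.

$$
\begin{aligned}
\text{unknowns after exclusion:}\quad & m_a\cdot L=(K+1)^N L\\
\text{fixed by exogeneity } (a_i=0):\quad & (K+1)^{N-1}L\\
\text{remaining unknowns:}\quad & K(K+1)^{N-1}L\\
\text{equations:}\quad & K\cdot m_s=K L^N\\
\text{order condition:}\quad & K L^N\ge K(K+1)^{N-1}L \iff L^{N-1}\ge (K+1)^{N-1} \iff L\ge K+1 \quad (N\ge 2).
\end{aligned}
$$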

## Asymptotic Least Squares Estimators

Let $\theta =\left({\theta }_{\pi },{\theta }_{F},\beta ,{\theta }_{g}\right)\in \Theta \subset {ℝ}^{q}$ be the parameters of interest.

There are also $H\le N\cdot K\cdot m_s+m_a\cdot m_s\cdot m_s$ auxiliary parameters $p(\theta)$ and $g(\theta)$, related to $\theta$ through the $N\cdot K\cdot m_s$ equations
$$h(p,g,\theta)=p-\Psi(p,g,\theta)=0. \label{estimating_equations}$$

Asymptotic least squares estimators (Gourieroux and Monfort, 1995, Section 9.1) proceed in two steps:

1. Estimate the auxiliary parameters $p$ and $g$.
2. Estimate the parameters of interest by weighted least squares, using \eqref{estimating_equations} as the estimating equations.

## Asymptotic Least Squares Estimators: Definition

Assume that consistent and asymptotically normal estimators of $p$ and $g$ are available such that as $T\to \infty$, $\begin{array}{c}\left({\stackrel{^}{p}}_{T},{\stackrel{^}{g}}_{T}\right)⟶\left(p\left({\theta }_{0}\right),g\left({\theta }_{0}\right)\right)\phantom{\rule{1em}{0ex}}a.s.,\\ \sqrt{T}\left[\left({\stackrel{^}{p}}_{T},{\stackrel{^}{g}}_{T}\right)-\left(p\left({\theta }_{0}\right),g\left({\theta }_{0}\right)\right)\right]\stackrel{d}{⟶}Normal\left(0,\Sigma \left({\theta }_{0}\right)\right).\end{array}$

The estimation principle involves choosing $\theta$ in order to satisfy the constraints $h\left({\stackrel{^}{p}}_{T},{\stackrel{^}{g}}_{T},\theta \right)={\stackrel{^}{p}}_{T}-\Psi \left({\stackrel{^}{p}}_{T},{\stackrel{^}{g}}_{T},\theta \right)=0.$

Let ${W}_{T}$ be a symmetric positive-definite weight matrix of dimension $\left(N\cdot K\cdot {m}_{s}\right)×\left(N\cdot K\cdot {m}_{s}\right)$. The asymptotic least squares estimator corresponding to ${W}_{T}$ is defined as ${\stackrel{˜}{\theta }}_{T}\left({W}_{T}\right)=\mathrm{arg}\underset{\theta }{\mathrm{min}}\left[{\stackrel{^}{p}}_{T}-\Psi \left({\stackrel{^}{p}}_{T},{\stackrel{^}{g}}_{T},\theta \right){\right]}^{\top }{W}_{T}\left[{\stackrel{^}{p}}_{T}-\Psi \left({\stackrel{^}{p}}_{T},{\stackrel{^}{g}}_{T},\theta \right)\right].$
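In general $\Psi$ is nonlinear in $\theta$ and the objective must be minimized numerically. Purely for illustration, the sketch below assumes $\Psi(\hat p_T,\hat g_T,\theta)=a+B\theta$ for hypothetical arrays `a` and `B` (evaluated at the first-step estimates), in which case the asymptotic least squares estimator reduces to the weighted least squares formula $(B^\top W B)^{-1}B^\top W(\hat p_T-a)$.

```python
import numpy as np

def als_linear(p_hat, a, B, W):
    """ALS estimate when Psi(p_hat, g_hat, theta) = a + B @ theta (illustrative assumption).

    Minimizes [p_hat - a - B theta]' W [p_hat - a - B theta]; the first order
    condition B' W (p_hat - a - B theta) = 0 gives the weighted least squares formula.
    """
    BtW = B.T @ W
    return np.linalg.solve(BtW @ B, BtW @ (p_hat - a))

def als_objective(theta, p_hat, a, B, W):
    """The ALS criterion for the linear-in-theta illustration."""
    resid = p_hat - a - B @ theta
    return resid @ W @ resid

rng = np.random.default_rng(4)
n_eq, q = 6, 2                                           # n_eq plays the role of N*K*m_s
B = rng.normal(size=(n_eq, q))
a = rng.normal(size=n_eq)
theta0 = np.array([0.7, -1.2])
p_hat = a + B @ theta0 + 0.05 * rng.normal(size=n_eq)   # noisy first-step estimates
W = np.diag(rng.uniform(0.5, 2.0, size=n_eq))           # a symmetric positive definite weight
theta_tilde = als_linear(p_hat, a, B, W)
```

Different choices of `W` give different (all consistent) estimators; the efficient choice is characterized below.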

## Asymptotic Least Squares Estimators: Assumptions

1. $\Theta$ is a compact set.
2. ${\theta }_{0}$ lies in the interior of $\Theta$.
3. As $T\to \infty$, ${W}_{T}\to {W}_{0}$ a.s. where ${W}_{0}$ is a non-stochastic positive definite matrix.
4. Any $\theta$ satisfying $\left[p(\theta_0)-\Psi(p(\theta_0),g(\theta_0),\theta)\right]^\top W_0\left[p(\theta_0)-\Psi(p(\theta_0),g(\theta_0),\theta)\right]=0$ equals $\theta_0$.
5. The functions $\pi$, $g$, and $F$ are twice continuously differentiable in $\theta$.
6. The matrix $\left[\nabla_\theta\Psi(p(\theta_0),g(\theta_0),\theta_0)\right]^\top W_0\left[\nabla_\theta\Psi(p(\theta_0),g(\theta_0),\theta_0)\right]$ is nonsingular.

## Asymptotic Least Squares Estimators: Properties

Proposition: Given the assumptions above, the asymptotic least squares estimator $\tilde\theta_T(W_T)$ exists, $\tilde\theta_T(W_T)\stackrel{a.s.}{\to}\theta_0$, and, as $T\to\infty$, $\sqrt{T}\left(\tilde\theta_T(W_T)-\theta_0\right)\stackrel{d}{\to}Normal\left(0,\Omega(\theta_0)\right)$, where
$$\Omega(\theta_0)=\left(\nabla_\theta\Psi^\top W_0\,\nabla_{\theta^\top}\Psi\right)^{-1}\nabla_\theta\Psi^\top W_0\left[\begin{pmatrix}I & 0\end{pmatrix}-\nabla_{(p,g)^\top}\Psi\right]\Sigma\left[\begin{pmatrix}I & 0\end{pmatrix}-\nabla_{(p,g)^\top}\Psi\right]^\top W_0\,\nabla_{\theta^\top}\Psi\left(\nabla_\theta\Psi^\top W_0\,\nabla_{\theta^\top}\Psi\right)^{-1}.$$
Here $0$ is the $(N\cdot K\cdot m_s)\times(m_a\cdot m_s\cdot m_s)$ zero matrix, and the various matrices are evaluated at $\theta_0$, $p(\theta_0)$, and $g(\theta_0)$.
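Given the Jacobians at $\theta_0$, $p(\theta_0)$, and $g(\theta_0)$, the sandwich formula can be evaluated directly. The sketch below assumes $(p,g)$ is stacked with $p$ first, so that the selector $(I\;0)$ picks out $p$; the function name and argument layout are hypothetical.

```python
import numpy as np

def als_asymptotic_cov(J_theta, J_pg, Sigma, W):
    """Asymptotic covariance Omega(theta_0) of the ALS estimator (sandwich form).

    J_theta: (n, q) Jacobian of Psi with respect to theta'   (n = N*K*m_s)
    J_pg:    (n, H) Jacobian of Psi with respect to (p, g)', with p stacked first
    Sigma:   (H, H) asymptotic covariance of sqrt(T) [(p_hat, g_hat) - (p, g)]
    W:       (n, n) limit weight matrix W_0
    """
    n, H = J_pg.shape
    I0 = np.hstack([np.eye(n), np.zeros((n, H - n))])  # the block (I 0)
    A = I0 - J_pg                   # derivative of h = p - Psi with respect to (p, g)'
    bread = np.linalg.inv(J_theta.T @ W @ J_theta)
    meat = J_theta.T @ W @ A @ Sigma @ A.T @ W @ J_theta
    return bread @ meat @ bread

rng = np.random.default_rng(2)
n, H, q = 3, 5, 2
J_theta = rng.normal(size=(n, q))
M = rng.normal(size=(H, H))
Sigma = M @ M.T + np.eye(H)           # a positive definite placeholder
J_pg = np.zeros((n, H))               # special case: Psi does not depend on (p, g)
W = np.linalg.inv(Sigma[:n, :n])
Omega = als_asymptotic_cov(J_theta, J_pg, Sigma, W)
```

In the special case used here ($\Psi$ not depending on $(p,g)$ and $W$ the inverse covariance of $\hat p_T$), the sandwich collapses to $(\nabla_\theta\Psi^\top W\nabla_{\theta^\top}\Psi)^{-1}$.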

## Efficient Asymptotic Least Squares

Proposition: Under the maintained assumptions, the best asymptotic least squares estimators exist. They correspond to sequences of matrices $W_T^\ast$ converging to
$$W_0^\ast=\left(\left[\begin{pmatrix}I & 0\end{pmatrix}-\nabla_{(p,g)^\top}\Psi\right]\Sigma\left[\begin{pmatrix}I & 0\end{pmatrix}-\nabla_{(p,g)^\top}\Psi\right]^\top\right)^{-1}.$$
Their asymptotic covariance matrices are
$$\left(\nabla_\theta\Psi^\top W_0^\ast\,\nabla_{\theta^\top}\Psi\right)^{-1}.$$

Here, $0$ denotes a $\left(N\cdot K\cdot {m}_{s}\right)×\left({m}_{a}\cdot {m}_{s}\cdot {m}_{s}\right)$ matrix of zeros.
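A numerical check of the efficiency result on randomly generated placeholder Jacobians and a positive definite $\Sigma$ (all hypothetical): with $W_0^\ast$ the sandwich collapses to $(\nabla_\theta\Psi^\top W_0^\ast\nabla_{\theta^\top}\Psi)^{-1}$, and the covariance under the identity weight matrix exceeds it by a positive semidefinite matrix.

```python
import numpy as np

def selector(n, H):
    """The block (I 0) that picks p out of the stacked (p, g), assuming p comes first."""
    return np.hstack([np.eye(n), np.zeros((n, H - n))])

def efficient_weight(J_pg, Sigma):
    """W_0* = ([(I 0) - grad Psi] Sigma [(I 0) - grad Psi]')^{-1}."""
    n, H = J_pg.shape
    A = selector(n, H) - J_pg
    return np.linalg.inv(A @ Sigma @ A.T)

def als_cov(J_theta, J_pg, Sigma, W):
    """Sandwich covariance of the ALS estimator for a given limit weight W."""
    n, H = J_pg.shape
    A = selector(n, H) - J_pg
    bread = np.linalg.inv(J_theta.T @ W @ J_theta)
    return bread @ (J_theta.T @ W @ A @ Sigma @ A.T @ W @ J_theta) @ bread

rng = np.random.default_rng(3)
n, H, q = 4, 7, 2
J_theta = rng.normal(size=(n, q))
J_pg = rng.normal(size=(n, H))
M = rng.normal(size=(H, H))
Sigma = M @ M.T + np.eye(H)

W_star = efficient_weight(J_pg, Sigma)
Omega_star = als_cov(J_theta, J_pg, Sigma, W_star)
Omega_id = als_cov(J_theta, J_pg, Sigma, np.eye(n))  # identity weight matrix
gap = Omega_id - Omega_star
gap_eigs = np.linalg.eigvalsh(0.5 * (gap + gap.T))    # should all be >= 0 up to rounding
```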

## Asymptotic Least Squares: Moment Estimator

The moment estimator proposed by Hotz and Miller (1993) is an asymptotic least squares estimator with a particular weight matrix.

Let ${T}_{\mathrm{is}}$ denote the set of observations for individual $i$ in state $s$ and let ${\alpha }_{\mathrm{is}}=\left({\alpha }_{1},\dots ,{\alpha }_{K}\right)$ be a vector of indicators for each choice (with zero omitted).

The moment condition is $E\left[Z\otimes \left({\alpha }_{\mathrm{is}}-{\Psi }_{\mathrm{is}}\left({\stackrel{^}{p}}_{T},{\stackrel{^}{g}}_{T},\theta \right)\right)\right]=0$ where $Z$ is a $J×1$-dimensional vector of instruments.

Suppose the instruments depend only on the player and the state, ${Z}_{t}={Z}_{\mathrm{is}}$ for $t\in {T}_{\mathrm{is}}$. Then the corresponding sample analog becomes
$$\frac{1}{NT}\sum_{i=1}^{N}\sum_{s\in S}\sum_{t\in T_{\mathrm{is}}}Z_t\otimes\left(\alpha_t-\Psi_{\mathrm{is}}(\hat p_T,\hat g_T,\theta)\right)=\frac{1}{NT}\sum_{i=1}^{N}\sum_{s\in S}n_{\mathrm{is}}\left[Z_{\mathrm{is}}\otimes\left(\hat p_{\mathrm{is}}-\Psi_{\mathrm{is}}(\hat p_T,\hat g_T,\theta)\right)\right],$$
where $n_{\mathrm{is}}$ is the number of observations in $T_{\mathrm{is}}$ and $\hat p_{\mathrm{is}}=n_{\mathrm{is}}^{-1}\sum_{t\in T_{\mathrm{is}}}\alpha_t$ is the frequency estimator of player $i$'s choice probabilities in state $s$.

Thus, the moment estimator in this case is an asymptotic least squares estimator with estimating equation $\stackrel{^}{p}-\Psi \left({\stackrel{^}{p}}_{T},{\stackrel{^}{g}}_{T},\theta \right)=0$.
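The step from observation-level moments to cell-level frequencies can be checked directly. A sketch with hypothetical names for one player: states are coded $0,\dots,m_s-1$, actions $0,\dots,K$, and the instrument and model probability values are arbitrary placeholders. It computes $n_{\mathrm{is}}$ and $\hat p_{\mathrm{is}}$ and verifies that summing $Z_{\mathrm{is}}\otimes(\alpha_t-\Psi_{\mathrm{is}})$ over $t\in T_{\mathrm{is}}$ equals $n_{\mathrm{is}}\,Z_{\mathrm{is}}\otimes(\hat p_{\mathrm{is}}-\Psi_{\mathrm{is}})$.

```python
import numpy as np

def frequency_estimates(states, actions, m_s, K):
    """Return n_is (m_s,), p_hat_is (m_s, K), and indicators alpha_t (T, K) for one player.

    Action 0 is omitted from alpha_t, matching the moment condition above.
    """
    states = np.asarray(states)
    actions = np.asarray(actions)
    n = np.bincount(states, minlength=m_s)
    alpha = np.zeros((len(actions), K))
    chose = np.nonzero(actions > 0)[0]
    alpha[chose, actions[chose] - 1] = 1.0  # 1 in the slot of the chosen action
    sums = np.zeros((m_s, K))
    np.add.at(sums, states, alpha)          # sum of alpha_t over t in T_is
    p_hat = np.divide(sums, n[:, None], out=np.zeros_like(sums), where=n[:, None] > 0)
    return n, p_hat, alpha

states = [0, 0, 1, 1, 1, 0]
actions = [1, 0, 2, 1, 2, 2]
n, p_hat, alpha = frequency_estimates(states, actions, m_s=2, K=2)

# Placeholder instruments Z_is and model probabilities Psi_is (hypothetical values)
Z = np.array([[1.0, 2.0], [0.5, -1.0]])
Psi = np.array([[0.2, 0.3], [0.4, 0.1]])
for s in range(2):
    lhs = sum(np.kron(Z[s], alpha[t] - Psi[s]) for t in range(len(states)) if states[t] == s)
    rhs = n[s] * np.kron(Z[s], p_hat[s] - Psi[s])
    assert np.allclose(lhs, rhs)
```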

## Asymptotic Least Squares: Pseudo Maximum Likelihood

The pseudo maximum likelihood estimator of Aguirregabiria and Mira (2002, 2007) is also an asymptotic least squares estimator.

The partial pseudo log-likelihood, conditional on the estimates ${\stackrel{^}{g}}_{T}$, is $\ell =\sum _{s\in S}\sum _{i=1}^{N}\sum _{k\in {A}_{i}}{n}_{\mathrm{kis}}\mathrm{ln}{\Psi }_{\mathrm{kis}}\left({\stackrel{^}{p}}_{T},{\stackrel{^}{g}}_{T},\theta \right),$ where ${n}_{\mathrm{kis}}$ is the number of observations in which player $i$ chooses action $k$ in state $s$.

The first order condition is $\frac{\partial \ell }{\partial \theta }=\left({\nabla }_{\theta }{\Psi }^{\top }\right){\Sigma }_{p}^{-1}\left(\Psi \right)\left[\stackrel{^}{p}-\Psi \left({\stackrel{^}{p}}_{T},{\stackrel{^}{g}}_{T},\theta \right)\right]=0$, where ${\Sigma }_{p}^{-1}\left(\Psi \right)$ is the inverse covariance matrix of the choice probabilities.

This is equivalent to the first order condition of the asymptotic least squares estimator with weight matrix ${W}_{T}^{\mathrm{ml}}\stackrel{p}{\to }{\Sigma }_{p}^{-1}$.
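The equivalence can be checked numerically in a hypothetical binary-choice case where $\Psi_s(\theta)=\Lambda(x_s^\top\theta)$ for state-specific covariates $x_s$, held fixed here (in the model they would depend on $\hat p_T$ and $\hat g_T$). With $W=\mathrm{diag}\left(n_s/(\Psi_s(1-\Psi_s))\right)$, the inverse covariance of the frequency estimates, the ALS-form score $\nabla_\theta\Psi^\top W(\hat p-\Psi)$ matches a finite-difference gradient of the pseudo log-likelihood. Absorbing the cell sizes $n_s$ into $W$ is a normalization choice of this sketch.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def pseudo_loglik(theta, X, n, n1):
    """Binary-choice pseudo log-likelihood: sum_s n1_s ln Psi_s + (n_s - n1_s) ln(1 - Psi_s)."""
    psi = logistic(X @ theta)
    return np.sum(n1 * np.log(psi) + (n - n1) * np.log(1.0 - psi))

def als_form_score(theta, X, n, n1):
    """grad_theta Psi' W (p_hat - Psi), with W the inverse covariance of p_hat."""
    psi = logistic(X @ theta)
    p_hat = n1 / n                                # frequency estimates
    jac = (psi * (1.0 - psi))[:, None] * X        # row s: d Psi_s / d theta'
    W = np.diag(n / (psi * (1.0 - psi)))          # Var(p_hat_s) = Psi_s (1 - Psi_s) / n_s
    return jac.T @ W @ (p_hat - psi)

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # hypothetical covariates
n = np.array([50.0, 40.0, 30.0, 20.0])                          # observations per state
n1 = np.array([10.0, 15.0, 18.0, 14.0])                         # choices of action 1
theta = np.array([-0.5, 0.4])

# Central finite differences of the pseudo log-likelihood
eps = 1e-6
num_grad = np.array([
    (pseudo_loglik(theta + eps * e, X, n, n1) - pseudo_loglik(theta - eps * e, X, n, n1)) / (2 * eps)
    for e in np.eye(2)
])
score = als_form_score(theta, X, n, n1)
```

Setting this score to zero is therefore the same as the ALS first order condition with weight $W$, which is the sense in which PML belongs to the class.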

## Monte Carlo Study

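• Compare LS-E (efficient asymptotic least squares), PML (pseudo maximum likelihood), LS-I (asymptotic least squares with the identity weight matrix), and k-PML (the k-step pseudo maximum likelihood estimator of Aguirregabiria and Mira, 2007).
• A simple two-player, two-action, two-state game with five equilibria.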
• Three equilibria are used for experiments with various sample sizes.
• The LS-E estimator performs best overall (in eight of 12 experiments).
• LS-E performs poorly with the smallest sample size ($T=100$).
• PML ranks second (by MSE) in seven of 12 specifications.
• PML performs better than LS-E for $T=100$ and worse for larger sample sizes. This may be because, for small $T$, the efficient weight matrix, which depends on an estimate of the covariance matrix of $\left({\stackrel{^}{p}}_{T},{\stackrel{^}{g}}_{T}\right)$, is estimated less precisely than the PML weight matrix.
• PML may be less computationally burdensome for large state spaces.