Graham, Imbens, and Ridder (2006)

Complementarity and Aggregate Implications of Assortative Matching

These slides are based on the working paper “Complementarity and Aggregate Implications of Assortative Matching” by Bryan S. Graham, Guido W. Imbens, and Geert Ridder, May 14, 2006.

Presentation by Jason Blevins, Duke Applied Microeconometrics Reading Group, March 6, 2007.

Basic Model

Reallocation of an indivisible input across firms.
Aggregate stock of input is fixed.
Output may be monotone in the input for every firm, but increase at different rates across firms.
Cannot simultaneously increase input level for all firms.

Reallocations

What is the effect on average output of input reallocations?
Marginal distribution of reallocated input remains unchanged.
Average output may change if production technology is nonseparable.

Examples

Teacher reallocation across classrooms of varying mean student ability.
Assignment mechanisms for college roommates in the presence of social interactions.
Effects of spousal sorting on child education.

Estimation

Nonparametrically estimate production function, CDFs of inputs, and quantile functions.
Average over distribution of inputs under new assignment rule.

Comparison

Allow for continuous treatment (rather than binary or discrete).
Assignment policies do not change the marginal distribution of the input in the population.
In treatment effect literature, assignment of treatment is not restricted by the treatment of other units.
Focus on redistributions under specific assignment rules, not optimal assignment rules.

Contributions

Develop a framework for estimating outcomes under correlated matching.
Derive an estimator for average output under correlated matching.
Except for perfect positive and negative rank correlation, the estimator has a parametric rate of convergence.
Derive the asymptotic properties of the estimator in all cases.

Model

$Y_{i} (w)$ output associated with input level $w$ for firm $i$ .
Interested in reallocating input $W$ across firms.
Hold marginal distribution of $W$ fixed.
Observed firm characteristics $X \in ℝ$ , $Z \in ℝ^{K}$ .

Notes:

Holding the marginal distribution of $W$ fixed is appropriate when the input is indivisible (e.g., teachers, managers) and the aggregate stock of the input is hard to augment.
$X$ is a scalar because the paper focuses on rank-ordered matching. There is no natural ordering for vector-valued covariates.

Identifying Assumption

Unconfoundedness/Exogeneity: $Y (w) ⊥ W | X, Z$ for all $w \in 𝒲 \subset ℝ$ .
Conditional on firm characteristics $(X, Z)$ , the assignment of $W$ is exogenous.
Example: $\dim (X) = \dim (Z) = 0$ . Then, $Y (w) ⊥ W$ and consequently, $E [Y (w)] = E [Y | W = w]$ .

That is, the average output we would see if all firms were assigned $W = w$ equals the average output among firms that actually have $W = w$ . The distribution of potential outcomes must be the same in the subpopulation of firms that were assigned $W = w$ as that in the overall population. This is the analogous assumption to that of the binary treatment effect model of Rosenbaum and Rubin (1983).

In general, this only has to hold within $(X, Z)$ subpopulations.

Production Function

Define the production function $g (w, x, z) = E [Y | W = w, X = x, Z = z] .$
$g$ denotes average output associated with input levels $(w, x, z)$ .
Under the unconfoundedness assumption, $g (w, x, z) = E [Y (w) | X = x, Z = z] .$

Unconfoundedness implies that among firms with identical $X$ and $Z$ , the (counterfactual) average output of firms if we assigned $W = w$ to all firms is equal to the actual average output of firms that are in fact assigned $W = w$ .

Quantities of Interest

Treatment effect literature has mostly looked at estimating $E_{X, Z} [g (1, X, Z) - g (0, X, Z)] .$
With continuous inputs, we may want to estimate $g (w, x, z)$ or $\frac{\partial g}{\partial w} (w, x, z) .$
This paper is concerned with policies that redistribute an input $W$ following a rule based on $X$ .
What is the average output under such a policy?

Positive Matching

Among units with the same realization of $Z$ , those with the highest values of $X$ receive the highest values of $W$ .
$β^{pam} = E \{g [F_{W | Z}^{- 1} (F_{X | Z} (X | Z) | Z), X, Z]\} .$
$F_{X | Z} (X | Z)$ denotes the conditional CDF of $X$ given $Z$ .
$F_{W | Z}^{- 1} (q | Z)$ is the quantile of order $q$ of the conditional distribution of $W$ given $Z$ .
We would expect this redistribution to perform well if there is complementarity between $X$ and $W$ .
$F_{W | Z}^{- 1} (q | Z)$ is a conditional quantile function.
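In a finite sample with no $Z$ and no ties, the map $F_{W}^{- 1} (F_{X} (X))$ amounts to sorting: the unit with the $k$-th smallest $X$ receives the $k$-th smallest $W$. The following is a minimal sketch under those assumptions; the function name `positive_assortative_assignment` and the simulated data are illustrative and do not come from the paper.

```python
import numpy as np

def positive_assortative_assignment(x, w):
    """Reassign inputs w so that higher-x units receive higher w.

    Sample analog of F_W^{-1}(F_X(x_i)), assuming no Z and no ties in x.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    # Double argsort gives each unit's rank in x: 0 = lowest, N-1 = highest.
    ranks = np.argsort(np.argsort(x))
    # The unit with rank k gets the (k+1)-th smallest input value.
    return np.sort(w)[ranks]

# Illustrative data (hypothetical, not from the paper).
rng = np.random.default_rng(0)
x = rng.normal(size=8)
w = rng.normal(size=8)
w_pam = positive_assortative_assignment(x, w)
```

The reassigned vector is a permutation of the original `w`, so the marginal distribution of the input is unchanged by construction.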

Graphical Example

Interpretation

Thus, $F_{W | Z}^{- 1} (F_{X | Z} (X | Z) | Z)$ takes a unit’s position in the distribution of $X | Z$ and assigns to it a value of $W$ corresponding to that quantile.
Consider instead the population-wide redistribution $β^{pam2} = E \{g [F_{W}^{- 1} (F_{X} (X)), X, Z]\} .$
Total effect is hard to interpret: complementarity or substitutability between $W$ and $X$ is mixed with correlation between $W$ and $Z$ .
The authors focus on redistributions within subpopulations defined by $Z$ because they reflect solely the complementarity or substitutability between $W$ and $X$ . Population-wide redistributions confound these effects by altering the joint distribution of $W$ and $Z$ .
However, the redistribution that helps us learn about complementarity need not be the one that is socially optimal.

An Example

$W$ : teacher quality.
$X$ : mean beginning-of-year achievement.
$Z$ : fraction of class that is female.
Suppose achievement varies with gender ( $X$ and $Z$ correlated).
Positive assortative matching: high-quality teachers assigned to high-achievement classrooms.
Alters joint distributions of $W$ and $X$ as well as $W$ and $Z$ .

An Example

This also tends to assign good teachers to classrooms with a large fraction of female students.
Subsequent increases in achievement may reflect complementarity between $W$ and $X$ .
May also reflect how the effect of teacher quality varies with gender.
Conditional on gender, there may be no complementarity at all!
Thus, focusing on redistributions across classrooms with similar gender mixes allows us to learn about complementarity.

Negative Matching

$β^{nam} = E \{g [F_{W | Z}^{- 1} (1 - F_{X | Z} (X | Z) | Z), X, Z]\} .$
Example: assign best teachers to low-achievement classrooms.

Estimation

Note that although the model was developed for $Z$ subpopulations, only population-wide estimators are presented.
We can estimate $β^{pam}$ and $β^{nam}$ only at nonparametric rates.
These estimators follow the analogy principle.
Status quo: ${\hat{β}}^{sq} = N^{- 1} \sum_{i = 1}^{N} Y_{i}$
For others, need $\hat{g}$ , ${\hat{F}}_{X}$ , ${\hat{F}}_{W}$ , and ${\hat{F}}_{W}^{- 1}$ .

Estimation of g

Nonparametric Kernel estimation of the production function $g (w, x, z)$ .
Series estimators could also be used.
Kernel $K (u)$ with $u \in ℝ^{K + 2}$ , bandwidth $b$ , $V_{i} = (W_{i}, X_{i}, Z_{i})$ and $v = (w, x, z)$ .

$\hat{g} (w, x, z) = \frac{\sum_{i} Y_{i} K (\frac{v - V_{i}}{b})}{\sum_{i} K (\frac{v - V_{i}}{b})}$

Support problems: we are trying to learn about a counterfactual allocation that may involve areas of the support for which we have few observations to estimate $g$ .
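The kernel estimator above can be sketched in a few lines. This is one possible implementation, assuming a Gaussian product kernel and a single common bandwidth; the slides do not specify the kernel, and the function name `nw_regression` and the simulated data are hypothetical.

```python
import numpy as np

def nw_regression(v_eval, V, Y, b):
    """Nadaraya-Watson estimate of E[Y | V = v].

    v_eval: (M, D) evaluation points, V: (N, D) regressors,
    Y: (N,) outcomes, b: scalar bandwidth.
    Gaussian product kernel (an assumption of this sketch).
    """
    v_eval = np.atleast_2d(np.asarray(v_eval, dtype=float))
    V = np.asarray(V, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Scaled differences (v - V_i) / b, shape (M, N, D).
    u = (v_eval[:, None, :] - V[None, :, :]) / b
    # Kernel weights; normalising constants cancel in the ratio.
    K = np.exp(-0.5 * np.sum(u ** 2, axis=2))
    return (K @ Y) / K.sum(axis=1)

# Illustrative data: V = (W, X), no Z, true g(w, x) = w * x.
rng = np.random.default_rng(0)
N = 2000
V = rng.uniform(size=(N, 2))
Y = V[:, 0] * V[:, 1] + 0.1 * rng.normal(size=N)
g_hat = nw_regression([0.5, 0.5], V, Y, b=0.1)
```

At points where few observations fall within a bandwidth of $v$, the denominator is small and the estimate is unreliable, which is the support problem noted above.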

Estimation of CDFs

Use empirical CDFs:

${\hat{F}}_{X} (x) = N^{- 1} \sum_{i} 1 (X_{i} \leq x)$ ${\hat{F}}_{W} (w) = N^{- 1} \sum_{i} 1 (W_{i} \leq w)$

Quantile function $F_{W}^{- 1} (q)$ :

${\hat{F}}_{W}^{- 1} (q) = \inf \{w \in 𝒲 : {\hat{F}}_{W} (w) \geq q\}$

This is the (generalized) inverse of the empirical CDF of $W$ .
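As a small illustration, the empirical CDF and its generalized inverse can be computed from the sorted sample. The function names `ecdf` and `equantile` and the five-point sample are illustrative only, and the small tolerance guarding against floating-point round-off is an implementation choice of this sketch.

```python
import numpy as np

def ecdf(sample, t):
    """Empirical CDF: share of sample points less than or equal to t."""
    s = np.sort(np.asarray(sample, dtype=float))
    return np.searchsorted(s, t, side="right") / s.size

def equantile(sample, q):
    """Generalized inverse: smallest sample value w with ecdf(sample, w) >= q."""
    s = np.sort(np.asarray(sample, dtype=float))
    # ceil(q * N) order statistics are needed to reach level q;
    # the 1e-9 tolerance absorbs round-off in q * N.
    k = np.ceil(np.asarray(q, dtype=float) * s.size - 1e-9).astype(int)
    return s[np.clip(k, 1, s.size) - 1]

w = np.array([3.0, 1.0, 4.0, 1.5, 9.0])
share_below_3 = ecdf(w, 3.0)      # three of the five points are <= 3
median_w = equantile(w, 0.5)      # third smallest value
```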

Estimation

Estimate $β^{pam}$ and $β^{nam}$ by analogy:

${\hat{β}}^{pam} = \frac{1}{N} \sum_{i = 1}^{N} \hat{g} [{\hat{F}}_{W}^{- 1} ({\hat{F}}_{X} (X_{i})), X_{i}, Z_{i}]$

${\hat{β}}^{nam} = \frac{1}{N} \sum_{i = 1}^{N} \hat{g} [{\hat{F}}_{W}^{- 1} (1 - {\hat{F}}_{X} (X_{i})), X_{i}, Z_{i}]$

Note that here we are averaging over both $X$ and $Z$ .
The rate of convergence of ${\hat{β}}^{pam}$ and ${\hat{β}}^{nam}$ is slower than the parametric rate.
Loosely speaking, this is because the nonparametric function $g (w, x, z)$ has more arguments than we then average over.
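The plug-in estimators can be sketched as follows for the case with no $Z$. Everything here is an assumption made for illustration: the Gaussian product kernel, the bandwidth, the simulated design with $g (w, x) = w x$, and the function names `nw2` and `beta_assortative`. For the negative case the ranks are simply reversed so that the new assignment is an exact permutation of the observed inputs, which can differ by one rank from the literal plug-in of $1 - {\hat{F}}_{X} (X_{i})$.

```python
import numpy as np

def nw2(w_eval, x_eval, W, X, Y, b):
    """Nadaraya-Watson estimate of E[Y | W=w, X=x], Gaussian product kernel."""
    Kw = np.exp(-0.5 * ((w_eval[:, None] - W[None, :]) / b) ** 2)
    Kx = np.exp(-0.5 * ((x_eval[:, None] - X[None, :]) / b) ** 2)
    K = Kw * Kx
    return (K @ Y) / K.sum(axis=1)

def beta_assortative(W, X, Y, b, positive=True):
    """Average of g_hat at the rank-matched input for each unit (no Z)."""
    N = X.size
    ranks = np.argsort(np.argsort(X))      # 0 = lowest X
    if not positive:
        ranks = N - 1 - ranks              # reverse ranks for negative matching
    w_new = np.sort(W)[ranks]              # sample analog of F_W^{-1}(F_X(X_i))
    return nw2(w_new, X, W, X, Y, b).mean()

# Simulated status quo: W and X independent uniforms, g(w, x) = w * x.
rng = np.random.default_rng(1)
N = 1000
W = rng.uniform(size=N)
X = rng.uniform(size=N)
Y = W * X + 0.1 * rng.normal(size=N)

beta_sq = Y.mean()
beta_pam = beta_assortative(W, X, Y, b=0.08, positive=True)
beta_nam = beta_assortative(W, X, Y, b=0.08, positive=False)
```

In this design the population values are $1/3$ under positive matching, $1/4$ under the status quo, and $1/6$ under negative matching, so the estimates should be ordered accordingly, up to sampling noise and boundary bias of the kernel estimator.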

Correlated Matching

We have four focal allocations:
Perfect positive assortative matching,
Perfect negative assortative matching,
The status quo,
Random matching.
Random matching occurs when $W$ and $X$ are independently assigned within subpopulations.
Consider a subset of the set of all feasible allocations.
Two-parameter subset which has the above as special cases.
Traces paths between the four focal allocations.

Correlated Matching

$β^{cm} (ρ, τ)$
$τ \in [0, 1]$ controls nearness to the status quo (at 1).
$ρ \in [- 1, 1]$ controls nearness to perfect negative or positive assortative matching.
Focal allocations
$β^{sq} = β^{cm} (ρ, 1)$
$β^{rm} = β^{cm} (0, 0)$
$β^{pam} = β^{cm} (1, 0)$
$β^{nam} = β^{cm} (- 1, 0)$

Correlated Matching

Status quo allocation: $β^{sq} = E [Y] = E [g (W, X, Z)] .$
Random matching allocation: $β^{rm} = \int [\int \int g (w, x, z) {dF}_{W | Z} (w | z) {dF}_{X | Z} (x | z)] {dF}_{Z} (z) .$
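A worked example may help fix ideas. It is purely illustrative and not taken from the paper: assume there is no $Z$, that $g (w, x) = w x$, and that $W$ and $X$ are each uniform on $[0, 1]$. Then $F_{W}^{- 1} (F_{X} (x)) = x$ and the focal allocations are

$β^{pam} = E [X \cdot X] = \int_{0}^{1} x^{2} dx = \frac{1}{3},$

$β^{nam} = E [(1 - X) X] = \frac{1}{2} - \frac{1}{3} = \frac{1}{6},$

$β^{rm} = E [W] E [X] = \frac{1}{4} .$

Here $β^{pam} > β^{rm} > β^{nam}$ , which reflects the complementarity $\frac{\partial^{2} g}{\partial w \partial x} = 1 > 0$ in this example.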

Normal Copula

Redefine matching allocations using a truncated bivariate standard Normal copula:

$ϕ (x_{1}, x_{2}, ρ) = \frac{1}{2 π \sqrt{1 - ρ^{2}}} \exp [- \frac{1}{2 (1 - ρ^{2})} (x_{1}^{2} - 2 ρ x_{1} x_{2} + x_{2}^{2})],$

$ϕ_{c} (x_{1}, x_{2}, ρ) = \frac{ϕ (x_{1}, x_{2}, ρ)}{Φ (c, c, ρ) - Φ (c, - c, ρ) - [Φ (- c, c, ρ) - Φ (- c, - c, ρ)]} .$

The possible joint CDFs in this class are parametrized by $ρ$ :

$H_{W, X} (w, x) = Φ_{c} [Φ_{c}^{- 1} (F_{W} (w)), Φ_{c}^{- 1} (F_{X} (x)); ρ] .$

Normal Copula

Marginal CDFs: $H_{W, X} (w, ∞) = F_{W} (w)$ , $H_{W, X} (∞, x) = F_{X} (x)$ .
Special case: independent $X$ and $W$ when $ρ = 0$ .
Joint PDF:

$h_{W, X} (w, x) = ϕ_{c} [Φ_{c}^{- 1} (F_{W} (w)), Φ_{c}^{- 1} (F_{X} (x)); ρ] \frac{f_{W} (w) f_{X} (x)}{ϕ_{c} [Φ_{c}^{- 1} (F_{W} (w))] ϕ_{c} [Φ_{c}^{- 1} (F_{X} (x))]}$
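The ratio multiplying $f_{W} (w) f_{X} (x)$ in the joint PDF is a copula density evaluated at $(F_{W} (w), F_{X} (x))$ . The sketch below computes it for the untruncated case (the limit $c \to ∞$), which is a simplification of the truncated copula used in the slides; the function name `gaussian_copula_density` is hypothetical.

```python
import numpy as np
from statistics import NormalDist

# Vectorised standard normal quantile function (standard library).
_inv_cdf = np.vectorize(NormalDist().inv_cdf)

def gaussian_copula_density(u, v, rho):
    """Ratio phi(a, b; rho) / (phi(a) * phi(b)) with a, b the normal quantiles of u, v.

    Untruncated Gaussian copula (assumption of this sketch);
    requires 0 < u, v < 1 and -1 < rho < 1.
    """
    a = _inv_cdf(u)
    b = _inv_cdf(v)
    r2 = rho ** 2
    # Closed form after cancelling the univariate normal densities.
    expo = (2.0 * rho * a * b - r2 * (a ** 2 + b ** 2)) / (2.0 * (1.0 - r2))
    return np.exp(expo) / np.sqrt(1.0 - r2)

independent = gaussian_copula_density(0.3, 0.8, 0.0)   # ratio under rho = 0
same_tail = gaussian_copula_density(0.9, 0.9, 0.5)
opposite_tail = gaussian_copula_density(0.9, 0.1, 0.5)
```

With $ρ = 0$ the ratio is one everywhere, so the joint density factors into the product of the marginals. With $ρ > 0$ the density puts more weight on pairs in the same tail than on pairs in opposite tails.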

Correlated Matching

$τ$ is the weight on the status quo allocation, so $1 - τ$ measures distance from the status quo.

$\begin{aligned} β^{cm} (ρ, τ) & = τ E [Y] \\ & \quad + (1 - τ) \int [\int \int g (w, x, z) \, d Φ (Φ^{- 1} (F_{W | Z} (w | z)), Φ^{- 1} (F_{X | Z} (x | z)); ρ)] {dF}_{Z} (z) . \end{aligned}$

$β^{cm} (ρ, 0) = \int \int \int g (w, x, z) \frac{ϕ_{c} [Φ_{c}^{- 1} (F_{W} (w)), Φ_{c}^{- 1} (F_{X} (x)); ρ]}{ϕ_{c} [Φ_{c}^{- 1} (F_{W} (w))] ϕ_{c} [Φ_{c}^{- 1} (F_{X} (x))]} f_{W} (w) f_{X, Z} (x, z) dw dx dz$

$β^{rm} = β^{cm} (0, 0) = \int [\int \int g (w, x, z) {dF}_{W | Z} (w | z) {dF}_{X | Z} (x | z)] {dF}_{Z} (z) .$

Estimation

Analog estimator for $β^{cm} (ρ, 0)$ :

${\hat{β}}^{cm} (ρ, 0) = \frac{1}{N^{2}} \sum_{i = 1}^{N} \sum_{j = 1}^{N} \hat{g} (W_{i}, X_{j}, Z_{j}) \frac{ϕ_{c} [Φ_{c}^{- 1} ({\hat{F}}_{W} (W_{i})), Φ_{c}^{- 1} ({\hat{F}}_{X} (X_{j})); ρ]}{ϕ_{c} [Φ_{c}^{- 1} ({\hat{F}}_{W} (W_{i}))] ϕ_{c} [Φ_{c}^{- 1} ({\hat{F}}_{X} (X_{j}))]}$

Under random matching ( $ρ = 0$ ), the densities on the right hand side cancel out, leaving ${\hat{β}}^{rm}$ .
For $τ > 0$ , we have the convex combination

${\hat{β}}^{cm} (ρ, τ) = τ {\hat{β}}^{sq} + (1 - τ) {\hat{β}}^{cm} (ρ, 0) .$

This is linear in the nonparametric regression function $\hat{g}$ and nonlinear in the empirical CDFs of $X$ and $W$ .
The authors claim that under certain conditions, this estimator is consistent and asymptotically normal, but the proofs are omitted in the latest available version (May 2006).
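The double sum for ${\hat{β}}^{cm} (ρ, 0)$ can be sketched as below for the case with no $Z$. Several simplifications are assumptions of this sketch rather than the authors' method: the copula is untruncated, the empirical CDFs are replaced by rescaled ranks $k / (N + 1)$ so that the normal quantiles stay finite, and the known function $g (w, x) = w x$ stands in for the kernel estimate $\hat{g}$ . The function name `beta_cm` is hypothetical.

```python
import numpy as np
from statistics import NormalDist

_inv_cdf = np.vectorize(NormalDist().inv_cdf)

def beta_cm(g, W, X, rho):
    """Double-sum analog of beta^cm(rho, 0) with no Z (illustrative sketch)."""
    N = W.size
    # Rescaled ranks in (0, 1) replace the empirical CDFs; the slides
    # instead truncate the normal at c to avoid infinite quantiles.
    a = _inv_cdf((np.argsort(np.argsort(W)) + 1) / (N + 1))
    b = _inv_cdf((np.argsort(np.argsort(X)) + 1) / (N + 1))
    A, B = a[:, None], b[None, :]
    r2 = rho ** 2
    # Copula density ratio for every pair (W_i, X_j), shape (N, N).
    ratio = np.exp((2 * rho * A * B - r2 * (A ** 2 + B ** 2))
                   / (2 * (1 - r2))) / np.sqrt(1 - r2)
    # g evaluated at every pair (W_i, X_j), shape (N, N).
    G = g(W[:, None], X[None, :])
    return (G * ratio).mean()

def g_true(w, x):
    """Stand-in for g_hat (assumed production function for the demo)."""
    return w * x

rng = np.random.default_rng(2)
W = rng.uniform(size=300)
X = rng.uniform(size=300)
beta_neg = beta_cm(g_true, W, X, -0.9)
beta_rm = beta_cm(g_true, W, X, 0.0)
beta_pos = beta_cm(g_true, W, X, 0.9)
```

At $ρ = 0$ every weight equals one and the double sum reduces to the random matching average over all $(W_{i}, X_{j})$ pairs. Combining the result with the sample mean of $Y$ using weights $τ$ and $1 - τ$ gives the estimator for $τ > 0$.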

Application

Effects of parents’ education on education of child.
Data: 10,272 children from the National Longitudinal Survey of Youth (NLSY).
Simple model with three variables:
Mother’s education
Father’s education
Child’s education

Summary Statistics

Variable	Mean	Std. dev.
Ed. child	13.06	2.38
Ed. mother	11.20	2.87
Ed. father	11.20	3.64

Regression

Variable	Coefficient	Std. Err.
Constant	11.2700	0.1900
Ed. mother	-0.0410	0.0360
Ed. father	-0.0770	0.0290
Ed. mother²	0.0110	0.0023
Ed. father²	0.0110	0.0015
Ed. mother × Ed. father	0.0014	0.0029

Nonlinearity of the relationship suggests that education of the child might be sensitive to reallocations.
Inspection of the data reveals an asymmetry: it is better to have a mother with high education and a father with low education than vice versa. The interaction term here doesn’t capture that.
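To see what the fitted quadratic implies for reallocations, the sketch below evaluates the regression at two hypothetical pairings of parents with 8 and 16 years of education. The coefficients are the point estimates from the table above; the education levels, the function name `predicted_child_education`, and the two-couple comparison are illustrative assumptions.

```python
# Point estimates from the regression table above.
COEF = {
    "const": 11.27,
    "mother": -0.041,
    "father": -0.077,
    "mother_sq": 0.011,
    "father_sq": 0.011,
    "mother_x_father": 0.0014,
}

def predicted_child_education(mother, father, c=COEF):
    """Fitted child education (years) from the quadratic regression."""
    return (c["const"]
            + c["mother"] * mother + c["father"] * father
            + c["mother_sq"] * mother ** 2 + c["father_sq"] * father ** 2
            + c["mother_x_father"] * mother * father)

# Hypothetical population of two mothers and two fathers with 8 and 16 years.
lo, hi = 8, 16
avg_positive = (predicted_child_education(hi, hi)
                + predicted_child_education(lo, lo)) / 2
avg_negative = (predicted_child_education(hi, lo)
                + predicted_child_education(lo, hi)) / 2
gap = avg_positive - avg_negative
```

Because both pairings use the same mothers and the same fathers, the linear and squared terms cancel in `gap`, and only the cross term remains: $0.0014 \times (16 - 8)^{2} / 2 = 0.0448$ years. In this parametric fit the effect of re-sorting is therefore driven entirely by the interaction coefficient, which is small relative to its standard error of 0.0029.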

Estimates

$ρ$	${\hat{β}}_{cs}$	$std ({\hat{β}}_{cs})$
-0.99	11.5	.069
-0.80	11.7	.048
-0.60	11.9	.040
-0.40	12.1	.037
-0.20	12.4	.034
0.00	12.6	.033
0.20	12.8	.031
0.40	12.9	.030
0.60	13.0	.029
0.80	13.0	.029
0.99	13.1	.039