Kernel matching defines

$$W_{N_0 N_1}(i,j) = \frac{G_{ij}}{\sum_{k \in \{D=0\}} G_{ik}},$$

where $G_{ik} = G((X_i - X_k)/a_{N_0})$ is a kernel that downweights distant observations and $a_{N_0}$ is a sequence of smoothing parameters with the property that $\lim_{N_0 \to \infty} a_{N_0} = 0$. Nonzero values of this weight implicitly define $C(X_i)$ for this version of matching. In Section 5 of this paper, we extend kernel matching to permit regression adjustment of outcome equations. To estimate impacts over a set $K$ as in (2), form a weighted sum of (5) over $K$:

$$M(K) = \sum_{i \in \{D=1\}} w_{N_0 N_1}(i) \Big[ Y_{1i} - \sum_{j \in \{D=0\}} W_{N_0 N_1}(i,j)\, Y_{0j} \Big] \quad \text{for } X_i \in K, \tag{6}$$

where $w_{N_0 N_1}(i)$ is a weight accounting for scale and possibly heteroskedasticity as well as the choice of support $K$.
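
A minimal sketch of a kernel matching estimate may help fix ideas. It rests on several assumptions that are not taken from the text: $X$ is scalar, the kernel $G$ is the standard normal density, the matching weights are kernel values normalized to sum to one over the comparison group, and the outer weight on each treated observation in $K$ is equal ($1/N$). The function names and the `in_support` argument are hypothetical.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as one possible choice of G."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kernel_weights(x_i, x_controls, bandwidth):
    """Normalized kernel weights of comparison observations for a treated X_i."""
    g = [gaussian_kernel((x_i - x_k) / bandwidth) for x_k in x_controls]
    total = sum(g)
    if total == 0.0:
        raise ValueError("no comparison observation receives positive weight")
    return [g_j / total for g_j in g]

def kernel_matching_estimate(x_treated, y_treated, x_controls, y_controls,
                             bandwidth, in_support=lambda x: True):
    """Equal-weighted average, over treated i with X_i in K, of
    Y_1i minus the kernel-weighted mean of comparison outcomes Y_0j."""
    gains = []
    for x_i, y_1i in zip(x_treated, y_treated):
        if not in_support(x_i):  # skip treated observations outside K
            continue
        w = kernel_weights(x_i, x_controls, bandwidth)
        matched_y0 = sum(w_j * y_0j for w_j, y_0j in zip(w, y_controls))
        gains.append(y_1i - matched_y0)
    if not gains:
        raise ValueError("no treated observation falls in the support set K")
    return sum(gains) / len(gains)
```

A smaller bandwidth concentrates weight on the nearest comparison observations; with a very small bandwidth every kernel value can underflow to zero, which the sketch reports as an error rather than matching arbitrarily.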

Regression estimators have also been proposed that exploit (A-1), or its implication (4), in a linear regression setting.

# Month: January 2015

# SELECTION BIAS: The Method of Matching

To our knowledge, the method of matching was first used by Fechner (1860). It has been extensively applied to the evaluation of job training programs in studies conducted in the late 70s and early 80s. The method is based on the identifying assumption that

$$(Y_0, Y_1) \perp\!\!\!\perp D \mid X. \tag{A-1}$$



# SELECTION BIAS: The Evaluation Problem 2

Under certain conditions, the parameter of interest can be identified with data from a social experiment. If experiments do not disrupt the program being evaluated, and if control group members do not have access to close substitutes for the experimental treatment, then experimental data identify $E(Y_0 \mid X, D = 1)$. Thus $E(\Delta(X) \mid X, D = 1)$ can be identified for any set of conditioning variables $X$ within the support of $X$ for $D = 1$ with data from a social experiment.12 When it is valid, randomization avoids all of the traditional econometric problems of model selection. It avoids the need to specify the functional forms of the estimating equations that relate $Y_1$ and $Y_0$ to $X$, or to specify which variables are included in or excluded from outcome equations or program participation equations. This is an important advantage of randomization compared to other evaluation procedures.
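
The identification argument can be illustrated with a small simulation. This is a sketch under assumptions of our own, not a procedure from the text: the data-generating process is hypothetical (a latent trait raises both the untreated outcome and the probability of applying), conditioning on $X$ is suppressed, and randomization is assumed neither to disrupt the program nor to leave controls with close substitutes.

```python
import random

def simulate_experiment(n=20000, seed=0):
    """Simulate applicants randomized into treatment and control groups.

    Hypothetical data-generating process: a latent trait raises both the
    untreated outcome Y0 and the chance of applying to the program.
    """
    rng = random.Random(seed)
    treated, controls, nonparticipants, gains = [], [], [], []
    for _ in range(n):
        trait = rng.gauss(0.0, 1.0)
        y0 = trait + rng.gauss(0.0, 1.0)           # untreated outcome
        y1 = y0 + 1.0 + 0.5 * trait                # treated outcome
        applies = trait + rng.gauss(0.0, 1.0) > 0  # would have D = 1
        if not applies:
            nonparticipants.append(y0)
            continue
        gains.append(y1 - y0)                      # unobservable in real data
        if rng.random() < 0.5:                     # random assignment
            treated.append(y1)
        else:
            controls.append(y0)

    def mean(values):
        return sum(values) / len(values)

    return {
        "true_gain": mean(gains),
        "experimental": mean(treated) - mean(controls),
        "naive": mean(treated) - mean(nonparticipants),
    }

result = simulate_experiment()
```

In this setup the randomized control group stands in for the missing counterfactual of participants, so `experimental` should lie close to `true_gain`, while `naive`, which uses nonparticipants as the comparison group, also picks up the difference in untreated outcomes between applicants and nonapplicants.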

# SELECTION BIAS: The Evaluation Problem

Randomization Estimates It

Following Fisher (1935), Roy (1951) and Quandt (1972), we assume that each person has two possible outcomes, $Y_0$ and $Y_1$, in the untreated and treated states, respectively. Let $D = 1$ signify receipt of treatment and $D = 0$ its absence. General equilibrium effects are ignored so that the outcomes for any person do not depend on the overall level of participation in the program.7

The problem of program evaluation arises because we observe only $Y_0$ or $Y_1$ for each person, but never both. That is, we observe $Y$ where $Y = DY_1 + (1 - D)Y_0$. Thus we cannot form the gross gain $\Delta = Y_1 - Y_0$ for anyone. In the standard evaluation problem, analysts have access to participant records and to data on a comparison group of nonparticipants. Hence, one can construct the conditional distribution of $Y_1$ given a vector of conditioning variables $X$ and $D = 1$, and the conditional distribution of $Y_0$ given $X$ and $D = 0$, and can consistently estimate $\Pr(D = 1 \mid X) = P(X)$.8
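
The observation rule and the participation probability can be sketched in a few lines. The sketch assumes, for illustration only, that $X$ takes a small number of discrete values, so that $P(X)$ can be estimated by the share of participants within each cell; the function names are hypothetical.

```python
from collections import defaultdict

def observed_outcome(d, y1, y0):
    """Observed outcome Y = D*Y1 + (1 - D)*Y0: only one potential outcome is seen."""
    return d * y1 + (1 - d) * y0

def estimate_propensity(records):
    """Estimate P(X) = Pr(D = 1 | X) by cell frequencies.

    records: iterable of (x, d) pairs with d equal to 0 or 1 and x hashable.
    Returns a dict mapping each value of x to the share of participants.
    """
    counts = defaultdict(lambda: [0, 0])  # x -> [participants, total]
    for x, d in records:
        counts[x][0] += d
        counts[x][1] += 1
    return {x: treated / total for x, (treated, total) in counts.items()}

sample = [("a", 1), ("a", 0), ("a", 0), ("a", 0), ("b", 1), ("b", 1)]
p_hat = estimate_propensity(sample)
```

With continuous conditioning variables the cell-frequency estimator no longer applies, and some smoothing or a parametric participation model would be needed instead.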

