SELECTION BIAS: Index Sufficient Methods 2

Much applied econometric activity is devoted to eliminating the mean effect of unobservables on estimates of functions like g0 and g1. However, the mean difference in unobservables is an essential component of the definition of the parameter of interest in evaluating social programs.20 In the traditional separable framework, the selection bias that arises from using a nonexperimental comparison group is

B(X) = E(U0 | X, D = 1) - E(U0 | X, D = 0).

The conventional econometric approach partitions the observed variables X into two not necessarily disjoint sets (R,Z) corresponding to those in the outcome equations and those in the participation equation, and postulates exclusion restrictions. Thus it is assumed that certain variables appear in Z but not in R. The conventional approach further restricts the model so that the bias B(X) only depends on Z through a scalar index. Note that exclusion restrictions are neither required nor used to justify matching as an estimator of (1), (2) or (9).21
The latent index variable model with index I motivates the characterization of bias as a function of a scalar index. Define I = H(Z) - v, where H(Z) is the mean difference in utilities or discounted earnings between the participation and nonparticipation states and
This is the “index sufficient” representation where P(Z), or equivalently H(Z), is the index. Conventional econometric models (see, e.g., Amemiya, 1985) assume that the latent variables v and U0 are symmetrically distributed around zero, so that B(P(Z)) is symmetric around P = 1/2. Figure 1 presents an example of a normal selection model. If P itself is symmetrically distributed around P = 1/2, the average bias over symmetric intervals around that value is zero even though the pointwise bias is nonzero. Thus, the classical selection model sometimes justifies matching as a consistent estimator of parameter (2) over intervals of P where the bias cancels out. To test the index sufficient model, we use our pooled sample of controls and comparison group members to determine whether the estimated bias is solely a function of P(Z) for different sets of variables Z, or whether a more general conditioning set (R, Z) is required to characterize the bias.
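
As an illustration of how the bias can depend on the data only through the scalar index P, the following minimal sketch evaluates B(P) = E(U0 | D = 1, P) - E(U0 | D = 0, P) in a textbook normal selection model. It is not the specification estimated in the paper. It assumes that (U0, v) are jointly normal with mean zero, that Var(v) = 1, and that D = 1 when v < H(Z), so that P = Phi(H(Z)). The function name and the parameters rho (the correlation between U0 and v) and sigma0 (the standard deviation of U0) are hypothetical names introduced here.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # assumed distribution of v: standard normal


def normal_selection_bias(p, rho=0.5, sigma0=1.0):
    """Return B(P) = E(U0 | D=1, P) - E(U0 | D=0, P) under joint normality.

    Illustrative assumptions (not taken from the paper's estimates):
    (U0, v) are bivariate normal with mean zero, Var(v) = 1,
    Corr(U0, v) = rho, SD(U0) = sigma0, and D = 1 when v < H(Z),
    which implies P = Phi(H(Z)).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    h = STD_NORMAL.inv_cdf(p)      # index value H(Z) implied by P
    density = STD_NORMAL.pdf(h)    # phi(H(Z))
    # Truncated-normal means of U0 for participants and nonparticipants.
    mean_u0_participants = -rho * sigma0 * density / p            # E(U0 | v < h)
    mean_u0_nonparticipants = rho * sigma0 * density / (1.0 - p)  # E(U0 | v > h)
    return mean_u0_participants - mean_u0_nonparticipants


if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(f"P = {p:.1f}  B(P) = {normal_selection_bias(p):.4f}")
```

Under these assumptions the function takes only P as its data argument, which is the index sufficiency property, and it returns the same value at P and at 1 - P, so the bias curve is mirrored around P = 1/2.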