Methodology, measurement and data

Isaac M. Opper.

Researchers often include covariates when they analyze the results of randomized controlled trials (RCTs), valuing the increased precision of the estimates over the risk of inducing small-sample bias when doing so. In this paper, we develop a sufficient condition that ensures that the inclusion of covariates does not cause small-sample bias in the effect estimates. Using this result as a building block, we develop a novel approach that uses machine learning techniques to reduce the variance of the average treatment effect estimates while guaranteeing that the effect estimates remain unbiased. The framework also highlights how researchers can use data from outside the study sample to improve the precision of the treatment effect estimate by using the auxiliary data to better model the relationship between the covariates and the outcomes. We conclude with a simulation, which highlights the value of using the proposed approach.


James D. Paul, Patrick J. Wolf.

Virtual charter schools provide full-time, tuition-free K-12 education through internet-based instruction. Although virtual schools offer a personalized learning experience, most research suggests these schools are negatively associated with achievement. Few studies account for differential rates of student mobility, which may produce biased estimates if mobility is jointly associated with virtual school enrollment and subsequent test scores. We evaluate the effects of a single, large, anonymous virtual charter school on student achievement using a hybrid of exact and nearest-neighbor propensity score matching. Relative to their matched peers, we estimate that virtual students produce marginally worse ELA scores and significantly worse math scores after one year. When controlling for student mobility during the outcome year, estimates of virtual schooling are slightly less negative. These findings may be more reliable indicators of the independent effect of virtual schooling if matching on mobility proxies for otherwise unobservable negative selection factors.
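
As a rough illustration of what a hybrid of exact and nearest-neighbor propensity score matching can look like, the sketch below pairs each treated student with the comparison student whose propensity score is closest, searching only among comparison students who share the same exact-match cell. The field names (`id`, `cell`, `pscore`), the optional `caliper` argument, and matching with replacement are assumptions made for the example, not the specification used in the paper.

```python
from collections import defaultdict


def match_within_cells(treated, controls, caliper=None):
    """Match each treated unit to its nearest control on the propensity score,
    restricted to controls in the same exact-match cell.

    Each unit is a dict with hypothetical keys:
      'id'     - unit identifier
      'cell'   - tuple of exact-match values (e.g. grade, prior-year school type)
      'pscore' - estimated propensity score in [0, 1]
    Matching is done with replacement (an assumption of this sketch).
    Returns a dict mapping treated id -> matched control id.
    """
    # Group controls by exact-match cell so the nearest-neighbor search
    # never crosses cell boundaries.
    controls_by_cell = defaultdict(list)
    for unit in controls:
        controls_by_cell[unit["cell"]].append(unit)

    matches = {}
    for unit in treated:
        candidates = controls_by_cell.get(unit["cell"], [])
        if not candidates:
            continue  # no exact match available; the treated unit is dropped
        nearest = min(candidates, key=lambda c: abs(c["pscore"] - unit["pscore"]))
        distance = abs(nearest["pscore"] - unit["pscore"])
        if caliper is None or distance <= caliper:
            matches[unit["id"]] = nearest["id"]
    return matches


treated = [
    {"id": "t1", "cell": (5, "F"), "pscore": 0.30},
    {"id": "t2", "cell": (6, "M"), "pscore": 0.70},
    {"id": "t3", "cell": (7, "F"), "pscore": 0.50},
]
controls = [
    {"id": "c1", "cell": (5, "F"), "pscore": 0.10},
    {"id": "c2", "cell": (5, "F"), "pscore": 0.28},
    {"id": "c3", "cell": (6, "M"), "pscore": 0.90},
    {"id": "c4", "cell": (5, "M"), "pscore": 0.70},
]
matched = match_within_cells(treated, controls)
matched_with_caliper = match_within_cells(treated, controls, caliper=0.05)
```

In this toy data, `t3` has no comparison student in its cell and so goes unmatched, and adding a caliper also drops `t2`, whose nearest neighbor is 0.20 away on the propensity score.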


C. Kirabo Jackson, Shanette C. Porter, John Q. Easton, Sebastian Kiguel.

We estimate the longer-run effects of attending an effective high school (one that improves a combination of test scores, survey measures of socio-emotional development, and behaviours in 9th grade) for students who are more versus less educationally advantaged (i.e., likely to attain more years of education based on 8th-grade characteristics). All students benefit from attending effective schools. However, the least advantaged students experience the largest improvements in high-school graduation, college-going, and school-based arrests. These patterns are driven by the least advantaged students benefiting the most from school impacts on the non-test-score dimensions of school quality. However, while there is considerable overlap in the effectiveness of schools attended by more and less advantaged students, it is the most advantaged students that are most likely to attend highly effective schools. These patterns underscore the importance of quality schools, and the non-test score components of quality schools, for improving the longer-run outcomes for less advantaged students.


Paul T. von Hippel, Laura Bellows.

At least sixteen US states have taken steps toward holding teacher preparation programs (TPPs) accountable for teacher value-added to student test scores. Yet it is unclear whether teacher quality differences between TPPs are large enough to make an accountability system worthwhile. Several statistical practices can make differences between TPPs appear larger and more significant than they are. We reanalyze TPP evaluations from six states (New York, Louisiana, Missouri, Washington, Texas, and Florida) using appropriate methods implemented by our new caterpillar command for Stata. Our results show that teacher quality differences between most TPPs are negligible (.01-.03 standard deviations in student test scores), even in states where larger differences were reported previously. While ranking all of a state’s TPPs may not be possible or desirable, in some states and subjects we can find a single TPP whose teachers stand out as significantly above or below average. Such exceptional TPPs may reward further study.


George Bulman, Robert Fairlie, Sarena Goodman, Adam Isen.

We examine U.S. children whose parents won the lottery to trace out the effect of financial resources on college attendance. The analysis leverages federal tax and financial aid records and substantial variation in win size and timing. While per-dollar effects are modest, the relationship is weakly concave, with a high upper bound for amounts greatly exceeding college costs. Effects are smaller among low-SES households, not sensitive to how early in adolescence the shock occurs, and not moderated by financial aid crowd-out. The results imply that households derive consumption value from college and household financial constraints alone do not inhibit attendance.


David M. Quinn, Andrew D. Ho.

The estimation of test score “gaps” and gap trends plays an important role in monitoring educational inequality. Researchers decompose gaps and gap changes into within- and between-school portions to generate evidence on the role schools play in shaping these inequalities. However, existing decomposition methods assume an equal-interval test scale and are a poor fit to coarsened data such as proficiency categories. This leaves many potential data sources ill-suited for decomposition applications. We develop two decomposition approaches that overcome these limitations: an extension of V, an ordinal gap statistic, and an extension of ordered probit models. Simulations show V decompositions have negligible bias with small within-school samples. Ordered probit decompositions have negligible bias with large within-school samples but more serious bias with small within-school samples. More broadly, our methods enable analysts to (1) decompose the difference between two groups on any ordinal outcome into portions within and between the categories of some third categorical variable, and (2) estimate scale-invariant between-group differences that adjust for a categorical covariate.
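
For readers unfamiliar with V, the sketch below computes the basic (undecomposed) statistic from counts in ordered proficiency categories, using the usual definition V = sqrt(2) * Phi^{-1}(P(a > b)), where Phi^{-1} is the standard normal quantile function. The function names are hypothetical, and giving ties half weight is a simplifying assumption for coarsened data. The within- and between-school decomposition developed in the paper is not shown.

```python
from math import sqrt
from statistics import NormalDist


def prob_a_exceeds_b(counts_a, counts_b):
    """Probability that a random member of group a scores in a higher
    category than a random member of group b, counting ties as one half.

    counts_a, counts_b: counts per ordered category, lowest category first.
    """
    n_a, n_b = sum(counts_a), sum(counts_b)
    favorable = 0.0
    b_below = 0  # group-b members in strictly lower categories so far
    for count_a, count_b in zip(counts_a, counts_b):
        favorable += count_a * (b_below + 0.5 * count_b)
        b_below += count_b
    return favorable / (n_a * n_b)


def v_gap(counts_a, counts_b):
    """Ordinal gap statistic V = sqrt(2) * Phi^{-1}(P(a > b)).

    Depends only on the ordering of categories, not on score spacing.
    Raises statistics.StatisticsError if the groups do not overlap at all
    (probability exactly 0 or 1).
    """
    return sqrt(2) * NormalDist().inv_cdf(prob_a_exceeds_b(counts_a, counts_b))


# Counts in four ordered proficiency categories (illustrative numbers only).
group_a = [5, 15, 30, 50]
group_b = [20, 30, 30, 20]
p_ab = prob_a_exceeds_b(group_a, group_b)
gap = v_gap(group_a, group_b)
```

Because V is built from P(a > b) alone, any order-preserving relabeling of the categories leaves it unchanged, which is what makes it usable on proficiency-category data.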


Todd R. Jones, Daniel Kreisman, Ross Rubenstein, Cynthia Searcy, Rachana Bhatt.

For years, Georgia's HOPE Scholarship program provided full tuition scholarships to high-achieving students. State budgetary shortfalls reduced its generosity in 2011. Under the new rules, only students meeting more rigorous merit-based criteria would retain the original scholarship covering full tuition, now called Zell Miller, with other students seeing aid reductions of approximately 15 percent. We exploit the fact that two of the criteria were high school GPA and SAT/ACT score, which students could not manipulate when the change took place. We compare already-enrolled students just above and below these cutoffs, making use of advances in multi-dimensional regression discontinuity, to estimate effects of partial aid loss. We show that, after the changes, aid flowed disproportionately to wealthier students, and find no evidence that the financial aid reduction affected persistence or graduation for these students. The results suggest that high-achieving students, particularly those already in college, may be less price sensitive than their peers.


Katharine Meyer, Kelli A. Bird, Benjamin L. Castleman.

With rapid technological transformations to the labor market along with COVID-19-related economic disruptions, many working adults return to college to obtain additional training or credentials. Using a comparative individual fixed effects strategy and an administrative panel dataset of enrollment and employment in Virginia, we provide the first causal estimates of credential “stacking” among working adults. We find stacking increases employment by four percentage points and quarterly wages by $570 (seven percent). Returns are larger for individuals whose first credential is a short-term certificate and in Health and Business.


Vivian C. Wong, Peter M. Steiner, Kylie L. Anglin.

Recent interest in promoting and supporting replication efforts assumes that there is well-established methodological guidance for designing and implementing these studies. However, no such consensus exists in the methodology literature. This article addresses these challenges by describing design-based approaches for planning systematic replication studies. Our general approach is derived from the Causal Replication Framework (CRF), which formalizes the assumptions under which replication success can be expected. The assumptions may be understood broadly as replication design requirements and individual study design requirements. Replication failure occurs when one or more CRF assumptions are violated. In design-based approaches to replication, CRF assumptions are systematically tested to evaluate the replicability of effects, as well as to identify sources of effect variation when replication failure is observed. In direct replication designs, replication failure is evidence of bias or incorrect reporting in individual study estimates, while in conceptual replication designs, replication failure occurs because of effect variation due to differences in treatments, outcomes, settings, and participant characteristics. The paper demonstrates how multiple research designs may be combined in systematic replication studies, as well as how diagnostic measures may be used to assess the extent to which CRF assumptions are met in field settings.


Kylie L. Anglin, Vivian C. Wong.

Researchers are rarely satisfied to learn only whether an intervention works; they also want to understand why and under what circumstances interventions produce their intended effects. These questions have led to increasing calls for implementation research to be included in high-quality studies with strong causal claims. Of critical importance is determining whether an intervention can be delivered with adherence to a standardized protocol, and the extent to which an intervention protocol can be replicated across sessions, sites, and studies. When an intervention protocol is highly standardized and delivered through verbal interactions with participants, a set of natural language processing (NLP) techniques termed semantic similarity can be used to provide quantitative summary measures of how closely intervention sessions adhere to a standardized protocol, as well as how consistently the protocol is replicated across sessions. Given the intense methodological, budgetary, and logistical challenges of conducting implementation research, semantic similarity approaches have the benefit of being low-cost, scalable, and context-agnostic. In this paper, we demonstrate how semantic similarity approaches may be utilized in an experimental evaluation of a coaching protocol on teacher pedagogical skills in a simulated classroom environment. We discuss strengths and limitations of the approach, and the most appropriate contexts for applying this method.
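
To make the two summary measures concrete, the sketch below scores adherence as the similarity between each session transcript and the protocol script, and replication consistency as the mean similarity across all pairs of sessions. It uses plain bag-of-words cosine similarity as a deliberately simple stand-in; the text representation used in the paper may differ, and all function names and example strings here are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def term_counts(text):
    """Lowercase the text and count word tokens (a crude bag of words)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two term-count vectors, in [0, 1]."""
    counts_a, counts_b = term_counts(text_a), term_counts(text_b)
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty transcript shares nothing with anything
    return dot / (norm_a * norm_b)


def adherence_scores(protocol, sessions):
    """Similarity of each session transcript to the protocol script."""
    return [cosine_similarity(protocol, session) for session in sessions]


def replication_consistency(sessions):
    """Mean pairwise similarity across sessions (requires at least two)."""
    pairs = list(combinations(sessions, 2))
    if not pairs:
        raise ValueError("need at least two sessions")
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)


# Made-up protocol script and session transcripts, for illustration only.
protocol = "Ask the teacher to name one strength and then set one goal for next time"
sessions = [
    "Please name one strength and then set one goal for next time",
    "Let's talk about the weather and your weekend plans",
]
scores = adherence_scores(protocol, sessions)
consistency = replication_consistency(sessions)
```

Word-count vectors only capture shared vocabulary, so paraphrases of the protocol would score low here; an embedding-based representation would be the natural substitute when paraphrase should count as adherence.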
