EdWorkingPapers

Methodology, measurement and data

Matthew Kraft.

Researchers commonly interpret effect sizes by applying benchmarks proposed by Cohen more than half a century ago. However, effects that are small by Cohen’s standards are large relative to the impacts of most field-based interventions. These benchmarks also fail to consider important differences in study features, program costs, and scalability. In this paper, I present five broad guidelines for interpreting effect sizes that are applicable across the social sciences. I then propose a more structured schema with new empirical benchmarks for interpreting a specific class of studies: causal research on education interventions with standardized achievement outcomes. Together, these tools provide a practical approach for incorporating study features, cost, and scalability into the process of interpreting the policy importance of effect sizes.
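
The gap between Cohen's conventional cutoffs and typical field effects can be made concrete with a small sketch. It computes a standardized mean difference (Cohen's d with a pooled sample standard deviation) and labels it using Cohen's conventional thresholds of 0.2 (small), 0.5 (medium), and 0.8 (large). The function names are illustrative assumptions, and the sketch does not reproduce the new empirical benchmarks proposed in the paper.

```python
import math
from statistics import mean, stdev


def cohens_d(treatment, control):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)
    pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(treatment) - mean(control)) / pooled_sd


def cohen_label(d):
    """Label |d| with Cohen's conventional cutoffs (0.2 small, 0.5 medium, 0.8 large)."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"


# A hypothetical intervention effect of 0.15 SD falls below Cohen's "small" cutoff,
# even though the abstract argues such effects are large relative to most field interventions.
print(cohen_label(0.15))
```

Under Cohen's labels an effect of 0.15 would be dismissed, which is exactly the interpretive problem the abstract raises.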


Zachary Mabel, CJ Libassi, Michael Hurwitz.

Policymakers are increasingly including early-career earnings data in consumer-facing college search tools to help students and families make more informed post-secondary education decisions. We offer new evidence on the degree to which existing college-specific earnings data equips consumers with useful information by documenting the level of selection bias in the earnings metrics reported in the U.S. Department of Education’s College Scorecard. Given growing interest in reporting earnings by college and major, we focus on the degree to which earnings differences across four-year colleges and universities can be explained by differences in major composition across institutions. We estimate that more than three-quarters of the variation in median earnings across institutions is explained by observable factors, and accounting for differences in major composition explains over 30 percent of the residual variation in earnings after controlling for institutional selectivity, student composition, and local cost of living differences. We also identify large variations in the distribution of earnings within colleges; as a result, comparisons of early-career earnings can be extremely sensitive to whether the median, 25th, or 75th percentiles are presented. Taken together, our findings indicate that consumers can easily draw misleading conclusions about institutional quality when using publicly available earnings data to compare institutions.
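
The claim that major composition "explains over 30 percent of the residual variation" after controls is a partial R² calculation. The sketch below shows that arithmetic on simulated data; the variable names (`selectivity`, `major_mix`) and the data-generating process are hypothetical and do not reflect the College Scorecard or the authors' specification.

```python
import numpy as np


def r_squared(y, X):
    """R^2 from an OLS fit of y on X (an intercept column is added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)


def share_of_residual_explained(y, controls, added):
    """Partial R^2: share of variance left unexplained by `controls` that `added` explains.

    Equivalent to (RSS_controls - RSS_full) / RSS_controls.
    """
    r2_base = r_squared(y, controls)
    r2_full = r_squared(y, np.column_stack([controls, added]))
    return (r2_full - r2_base) / (1 - r2_base)


# Simulated institutions: earnings depend on selectivity, major mix, and noise.
rng = np.random.default_rng(0)
n = 500
selectivity = rng.normal(size=n)
major_mix = rng.normal(size=n)
earnings = 2 * selectivity + 1 * major_mix + rng.normal(size=n)

share = share_of_residual_explained(earnings, selectivity, major_mix)
print(round(share, 2))  # close to 0.5 by construction of this simulation
```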



Gary T. Henry, J. Edward Guthrie.

Federal education policies gave state education agencies political and financial support to turn around low-performing schools on an unprecedented scale. North Carolina’s ambitious program accounted for over half of all schools nationwide that underwent turnaround funded by Race to the Top. Exploiting the assignment to turnaround based on schools’ 2009-10 proficiency rates, we implement regression discontinuity designs to estimate the effects of state turnaround services on student achievement in North Carolina’s lowest-performing schools annually from 2011-12 through 2014-15. Overall, we find modest positive effects of turnaround when including treated schools at all grade levels, but these effects are sensitive to bandwidth. For secondary schools, we find consistently positive effects that vary from modest to large. For elementary and middle schools, we find consistent, modest negative effects of turnaround on student achievement.
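
A sharp regression discontinuity estimate and its bandwidth sensitivity can be sketched as a local linear regression within a window around the cutoff. This is a minimal sketch assuming a uniform kernel and that units below the cutoff are treated; the abstract does not specify the kernel, bandwidth selector, or covariates the authors used, and the simulated data are hypothetical.

```python
import numpy as np


def rd_estimate(running, outcome, cutoff, bandwidth):
    """Sharp RD estimate via local linear regression with a uniform kernel.

    Units with running < cutoff are treated (e.g., schools below a proficiency cutoff).
    Returns the estimated jump in the outcome at the cutoff.
    """
    x = running - cutoff
    keep = np.abs(x) <= bandwidth
    x, y = x[keep], outcome[keep]
    d = (x < 0).astype(float)
    # Intercept, treatment jump, and separate slopes on each side via the d * x term.
    X = np.column_stack([np.ones_like(x), d, x, d * x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


# Simulated schools with a true effect of 0.1 at a cutoff of 50.
rng = np.random.default_rng(1)
proficiency = rng.uniform(0, 100, 2000)
achievement = 0.01 * (proficiency - 50) + 0.1 * (proficiency < 50) + rng.normal(0, 0.03, 2000)

# Re-estimating across bandwidths is the usual sensitivity check.
estimates = {h: rd_estimate(proficiency, achievement, 50, h) for h in (5, 10, 20)}
print(estimates)
```

Narrow bandwidths reduce bias from misspecified functional form but use fewer schools, so estimates that shift noticeably across bandwidths, as the abstract reports for the pooled sample, warrant caution.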



Drew H. Bailey, Jade M. Jenkins, Daniela Alvarez-Vargas.

The sustaining environments hypothesis refers to the popular idea, stemming from theories in developmental, cognitive, and educational psychology, that the long-term success of early educational interventions is contingent on the quality of the subsequent learning environment. Several studies have investigated whether specific kindergarten classroom and other elementary school factors account for patterns of persistence and fadeout of early educational interventions. These analyses focus on the statistical interaction between an early educational intervention (usually whether the child attended preschool) and several measures of the quality of the subsequent educational environment. The key prediction of the sustaining environments hypothesis is a positive interaction between these two variables. To quantify the strength of the evidence for such effects, we meta-analyze existing studies that have attempted to estimate interactions between preschool and later educational quality in the United States. We then attempt to assess the consistency of the direction of the interaction between preschool attendance and subsequent educational quality, and to establish a plausible range for its magnitude, using a specification curve analysis in a large, nationally representative dataset that has been used in several recent studies of the sustaining environments hypothesis. Both analyses yield interaction estimates of approximately 0. We consider a range of plausible explanations, including (1) the null hypothesis is true, (2) we lacked the statistical power to detect interactions of a realistic magnitude, (3) the models are misspecified because of theoretical ambiguity, and (4) these interactions are heterogeneous across treatments, contexts, and children. We offer a set of recommendations for future research on the sustaining environments hypothesis.



James Soland.

Research has begun to investigate whether teachers and schools are as effective with certain student subgroups as they are with the overall student population. Most of this research has examined the issue by trying to produce causal estimates of school contributions to short-term student growth (usually using value-added models) and has emphasized rank orderings of schools by subgroup. However, little is known about whether schools contributing to long-term growth for all students are also contributing to student growth by subgroup in ways that might close achievement gaps. In this study, schools’ contributions to student growth are estimated separately for Black versus White students. Results show that focusing on rank orderings of schools alone can mask troubling trends in relative achievement over time. Options for how policymakers can sensibly hold schools accountable for student growth, including under the Every Student Succeeds Act, are discussed.
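
Why rank orderings can mask widening gaps can be shown with a toy example: schools can rank identically for two subgroups while every school still grows one subgroup faster than the other. The growth figures and the `ranks` helper below are hypothetical illustrations, not results from the study.

```python
def ranks(values):
    """Rank positions (1 = highest) for a list of values; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for position, i in enumerate(order, start=1):
        result[i] = position
    return result


# Hypothetical annual growth (scale-score points) for four schools.
growth_white = [12.0, 10.0, 8.0, 6.0]
growth_black = [9.0, 7.5, 6.0, 4.5]

same_ranking = ranks(growth_white) == ranks(growth_black)
gap_change = [w - b for w, b in zip(growth_white, growth_black)]

print(same_ranking)  # True: the school ordering is identical for both groups
print(gap_change)    # [3.0, 2.5, 2.0, 1.5]: the gap widens in every school
```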



Min Sun, Jing Liu, Junmeng Zhu, Zachary LeClair.

Although program evaluations using rigorous quasi-experimental or experimental designs can inform decisions about whether to continue or terminate a given program, they often have limited ability to reveal the mechanisms by which complex interventions achieve their effects. To illuminate these mechanisms, this paper analyzes novel text data from thousands of school improvement planning and implementation reports from Washington State, deploying computer-assisted techniques to extract measures of school improvement processes. Our analysis identified 15 coherent reform strategies that varied greatly across schools and over time. The prevalence of identified reform strategies was largely consistent with school leaders’ own perceptions of reform priorities as reported in interviews. Several reform strategy measures were significantly associated with reductions in student chronic absenteeism and improvements in student achievement. Lastly, we discuss the opportunities and pitfalls of using novel text data to study reform processes.



Thurston Domina, Andrew McEachin, Paul Hanselman, Priyanka Agarwal, NaYoung Hwang, Ryan Lewis.

Schools utilize an array of strategies to match curricula and instruction to students’ heterogeneous skills. While generations of scholars have debated “tracking” and its consequences, the literature fails to account for the diversity of school-level sorting practices. In this paper we draw upon the work of Sørenson (1970) to articulate and develop empirical measures of five distinct dimensions of school cross-classroom tracking systems: (1) the degree of course differentiation, (2) the extent to which sorting practices generate skills-homogeneous classrooms, (3) the rate at which students enroll in advanced courses, (4) the extent to which students move between tracks over time, and (5) the relation between track assignments across subject areas. Analyses of longitudinal administrative data following 24,000 8th graders enrolled in 23 middle schools through the 10th grade indicate that these dimensions of tracking are empirically separable and have divergent effects on student achievement and the production of inequality.
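
One plausible way to quantify dimension (2), the extent to which sorting produces skills-homogeneous classrooms, is the share of variance in prior achievement that lies between classrooms (eta squared). This sketch is an assumption for illustration; the abstract does not state the exact measure the authors constructed, and the scores below are hypothetical.

```python
from statistics import fmean


def between_class_share(scores_by_class):
    """Share of total variance in prior scores lying between classrooms (eta^2).

    Values near 1 indicate skills-homogeneous classrooms; values near 0, mixed classrooms.
    """
    all_scores = [s for cls in scores_by_class.values() for s in cls]
    grand = fmean(all_scores)
    total_ss = sum((s - grand) ** 2 for s in all_scores)
    between_ss = sum(len(cls) * (fmean(cls) - grand) ** 2 for cls in scores_by_class.values())
    return between_ss / total_ss


# Same four students sorted two ways: by prior score (tracked) or mixed.
tracked = {"A": [90, 92], "B": [60, 62]}
mixed = {"A": [90, 60], "B": [92, 62]}
print(round(between_class_share(tracked), 3), round(between_class_share(mixed), 3))
```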



James Soland, Yeow Meng Thum.

Effect sizes in the Cohen’s d family are often used in education to compare estimates across studies, measures, and sample sizes.  For example, effect sizes are used to compare gains in achievement students make over time, either in pre- and post-treatment studies or in the absence of intervention, such as when estimating achievement gaps.  However, despite extensive research dating back to the paired t-test literature showing that such growth effect sizes should account for within-person correlations of scores over time, such achievement gains are often standardized relative to the standard deviation from a single timepoint or two timepoints pooled.  Such a tendency likely occurs in part because there are not many large datasets from which a distribution of student- or school-level gains can be derived.  In this study, we present a novel model for estimating student growth in conjunction with a national dataset to show that effect size estimates for student and school growth are often quite different when standardized relative to a distribution of gains rather than static achievement.  In particular, we provide nationally representative empirical benchmarks for student achievement and gains, including for male-female gaps in those gains, and examine the sensitivity of those effect sizes to how they are standardized.  Our results suggest that effect sizes scaled relative to a distribution of gains are less likely to understate the effects of interventions over time, and that resultant effect sizes often more closely match the estimand of interest for most practice, policy, and evaluation questions.
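
The role of within-person correlation can be seen in the standard deviation of individual gains, which is sqrt(sd_pre² + sd_post² - 2·ρ·sd_pre·sd_post). When pre- and post-scores are highly correlated, gains vary less than static scores, so the same mean gain is a larger effect size on the gain scale. This sketch uses observed-score algebra only; the paper's growth model may estimate the distribution of gains differently, and the inputs are hypothetical.

```python
import math


def gain_sd(sd_pre, sd_post, corr):
    """SD of individual gains (post - pre) implied by the two SDs and their correlation."""
    return math.sqrt(sd_pre**2 + sd_post**2 - 2 * corr * sd_pre * sd_post)


def effect_sizes(mean_gain, sd_pre, sd_post, corr):
    """Return (static, growth) effect sizes for the same mean gain."""
    static = mean_gain / sd_pre  # standardized by a single timepoint
    growth = mean_gain / gain_sd(sd_pre, sd_post, corr)  # standardized by the distribution of gains
    return static, growth


# Hypothetical: 5-point mean gain, SD of 10 at both timepoints, test-retest correlation 0.8.
static, growth = effect_sizes(5, 10, 10, 0.8)
print(round(static, 3), round(growth, 3))
```

With a correlation of 0.8, the gain SD is about 6.3 rather than 10, so the growth-scaled effect size is noticeably larger than the static one.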



Samantha Viano, Lam Pham, Gary T. Henry, Adam Kho, Ron Zimmer.

Recruiting and retaining teachers can be challenging for many schools, especially in low-performing urban schools in which teachers turn over at higher rates. In this study, we examine three types of school-level attributes that may influence teachers’ decisions to enter or transfer schools: malleable school processes, structural features of employment, and school characteristics. Using an adaptive conjoint analysis survey design with a sample of teachers from low-performing, urban, turnaround schools in Tennessee, we find that five of the seven most highly valued features of schools are malleable processes: consistent administrative support, consistent enforcement of discipline, school safety, small class sizes, and availability of high-quality professional development. In addition, teachers rated as effective are more likely to prefer performance-based pay than teachers rated ineffective. We validate our results using administrative data from Tennessee on teachers’ actual mobility patterns.



Eric A. Hanushek, Paul E. Peterson, Laura M. Talpey, Ludger Woessmann.

Concerns about the breadth of the U.S. income distribution and limited intergenerational mobility have led to a focus on educational achievement gaps by socio-economic status (SES). Using intertemporally linked assessments from NAEP, TIMSS, and PISA, we trace the achievement of U.S. student cohorts born between 1954 and 2001. Achievement gaps between the top and bottom deciles and the top and bottom quartiles of the SES distribution have been large and remarkably constant for nearly half a century. These unwavering gaps have not been offset by overall improvements in achievement levels, which have risen at age 14 but remained unchanged at age 17 for the most recent quarter century. The long-term failure of major educational policies to alter SES gaps suggests a need to reconsider standard approaches to mitigating disparities.
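
An SES achievement gap between distribution extremes is often expressed in standard deviation units. The sketch below computes the top-versus-bottom gap for deciles (or quartiles, by changing the quantiles) on a single hypothetical data set with a continuous SES index; the function name and the use of the student-level SD are assumptions, and it does not reproduce the paper's linking of NAEP, TIMSS, and PISA across cohorts.

```python
import numpy as np


def ses_gap_sd(scores, ses, low_q=0.10, high_q=0.90):
    """Gap in mean scores between top and bottom SES groups, in student-level SD units."""
    lo_cut, hi_cut = np.quantile(ses, [low_q, high_q])
    top = scores[ses >= hi_cut].mean()
    bottom = scores[ses <= lo_cut].mean()
    return (top - bottom) / scores.std()  # population SD (ddof=0) of all scores


# Hypothetical cohort: 100 students, upper half of the SES index scores 1, lower half 0.
ses = np.arange(100)
scores = np.where(ses >= 50, 1.0, 0.0)
decile_gap = ses_gap_sd(scores, ses)
quartile_gap = ses_gap_sd(scores, ses, 0.25, 0.75)
print(decile_gap, quartile_gap)
```

Applying the same calculation to each birth cohort would give the kind of gap trend the abstract describes, provided the scores are on a common, linked scale.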
