Search for EdWorkingPapers here by author, title, or keywords.
Methodology, measurement and data
Schools use an array of strategies to match curricula and instruction to students’ heterogeneous skills. While generations of scholars have debated “tracking” and its consequences, the literature fails to account for the diversity of school-level sorting practices. In this paper we draw on the work of Sørensen (1970) to articulate and develop empirical measures of five distinct dimensions of schools’ cross-classroom tracking systems: (1) the degree of course differentiation, (2) the extent to which sorting practices generate skills-homogeneous classrooms, (3) the rate at which students enroll in advanced courses, (4) the extent to which students move between tracks over time, and (5) the relation between track assignments across subject areas. Analyses of longitudinal administrative data following 24,000 eighth graders enrolled in 23 middle schools through tenth grade indicate that these dimensions of tracking are empirically separable and have divergent effects on student achievement and the production of inequality.
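As a hedged illustration of dimension (2), the sketch below computes one common index of classroom skill homogeneity: the share of a school's total skill variance that lies between classrooms (an eta-squared). The abstract does not say how the authors operationalize this dimension, so the function name, the input layout, and the choice of index are assumptions for illustration only.

```python
from statistics import mean


def between_class_share(scores_by_class):
    """Share of total skill variance lying between classrooms (eta-squared).

    Hypothetical index for illustration; not necessarily the paper's measure.
    scores_by_class maps a classroom id to a list of student skill scores.
    Values near 1: classrooms are internally skills-homogeneous.
    Values near 0: each classroom mirrors the school-wide skill mix.
    """
    all_scores = [s for scores in scores_by_class.values() for s in scores]
    grand = mean(all_scores)
    # Total sum of squares around the school-wide mean.
    total_ss = sum((s - grand) ** 2 for s in all_scores)
    # Between-classroom sum of squares: classroom means around the grand mean,
    # weighted by classroom size.
    between_ss = sum(
        len(scores) * (mean(scores) - grand) ** 2
        for scores in scores_by_class.values()
    )
    return between_ss / total_ss if total_ss > 0 else 0.0
```

With two classrooms whose students sit at opposite ends of the skill distribution, the index is close to 1; if each classroom contains the same spread of skills, it is 0.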
Effect sizes in the Cohen’s d family are often used in education to compare estimates across studies, measures, and sample sizes. For example, effect sizes are used to compare the achievement gains students make over time, either in pre- and post-treatment studies or in the absence of intervention, such as when estimating achievement gaps. However, despite extensive research, dating back to the paired t-test literature, showing that such growth effect sizes should account for the within-person correlation of scores over time, achievement gains are often standardized relative to the standard deviation from a single time point or from two time points pooled. This tendency likely arises in part because few large datasets exist from which a distribution of student- or school-level gains can be derived. In this study, we combine a novel model for estimating student growth with a national dataset to show that effect size estimates for student and school growth often differ substantially when standardized relative to a distribution of gains rather than to static achievement. In particular, we provide nationally representative empirical benchmarks for student achievement and gains, including male-female gaps in those gains, and examine the sensitivity of those effect sizes to how they are standardized. Our results suggest that effect sizes scaled relative to a distribution of gains are less likely to understate the effects of interventions over time, and that the resulting effect sizes often more closely match the estimand of interest for most practice, policy, and evaluation questions.
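The within-person correlation matters because the standard deviation of individual gains depends on it: with pre- and post-test SDs s1 and s2 and correlation rho, the SD of gains is sqrt(s1**2 + s2**2 - 2*rho*s1*s2). The sketch below uses this standard paired-difference identity (not the paper's novel growth model) to standardize the same mean gain three ways; the function names and inputs are hypothetical.

```python
from math import sqrt


def sd_of_gains(sd_pre, sd_post, rho):
    """SD of individual gains (post - pre), given the pre/post correlation rho."""
    return sqrt(sd_pre**2 + sd_post**2 - 2 * rho * sd_pre * sd_post)


def growth_effect_sizes(mean_gain, sd_pre, sd_post, rho):
    """Standardize one mean gain three ways (illustrative, hypothetical helper)."""
    # Root-mean-square of the two static SDs ("two time points pooled").
    pooled_static = sqrt((sd_pre**2 + sd_post**2) / 2)
    return {
        "single_timepoint": mean_gain / sd_pre,
        "pooled_static": mean_gain / pooled_static,
        "gain_distribution": mean_gain / sd_of_gains(sd_pre, sd_post, rho),
    }
```

For example, growth_effect_sizes(0.3, 1.0, 1.0, 0.8) gives 0.30 under both static standardizations but about 0.47 relative to the distribution of gains. With equal SDs, the ordering reverses when rho falls below 0.5, because the SD of gains then exceeds the static SD.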
Recruiting and retaining teachers can be challenging for many schools, especially low-performing urban schools, where teachers turn over at higher rates. In this study, we examine three types of school-level attributes that may influence teachers’ decisions to enter or transfer into a school: malleable school processes, structural features of employment, and school characteristics. Using an adaptive conjoint analysis survey design with a sample of teachers from low-performing, urban, turnaround schools in Tennessee, we find that five of the seven most highly valued features of schools are malleable processes: consistent administrative support, consistent enforcement of discipline, school safety, small class sizes, and availability of high-quality professional development. We also find that teachers rated as effective are more likely than teachers rated ineffective to prefer performance-based pay. We validate our results using administrative data from Tennessee on teachers’ actual mobility patterns.
Concerns about the breadth of the U.S. income distribution and limited intergenerational mobility have led to a focus on educational achievement gaps by socioeconomic status (SES). Using intertemporally linked assessments from NAEP, TIMSS, and PISA, we trace the achievement of U.S. student cohorts born between 1954 and 2001. Achievement gaps between the top and bottom deciles, and between the top and bottom quartiles, of the SES distribution have been large and remarkably constant for nearly half a century. These unwavering gaps have not been offset by overall improvements in achievement levels, which have risen at age 14 but have remained unchanged at age 17 over the most recent quarter century. The long-term failure of major educational policies to alter SES gaps suggests a need to reconsider standard approaches to mitigating disparities.
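A simplified version of a top-versus-bottom SES gap can be sketched as the difference in mean scores between students in the top and bottom SES groups, scaled by the overall score SD. This is an assumption-laden illustration on hypothetical (ses_index, score) records; the paper's estimates rest on linking NAEP, TIMSS, and PISA over time, which this sketch does not attempt.

```python
from statistics import mean, pstdev


def ses_gap(records, share=0.10):
    """Top-minus-bottom SES gap in mean score, in overall-SD units.

    Hypothetical helper for illustration only.
    records: list of (ses_index, score) pairs.
    share=0.10 compares the top and bottom deciles; share=0.25 compares quartiles.
    """
    ranked = sorted(records, key=lambda r: r[0])  # order students by SES
    k = max(1, round(len(ranked) * share))        # size of each tail group
    bottom = [score for _, score in ranked[:k]]
    top = [score for _, score in ranked[-k:]]
    sd = pstdev([score for _, score in records])  # overall score SD
    return (mean(top) - mean(bottom)) / sd
```

Calling the same function with share=0.10 and share=0.25 yields the decile and quartile versions of the gap, so both can be tracked across cohorts with one definition.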
Free and reduced-price meal (FRM) data are used ubiquitously to proxy for student disadvantage in education research and policy applications. The Community Eligibility Provision (CEP), a recently implemented policy change to the federally administered National School Lunch Program, allows schools serving low-income populations to identify all students as FRM-eligible regardless of individual circumstances. We study the CEP’s effect on the usefulness of FRM eligibility as a proxy for student disadvantage, and relatedly, we examine the viability of direct certification (DC) status as an alternative disadvantage measure. Our findings on whether the CEP degrades the informational content of FRM data are mixed. At the individual level there is essentially no effect, but the CEP does meaningfully change the information conveyed by the FRM-eligible share of students in a school. Our comparison of FRM and DC data in the post-CEP era shows that these measures are similarly informative as proxies for disadvantage, despite the CEP-induced information loss in FRM data. Using both measures together can improve the identification of disadvantaged students, but only marginally.
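The school-level information loss follows mechanically from the CEP rule described above: in a participating school every student is recorded as FRM-eligible, so the recorded FRM share no longer varies with the student mix, while a DC share still does. The sketch below illustrates only that mechanism; the function and field names are hypothetical, and in practice CEP participation is optional and limited to eligible schools.

```python
def school_disadvantage_shares(students, cep):
    """Return (recorded FRM-eligible share, DC share) for one school.

    Hypothetical helper for illustration only.
    students: list of dicts with boolean keys 'frm_eligible' (individual
    eligibility absent the CEP) and 'direct_certified'.
    cep: whether the school participates in the CEP.
    """
    n = len(students)
    # Under the CEP, every student is recorded as FRM-eligible.
    frm_share = 1.0 if cep else sum(s["frm_eligible"] for s in students) / n
    # DC status is unaffected by CEP participation.
    dc_share = sum(s["direct_certified"] for s in students) / n
    return frm_share, dc_share
```

Two CEP schools with different underlying student populations both report an FRM share of 1.0, while their DC shares remain distinct, which is the kind of school-level distinction the recorded FRM share can no longer convey.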
We use a natural experiment to evaluate the performance of sample selection correction methods. In 2007, Michigan began requiring all students to take a college entrance exam, increasing the exam-taking rate from 64% to 99%. We apply different selection correction methods, using different sets of predictors, to the pre-policy exam score data. We then compare the corrected data to the complete post-policy exam score data as a benchmark. We find that performance is sensitive to the choice of predictors but not to the choice of selection correction method. Using stronger predictors such as lagged test scores yields more accurate results, but simple parametric methods and less restrictive semiparametric methods yield similar results for any set of predictors. We conclude that, in this setting, the gains from less restrictive econometric methods are small relative to the gains from richer data. This suggests that empirical researchers using selection correction methods should focus more on the predictive power of covariates than on robustness across modeling choices.
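The selection problem can be illustrated with one simple correction: inverse probability weighting within cells of a discrete predictor, such as a lagged-score band. This is a selection-on-observables method chosen for brevity and is not necessarily among the methods the paper compares; the data layout and function names are hypothetical.

```python
from collections import defaultdict


def naive_taker_mean(students):
    """Mean score among exam-takers only (ignores selection)."""
    scores = [s["score"] for s in students if s["took"]]
    return sum(scores) / len(scores)


def ipw_corrected_mean(students):
    """Estimate the population mean score when only exam-takers are observed.

    Hypothetical helper. students: list of dicts with 'cell' (a discrete
    predictor, e.g. a lagged-score band), 'took' (bool), and 'score'
    (None if the exam was not taken). Each taker is weighted by
    1 / P(took | cell), estimated by the taking rate within the cell.
    Assumes selection is ignorable given the cell and every cell has a taker.
    """
    counts = defaultdict(int)
    takers = defaultdict(int)
    for s in students:
        counts[s["cell"]] += 1
        takers[s["cell"]] += s["took"]
    weighted_sum = 0.0
    weight_total = 0.0
    for s in students:
        if s["took"]:
            w = counts[s["cell"]] / takers[s["cell"]]  # inverse of taking rate
            weighted_sum += w * s["score"]
            weight_total += w
    return weighted_sum / weight_total
```

With 10 high-band students (9 take the exam, each scoring 1.0) and 10 low-band students (3 take it, each scoring -1.0), the naive taker mean is 0.5, while the weighted estimate recovers the population mean of 0. The correction works here only because the cell variable fully captures who takes the exam, which echoes the abstract's point that the predictive power of covariates matters more than the choice of estimator.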