James Soland.

Research has begun to investigate whether teachers and schools are as effective with certain student subgroups as they are with the overall student population.  Most of this research has examined the issue by trying to produce causal estimates of school contributions to short-term student growth (usually using value-added models) and has emphasized rank orderings of schools by subgroup.  However, little is known about whether schools that contribute to long-term growth for all students also contribute to growth by subgroup in ways that might close achievement gaps.  In this study, schools’ contributions to student growth are estimated separately for Black versus White students.  Results show that focusing on rank orderings of schools alone can mask troubling trends in relative achievement over time.  Options for how policymakers can sensibly hold schools accountable for student growth, including under the Every Student Succeeds Act, are discussed.

James Soland, Yeow Meng Thum.

Effect sizes in the Cohen’s d family are often used in education to compare estimates across studies, measures, and sample sizes.  For example, effect sizes are used to compare gains in achievement students make over time, either in pre- and post-treatment studies or in the absence of intervention, such as when estimating achievement gaps.  However, despite extensive research dating back to the paired t-test literature showing that growth effect sizes should account for within-person correlations of scores over time, achievement gains are often standardized relative to the standard deviation from a single timepoint or from two timepoints pooled.  This tendency likely arises in part because few large datasets exist from which a distribution of student- or school-level gains can be derived.  In this study, we present a novel model for estimating student growth in conjunction with a national dataset to show that effect size estimates for student and school growth are often quite different when standardized relative to a distribution of gains rather than static achievement.  In particular, we provide nationally representative empirical benchmarks for student achievement and gains, including for male-female gaps in those gains, and examine the sensitivity of those effect sizes to how they are standardized.  Our results suggest that effect sizes scaled relative to a distribution of gains are less likely to understate the effects of interventions over time, and that the resulting effect sizes often more closely match the estimand of interest for most practice, policy, and evaluation questions.
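To make the standardization issue concrete, the sketch below contrasts three ways of scaling the same mean gain: by the standard deviation at a single timepoint, by the two timepoints pooled, and by the standard deviation of the gains themselves. It is a minimal illustration using the standard paired-scores identity for the variance of a difference, not the growth model presented in the study. The function names and all numeric inputs are hypothetical and chosen only for illustration.

```python
import math


def gain_sd(sd_pre, sd_post, r):
    """SD of gain scores (post minus pre), from the two SDs and the
    within-person correlation r between pre and post scores."""
    return math.sqrt(sd_pre**2 + sd_post**2 - 2 * r * sd_pre * sd_post)


def effect_sizes(mean_gain, sd_pre, sd_post, r):
    """Standardize one mean gain three different ways."""
    pooled_sd = math.sqrt((sd_pre**2 + sd_post**2) / 2)
    return {
        "d_single": mean_gain / sd_pre,      # single-timepoint SD
        "d_pooled": mean_gain / pooled_sd,   # two timepoints pooled
        "d_gain": mean_gain / gain_sd(sd_pre, sd_post, r),  # SD of gains
    }


# Hypothetical inputs (assumptions, not values from the study):
# a 5-point mean gain, SD of 10 at both timepoints, correlation 0.8.
result = effect_sizes(mean_gain=5.0, sd_pre=10.0, sd_post=10.0, r=0.8)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With these made-up inputs, the gain-based effect size is larger than the single-timepoint and pooled versions because scores are strongly correlated over time, which shrinks the standard deviation of gains. With equal standard deviations at both timepoints, the ordering reverses once the correlation falls below 0.5, so the direction of the difference depends on the data.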

