Search EdWorkingPapers

Standards, accountability, assessment, and curriculum

Lauren Sartain, Silvana Freire, John Q. Easton, Briana Diaz.

Across an array of educational outcomes, evidence suggests that girls outperform boys on average. For example, in Chicago, ninth-grade girls earn math GPAs that are 0.29 points higher than boys' on average. This paper examines explanations for this gap, such as girl-boy differences in academic preparation, behaviors and habits, and experiences in math classes. After accounting for these factors, the gender gap in math grades persists. We then examine the classroom-level conditions that reduce the gender gap in grades. The gap is smaller in more advanced courses like honors classes and geometry. Further, boys perform more similarly to girls in classes with male teachers. These findings highlight classroom conditions that are more conducive to the academic success of boys.


NaYoung Hwang, Cory Koedel.

We evaluate the effects of grade retention on students’ academic, attendance, and disciplinary outcomes in Indiana. Using a regression discontinuity design, we show that third grade retention increases achievement in English Language Arts (ELA) and math immediately and substantially, and the effects persist into middle school. We find no evidence of grade retention effects on student attendance or disciplinary incidents, again into middle school. Our findings combine to show that Indiana’s third grade retention policy improves achievement for retained students without adverse impacts along (measured) non-academic dimensions.


Mark Murphy, Angela Johnson.

This study examines the effects of English Learner (EL) status on subsequent Special Education (SPED) placement. Through a research-practice partnership, we link student demographic data and initial English proficiency assessment data across seven cohorts of test takers and observe EL and SPED programmatic participation for these students over seven years. Our regression discontinuity (RD) estimates at the English proficiency margin consistently differ substantively from positive associations generated through regression analyses. RD evidence indicates that EL status had no effect on SPED placement at the English proficiency threshold. Grade-by-grade and subgroup RD analyses at this margin suggest that ELs were modestly under-identified for SPED during grade 5 and that ELs whose primary language was Spanish were under-identified for SPED.


Aaron Phipps, Alexander Amaya.

Given the simultaneous rise in time-to-graduation and college GPA, it may be that students reduce their course load to improve their performance. Yet evidence to date shows only that increased course loads increase GPA. We provide a mathematical model showing that many unobservable factors, beyond student ability, can generate a positive relationship between course load and GPA unless researchers control student schedules. West Point regularly implements the ideal experiment by randomly modifying student schedules with additional training courses. Using 19 years of administrative data, we provide the first causal evidence that taking more courses reduces GPA and increases course failure rates, sometimes substantially.


Jonathan E. Collins.

The George Floyd Protests of the Summer of 2020 initiated public conversations around the need for antiracist teaching. Yet, over time the discussion evolved into policy debates around the use of Critical Race Theory in civics courses. The rapid transition masked the fact that we know little about Americans' policy preferences. Do Americans support antiracist teaching? What factors best explain support/opposition? How does critical race theory factor in? Using a series of original survey experiments, this study shows that Americans maintain strong support for antiracist teaching, but that support is drastically weakened when curriculum features the term "critical race theory."


Ann Mantil, John Papay, Preeya Pandya Mbekeani, Richard J. Murnane.

Preparing K-12 students for careers in science, technology, engineering and mathematics (STEM) fields is an ongoing challenge confronting state policymakers. We examine the implementation of a science graduation testing requirement for high-school students in Massachusetts, beginning with the graduating class of 2010. We find that the design of the new requirement was quite complicated, reflecting the state’s previous experiences with test-based accountability, a broad consensus on policy goals among key stakeholders, and the desire to afford flexibility to local schools and districts. The consequences for both students and schools, while largely consistent with the goals of increasing students’ skills and interest in STEM fields, were in many cases unexpected. We find large differences by demographic subgroup in the probabilities of passing the first science exam and of succeeding on retest, even when conditioning on previous test-score performance. Our results also show impacts of science exit-exam performance for students scoring near the passing threshold, particularly on the high-school graduation rates of females and on college outcomes for higher-income students. These findings demonstrate the importance of equity considerations in designing and evaluating ambitious new policy initiatives.


Ishtiaque Fazlul, Cory Koedel, Eric Parsons.

Measures of student disadvantage—or risk—are critical components of equity-focused education policies. However, the risk measures used in contemporary policies have significant limitations, and despite continued advances in data infrastructure and analytic capacity, there has been little innovation in these measures for decades. We develop a new measure of student risk for use in education policies, which we call Predicted Academic Performance (PAP). PAP is a flexible, data-rich indicator that identifies students at risk of poor academic outcomes. It blends concepts from emerging early warning systems with principles of incentive design to balance the competing priorities of accurate risk measurement and suitability for policy use. In proof-of-concept policy simulations using data from Missouri, we show PAP is more effective than common alternatives at identifying students who are at risk of poor academic outcomes and can be used to target resources toward these students—and students who belong to several other associated risk categories—more efficiently.


Kate Antonovics, Sandra E. Black, Julie Berry Cullen, Akiva Yonah Meiselman.

Schools often track students to classes based on ability. Proponents of tracking argue it is a low-cost tool to improve learning since instruction is more effective when students are more homogeneous, while opponents argue it exacerbates initial differences in opportunities without strong evidence of efficacy. In fact, little is known about the pervasiveness or determinants of ability tracking in the US. To fill this gap, we use detailed administrative data from Texas to estimate the extent of tracking within schools for grades 4 through 8 over the years 2011-2019. We find substantial tracking; tracking within schools overwhelms any sorting by ability that takes place across schools. The most important determinant of tracking is heterogeneity in student ability, and schools operationalize tracking through curricular differentiation and through the classification of students into categories such as gifted and disabled. When we examine how tracking changes in response to educational policies, we see that schools decrease tracking in response to accountability pressures. Finally, when we explore how exposure to tracking correlates with student mobility in the achievement distribution, we find positive effects on high-achieving students with no negative effects on low-achieving students, suggesting that tracking may increase inequality by raising the ceiling.


John Papay, Ann Mantil, Richard J. Murnane.

Many states use high-school exit examinations to assess students’ career and college readiness in core subjects. We find meaningful consequences of barely passing the mathematics examination in Massachusetts, as opposed to just failing it. However, these impacts operate at different educational attainment margins for low-income and higher-income students. As in previous work, we find that barely passing increases the probability of graduating from high school for low-income (particularly urban low-income) students, but not for higher-income students. However, this pattern is reversed for 4-year college graduation. For higher-income students only, just passing the examination increases the probability of completing a 4-year college degree by 2.1 percentage points, a sizable effect given that only 13% of these students near the cutoff graduate.


Walter Herring.

Because high-stakes testing for school accountability does not begin until third grade, accountability ratings for elementary schools do not directly measure students’ academic progress in grades K through 2. While it is possible that children’s test scores in grades 3 and above are highly correlated with children’s outcomes in the untested grades, research provides reasons to believe that this might not be the case in all schools. This study explores whether measures of school quality based on test scores in grades 3 through 5 serve as a strong proxy for children’s academic outcomes in grades K through 2. The results show that directly accounting for children’s test scores in the early grades could lead to meaningful changes in schools’ test-based performance ratings. The findings have important implications for accountability policy.
