
Standards, accountability, assessment, and curriculum


Beth Schueler, Liz Nigro, John Wang.

Limited scholarship examines districtwide turnaround reforms beyond the first few years of implementation or efforts to replicate successes in new contexts. We study Massachusetts, home to a state takeover of the Lawrence school district that led to academic gains in early reform years, and where state leaders attempted to replicate this success in three additional communities. We use statewide student-level administrative data (2006-07 to 2018-19) and event study methods to estimate medium-term impacts on student outcomes across four districts. We find the initial improvements were largely sustained in Lawrence. We observe evidence of successful replication in Springfield but not Holyoke or Southbridge. The two turnarounds with positive outcomes both struck a unique balance between state and local input into decision-making.

Jennifer Darling-Aduana, Carolyn J. Heinrich, Jeremy Noonan, Jialing Wu, Kathryn Enriquez.

Online credit recovery (OCR) courses are the most common means through which students retake courses required for high school graduation. Yet a growing body of research has raised concerns regarding student learning in these courses, with low-quality assessments posited as one contributing factor. To address this concern, we reviewed every assessment item from a widely used OCR Algebra 1 course. We also examined pathways for passing the course mastery tests without learning content. In addition, we identified whether and how states regulate OCR. We found that OCR assessments, as executed, lacked rigor and validity. We offer recommendations to improve rigor, close pathways that call into question the validity of results, strengthen implementation procedures, and increase state-level oversight of providers.

Matthew H. Lee, John Thompson, Eric Wearne.

Hybrid school enrollments are trending up and many parents express a diverse range of reasons for enrolling their children in hybrid schools. Yet little is known about the pedagogical goals pursued by hybrid schools. We aim to help close this gap in the literature with a stated preferences experiment of hybrid school leaders’ perceptions of program success. Sixty-three school leaders participated in a survey experiment in which we randomly assigned attributes to hypothetical programs and asked school leaders to identify the most successful program. We find that hybrid school leaders consider a broad range of student outcomes when evaluating program success, including labor market outcomes, civic outcomes, and family life. Students’ religious observance produced the largest effect sizes, a reasonable finding considering that roughly two-thirds of the schools represented in our sample have some religious affiliation. We do not find evidence that test score outcomes and higher education matriculation contribute meaningfully to perceived success.

Steven Michael Carlo.

The National Assessment of Educational Progress (NAEP) has tested the civic, or citizenship, knowledge of students across the nation at irregular intervals since its inception. Despite advancements in reading and mathematics, evidenced by NAEP results, civics proficiency has remained consistently low, which raises concerns among educators and policymakers. This study attempts to provide those educators and policymakers with state-level predictions, which are not currently provided for the civics assessment. This research addresses this gap in state-level civics education data by applying multilevel regression with poststratification (MRP) to NAEP's nationally representative civics scores, yielding state-specific estimates that account for student demographics. A historical analysis of NAEP's development underscores its significance in national education and highlights the challenges of transitioning to state-level reporting, particularly for civics, which lacks state-level generalizability. Furthermore, this paper evaluates NAEP's frameworks, questioning their alignment with civics education's evolving needs, and investigates the presence of opportunity gaps in civics knowledge across gender and racial/ethnic lines. By comparing MRP estimates with published NAEP results, the study validates the method's credibility and emphasizes the potential of MRP in educational research. The findings reveal persistent racial/ethnic disparities in civic knowledge, with profound implications for civics instruction and policy. The research concludes by stressing the necessity for state-specific data to inform education policy and practice, advocating for teaching methods that enhance civic understanding and engagement, and suggesting future research directions to address the uncovered disparities.

Daniel Rodriguez-Segura, Savannah Tierney.

While learning outcomes in low- and middle-income countries are generally at low levels, the degree to which students and schools more broadly within education systems lag behind grade-level proficiency can vary significantly. A substantial portion of existing literature advocates for aligning curricula closer to the proficiency level of the “median child” within each system. Yet, amidst considerable between-school heterogeneity in learning outcomes, choosing a single instructional level for the entire system may still leave behind those students in schools far from this level. Hence, establishing system-wide curriculum expectations in the presence of significant between-school heterogeneity poses a significant challenge for policymakers, especially as the issue of between-school heterogeneity has been relatively unexplored by researchers so far. This paper addresses the gap by leveraging a unique dataset on foundational literacy and numeracy outcomes, representative of six public educational systems encompassing over 900,000 enrolled children in South Asia and West Africa. With this dataset, we examine the current extent of between-school heterogeneity in learning outcomes and its potential predictors, and we explore its potential implications for setting national curricula for different grade levels and subjects. Our findings reveal that between-school heterogeneity can indeed present both a severe pedagogical hindrance and challenges for policymakers, particularly in contexts with relatively higher levels of performance and in the higher grades. In response to meaningful between-system heterogeneity, we also demonstrate through simulation that a more nuanced, data-driven targeting of curricular expectations for different schools within a system could empower policymakers to effectively reach a broader spectrum of students through classroom instruction.

Umut Özek.

Public policies targeting individuals based on need often impose a disproportionate burden on communities that lack the resources to implement these policies effectively. In an elementary school setting, I examine whether community-level interventions focusing on similar needs and providing resources to build capacity in these communities could improve outcomes by improving the effectiveness of individual-level interventions. I find that the extended school day policy that targets the lowest-performing schools in reading in Florida significantly improved the effectiveness of the third-grade retention policy in these schools. These complementarities were large enough to close the gap in retention effects between targeted and higher-performing schools.

Sarah Ruth Morris, Andy Parra-Martinez, Jonathan Wai, Robert Maranto.

This mixed-methods study synthesizes Standards-Based Grading (SBG) literature, analyzes 249 Arkansas administrators' survey responses using OLS regressions, and identifies themes through in-vivo coding of qualitative feedback. Results show more SBG support among liberal, elementary-level administrators in larger, economically diverse districts. Qualitative insights highlight structural barriers and mindsets against SBG, emphasizing its importance for mastery-focused assessment and grading alignment. These findings underscore the influence of principals' beliefs on SBG support and suggest researching the contextual and ideological factors influencing SBG's implementation.

Joshua Bleiberg, Eric Brunner, Erica Harbatkin, Matthew A. Kraft, Matthew G. Springer.

Federal incentives and requirements under the Obama administration spurred states to adopt major reforms to their teacher evaluation systems. We examine the effects of these reforms on student achievement and attainment at a national scale by exploiting their staggered implementation across states. We find precisely estimated null effects, on average, that rule out impacts as small as 0.017 standard deviations for achievement and 1.2 percentage points for high school graduation and college enrollment. We highlight five factors that likely limited the efficacy of teacher evaluation at scale: political opposition, decentralization, capacity constraints, limited generalizability, and the absence of compensating wages.

Blake Heller.

In 2016, the GED® introduced college readiness benchmarks designed to identify testers who are academically prepared for credit-bearing college coursework. The benchmarks are promoted as awarding college credits or exempting “college-ready” GED® graduates from remedial coursework. I show descriptive evidence that those identified as college-ready by these benchmarks enroll and persist in college at significantly higher rates than others who pass the GED® exam, but at lower rates than recent graduates with traditional high school diplomas. Regression discontinuity estimates show that crossing a college readiness threshold does not substantially influence testers' college enrollment or persistence during the two years following their first test attempt. Relatedly, I observe little exam retaking by those who fall narrowly short of the minimum college readiness score thresholds. This contrasts strongly with retaking behavior near the lower GED® passing threshold that determines eligibility for a high school equivalency credential. Those who narrowly fail a GED® subject test are over 100 times more likely to retest than those who fall just short of a college readiness benchmark in the same subject. GED® college readiness benchmarks do not currently appear to promote better college outcomes, but in the absence of more detailed test score information they offer a simple heuristic to predict short-run college enrollment and persistence among GED® graduates, particularly for those who identify educational gain as a primary reason for testing. The results highlight the promise and challenges associated with building pathways for non-traditional students to earn credit for prior learning.

Heather McCambly, Quinn Mulroy, Andrew Stein.

A common point of contention across education policy debates is whether and how facially race-neutral metrics of quality produce or maintain racialized inequities. Medical education is a useful site for interrogating this relationship, as many scholars point to the 1910, Carnegie-funded Flexner Report—which proposed standardized quality metrics—as a main driver of the closure of five of the seven Black medical schools. Our research demonstrates how these proposed quality metrics, and their philanthropic and political advocates, instantiated a racialized organizational order that governed the distribution of resources, the development of state certification processes, and the regulation of medical schools. This analysis provides traction for uncovering how taken-for-granted standards of quality come to maintain racialized access to opportunity in education.
