K-12 Education

Teacher expectations and judgments about student capabilities are predictive of student achievement, yet such judgments may be influenced by salient dimensions of student identity and invite biases. Moreover, ambitious math teaching may also invite teacher biases due to the emphasis on student-generated inputs and ideas. In this pre-registered audit experiment, we investigate teacher biases in a) expectations and judgments about student capabilities in math and b) teacher responsiveness to students’ mathematical thinking. Through a between-subjects design, we randomly assigned teachers to a simulated classroom composed of predominantly Black, Latinx/e, or White students and prompted them to respond to a student’s mathematical solution. We also prompted teachers to judge the quality of the student’s mathematical thinking and rate their expectations about the difficulty of the problem for the typical student. Our findings show teachers expected greater task difficulty in both the Latinx/e and Black classroom conditions relative to the White classroom condition. We also found teachers may be more likely to support student sense-making and provide more positive, substantive affirmations to Black students relative to White students for the same mathematical solution. We did not find differences by condition in other dimensions. Our findings have implications for teacher training and reform-oriented mathematics instruction.


Kathryn Watson.

Iowa's Senate File 496 requires parent permission to formally survey students about their mental health, bans the discussion of gender identity and sexual orientation in schools before 7th grade, mandates that schools obtain parental permission to use a nickname, and bans any books that depict or describe sex acts in schools. This exploratory case study examines educators’ (n = 20) perceptions of Senate File 496’s influence on school-based mental health systems and the ways participants perceived the legislation influenced student mental health. Key findings reveal that Senate File 496 was dismantling school-based mental health systems, that there was a rise of vigilantism in education, and that participants perceived the legislation caused irrevocable student harm.


Generative AI, particularly Language Models (LMs), has the potential to transform real-world domains with societal impact, particularly where access to experts is limited. For example, in education, training novice educators with expert guidance is important for effectiveness but expensive, creating significant barriers to improving education quality at scale. This challenge disproportionately hurts students from under-served communities, who stand to gain the most from high-quality education and are most likely to be taught by inexperienced educators. We introduce Tutor CoPilot, a novel Human-AI approach that leverages a model of expert thinking to provide expert-like guidance to tutors as they tutor. This study presents the first randomized controlled trial of a Human-AI system in live tutoring, involving 900 tutors and 1,800 K-12 students from historically under-served communities. Following a preregistered analysis plan, we find that students working on mathematics with tutors randomly assigned to have access to Tutor CoPilot are 4 percentage points (p.p.) more likely to master topics (p<0.01). Notably, students of lower-rated tutors experienced the greatest benefit, improving mastery by 9 p.p. relative to the control group. We find that Tutor CoPilot costs only $20 per tutor annually, based on the tutors’ usage during the study. We analyze 550,000+ messages using classifiers to identify pedagogical strategies, and find that tutors with access to Tutor CoPilot are more likely to use strategies that foster student understanding (e.g., asking guiding questions) and less likely to give away the answer to the student, aligning with high-quality teaching practices. Tutor interviews qualitatively highlight how Tutor CoPilot’s guidance helps them to respond to student needs, though tutors flag common issues in Tutor CoPilot, such as generating suggestions that are not grade-level appropriate.
Altogether, our study of Tutor CoPilot demonstrates how Human-AI systems can scale expertise in real-world domains, bridge gaps in skills, and create a future where high-quality education is accessible to all students.


This study provides a descriptive analysis of police intervention as a response to student behavior in New York City public schools. We find that between the 2016/17 and 2021/22 academic years, arrests and juvenile referrals decreased while non-detainment-based and psychiatric police interventions increased. However, Black students, especially those enrolled in schools located in predominantly White police precincts experiencing a shrinking White student population, experienced disproportionate rates of arrests, juvenile referrals, and police-involved psychiatric interventions. Schools serving more Black students experienced higher rates of interventions relative to schools with fewer Black students, but these higher rates of intervention are not explained by differences in observable student behavior and characteristics. Instead, differences in teacher characteristics and resources contribute to the excess use of police interventions in predominantly Black schools.


The pivotal role of Algebra in the educational trajectories of U.S. students continues to motivate controversial, high-profile policies focused on when students access the course, their classroom peers, and how the course is taught. This random-assignment partnership study examines an innovative district-level reform—the Algebra I Initiative—that placed 9th-grade students with prior math scores well below grade level into Algebra I classes coupled with teacher training instead of a remedial pre-Algebra class. We find that this reform significantly increased grade-11 math achievement (ES = 0.2 SD) without lowering the achievement of classroom peers eligible for conventional Algebra I classes. This initiative also increased attendance, district retention, and overall math credits. These results suggest that higher expectations for the lowest-performing students coupled with aligned teacher supports is a promising model for realizing students’ mathematical potential.


Maya Kaul.

Teachers’ professional identities are the foundation of their practice. Previous scholarship has largely overlooked the extent to which the broader institutional environment shapes teachers’ professional identities. In this study, I bridge institutional logics with theory on teacher professional identity to empirically examine how the deeply institutionalized, taken-for-granted ways American society has come to think of teaching (e.g., as a moral calling, as a profession, as labor) are internalized by K-12 teachers. I draw on survey data from 950 teachers across four US states (California, New York, Florida, and Texas), and develop an original survey measure to capture what I term teachers’ “institutionalized conceptions of teaching.” Across diverse state policy contexts, I find that teachers’ conceptions of teaching are guided by three underlying logics: (1) an accountability logic, (2) a democratic logic, and (3) a moral calling logic. I then surface a typology of teacher professional and examine the relationship between these logics and teachers’ professional identities. I find that the taken-for-granted ways society frames teaching may be associated with dimensions of teachers’ professional identity, such as self-efficacy and professional commitment. Together, the findings suggest that supporting the professional well-being of K-12 teachers may demand shifting the deeply institutionalized norms of the profession to be more aligned with teachers’ democratic and moral aims, rather than our system's deep norms around external accountability. The study offers methodological contributions to the study of logics, as well as practical implications for the field of teaching.


School board candidates supported by local teachers' unions overwhelmingly win, and we examine the causes and consequences of the "teachers' union premium" in these elections. First, we show that union endorsement information increases voter support. Although the magnitude of this effect varies across ideological and partisan subgroups, an endorsement never hurts a candidate's prospects among any major segment of the electorate. Second, we benchmark the size of the endorsement premium to other well-known determinants of vote choice in local elections. Perhaps surprisingly, we show the endorsement effect can be as large as the impact of shared partisanship, and substantially larger than the boost from endorsements provided by other stakeholders. Finally, examining real-world endorsement decisions, we find that union support for incumbents hinges on self-interested pecuniary considerations and is unaffected by performance in improving student academic outcomes. The divergence between what endorsements mean and how voters interpret them has troubling normative democratic implications.


Racial disparities in violence exposure and criminal justice contact are a subject of growing policy and public concern. We conduct a large-scale, randomized controlled trial of a six-month behavioral health intervention combining intensive mentoring and group therapy designed to reduce criminal justice and violence involvement among Black and Latinx youth in Chicago. Over 24 months, youth offered the program experienced an 18 percent reduction in the probability of any arrest and a 23 percent reduction in the probability of a violent-crime arrest. These statistically significant impacts persist, with smaller magnitudes, up to 3 years post-randomization. To better understand the behavior change we observe, given that an arrest is a proxy for criminal behavior, we create a supervised machine learning algorithm from arrest narratives that determines if an arrest was initiated more or less at the discretion of police. We find that the program’s impacts are concentrated in arrests where officers have less discretion in initiating contact, while having little impact on more discretionary contact arrests (e.g., a young person exhibiting “suspicious” behavior). This analysis suggests the effects of the program are being driven by a reduction in youth offending behavior rather than by avoiding police contact.


Understanding the factors that influence student outcomes is crucial for both parents and schools when designing effective educational strategies. This paper explores the impact of peer age on both cognitive and non-cognitive outcomes using a randomized sample of middle school students. By analyzing how exogenous variations in peer age affect students' academic performance, self-expectations and confidence, health perceptions, behavioral traits, and social development, we highlight the important role that peer age plays in educational contexts. Our findings reveal that an increase in the average age of classmates has negative effects on both the cognitive and non-cognitive outcomes of a student. We also identify significant heterogeneous effects based on student relative age and gender. We delve into potential mechanisms behind these effects and study inputs from the perspective of students themselves, parents, teachers, and the school within the framework of the education production function. The results suggest that students' persistence in their studies, the quality of friendships, and the school environment they are exposed to are the primary drivers of our main findings. These findings underscore the importance of addressing age disparities within classrooms to enhance students' cognitive and non-cognitive development.


U.S. public schools are engaged in an unprecedented effort to expand tutoring in the wake of the COVID-19 pandemic. Broad-based support for scaling tutoring emerged, in part, because of the large effects on student achievement found in prior meta-analyses. We conduct an expanded meta-analysis of 265 randomized controlled trials and explore how estimates change when we better align our sample with a policy-relevant target of inference: large-scale tutoring programs in the U.S. aiming to improve standardized test performance. Pooled effect sizes from studies with stronger target-equivalence remain meaningful but are only a third to a half as large as those from our full sample. This result is driven by stark declines in pooled effect sizes as program scale increases. We explore four hypotheses for this pattern and document how a bundled package of recommended design features serves to partially inoculate programs against these attenuated effects at scale.
