Educator preparation, professional development, performance and evaluation

Paiheng Xu, Jing Liu, Nathan Jones, Julie Cohen, Wei Ai.

Assessing instruction quality is a fundamental component of any improvement effort in the education system. However, traditional manual assessments are expensive, subjective, and heavily dependent on observers’ expertise and idiosyncratic factors, preventing teachers from getting timely and frequent feedback. Unlike prior research that focuses on low-inference instructional practices, this paper presents the first study that leverages Natural Language Processing (NLP) techniques to assess multiple high-inference instructional practices in two distinct educational settings: in-person K-12 classrooms and simulated performance tasks for pre-service teachers. This is also the first study that applies NLP to measure a teaching practice that has been demonstrated to be particularly effective for students with special needs. We confront two challenges inherent in NLP-based instructional analysis: noisy and long input data, and highly skewed distributions of human ratings. Our results suggest that pretrained Language Models (PLMs) perform comparably to the agreement level of human raters for variables that are more discrete and require lower inference, but their efficacy diminishes with more complex teaching practices. Interestingly, using only teachers’ utterances as input yields strong results for student-centered variables, alleviating common concerns over the difficulty of collecting and transcribing high-quality student speech data in in-person teaching settings. Our findings highlight both the potential and the limitations of current NLP techniques in the education domain, opening avenues for further exploration.

Matthew A. Kraft, Melissa Arnold Lyon.

We examine the state of the U.S. K-12 teaching profession over the last half century by compiling nationally representative time-series data on four interrelated constructs: occupational prestige, interest among students, the number of individuals preparing for entry, and on-the-job satisfaction. We find a consistent and dynamic pattern across every measure: a rapid decline in the 1970s, a swift rise in the 1980s extending into the mid 1990s, relative stability, and then a sustained decline beginning around 2010. The current state of the teaching profession is at or near its lowest levels in 50 years. We identify and explore a range of hypotheses that might explain these historical patterns including economic and sociopolitical factors, education policies, and school environments.

Arielle Boguslav.

Despite the common title of “coach,” definitions of high-quality coaching vary tremendously across models and programs. Yet, few studies make comparisons across different models to understand what is most helpful, for whom, and under what circumstances. As a result, practitioners are left with many options and little evidence-based direction. This is exacerbated by the literature’s focus on more abstract features of coaching practice (e.g. building trust), leaving practitioners to figure out what concrete discourse strategies support these goals. This paper begins to address these challenges by introducing a taxonomy of coaching “moves,” parsing the concrete details of coach discourse. While the taxonomy is informed by the literature, it highlights conceptual possibilities rather than providing a list of empirically-grounded or “evidence-based” strategies. In doing so, this taxonomy may serve as a common language to guide future work exploring how coach discourse shapes teacher development, synthesizing across studies, and supporting coach practice.

Olivia L. Chi, Andrew Bacher-Hicks, Ariel Tichnor-Wagner, Sidrah Baloch.

Much recent debate among policymakers and policy advocates focuses on whether states should reduce teacher licensure requirements to ease the burdens of recruiting high quality teachers to the workforce. We examine the effectiveness of individuals who entered the teacher workforce in Massachusetts during the pandemic by obtaining an emergency license, which requires only a bachelor’s degree. Our results show that, in 2021-22, newly hired emergency licensed teachers: 1) were largely rated as proficient (82%) in their performance evaluation ratings and 2) had similar measures of student test score growth as their traditionally licensed peers. However, we find suggestive evidence that emergency licensed teachers with no prior employment in Massachusetts public schools and no prior engagement with the teacher pipeline (i.e., enrollment in teacher preparation, attempting licensure exams) received lower performance ratings and had lower measures of student test score growth in English Language Arts. Taken together, these results encourage the creation of additional flexibility in licensure requirements for those who have demonstrated prior efforts to join the educator pipeline.

Seth B. Hunter, Katherine M. Bowser.

We extend teacher evaluation research by estimating a reformed evaluation system's plausibly causal average effects on rural student achievement, identifying the settings where evaluation works, and incorporating evaluation expenditures. The absence of these contributions from the literature is concerning because research implies that this gap hinders evidence-based teacher evaluation policymaking for rural districts, which outnumber urban districts. We apply a difference-in-differences framework to Missouri administrative data. Missouri districts could design and maintain reformed systems or outsource these tasks for a small fee to organizations like the Network for Educator Effectiveness (NEE), an evaluation system created for rural users. NEE does not affect student achievement on average, but it improves math, and possibly reading, achievement in rural schools where the average student's prior-year achievement score is below the state average or the average teacher's years of experience are below the state average.

David Blazar.

Black teachers are critical resources for our children and schools. Pairing experimental data with rich measures of teacher mindsets and practices and varied student outcomes, I document that: (1) Black teachers in upper-elementary grades have large effects on the self-efficacy and classroom engagement of their Black students (0.7 and 0.8 SD) but not for non-Black students, potentially driven by role modeling; (2) race-matching effects on Black students’ social-emotional learning explain a moderate to large share of effects on more distal outcomes, including absences and test scores; (3) Black teachers also benefit the test scores (0.2 SD) and absences (roughly 20% decrease) of all students, no matter their race/ethnicity, with effects that often persist many years later into high school; and (4) in addition to potential role-modeling channels, Black teachers bring unique mindsets and practices to their work (e.g., preparation for and differentiated instruction, growth mindset beliefs, well-organized classrooms) that mediate a moderate to large share of their effects on student outcomes. These findings help bridge the quantitative “teacher like me” literature with theoretical discussion and qualitative exploration on why Black teachers matter.

Andrew Kwok, Tuan D. Nguyen.

This concurrent mixed methods study descriptively explores teacher residency programs (TRPs) across the nation. We examine program and participant survey data from the National Center for Teacher Residencies (NCTR) to identify important TRP structures for resident support. Latent class analysis of program-level data reveals three types of TRPs (locally-funded low tuition, multi-funded multifaceted, and federally-funded post-residency support), while regression models indicate significant relationships between individual program structures and the perceptions of participants (residents, graduates, mentors, and principals). Qualitative analyses of multiple open response items across participants detail four salient TRP structures: providing extended clinical experience, localizing individual support, offering programmatic training, and teaching practical professional knowledge. Findings inform policymakers about TRP investment, practitioners about program design, and researchers pursuing continued large-scale evidence.

Kathryn E. Gonzalez, Olivia Healy, Luke Miratrix, Terri J. Sabol.

Despite considerable evidence on the links between average classroom quality and children’s learning, the importance of variation in quality is not well understood. We examined whether three measures of variation in observed classroom quality over the school year – overall variation in quality, teacher-specific trends in quality, and instability in quality – were associated with children’s language, literacy, and regulatory outcomes. We also examined whether variation in quality was associated with teachers’ participation in coaching. Overall variation and instability in emotional support and classroom organization over the year were negatively associated with children’s regulatory and literacy outcomes. Participation in coaching was linked to increased variation only in instructional support. We discuss implications for policies focused on improving classroom quality.

Danielle Sanderson Edwards, Matthew A. Kraft.

“Grow Your Own” (GYO) programs have recently emerged as a promising approach to expand teacher supply, address localized teacher shortages, and diversify the profession. However, little is known about the scale and design of GYO programs, which recruit and support individuals from the local community to become teachers. We conduct a quantitative content analysis to describe 94 GYO initiatives. We find that GYO is used broadly as an umbrella term to describe teacher pipeline programs with very different purposes, participants, and program features. Our results suggest that misalignment between some GYOs’ purposes and program features may inhibit their effectiveness. Finally, we propose a new typology to facilitate more precise discussions of GYO programs.

Sarah Novicoff, Thomas S. Dee.

While policymakers have demonstrated considerable enthusiasm for “science of reading” initiatives, the evidence on the impact of related reforms when implemented at scale is limited. In this pre-registered, quasi-experimental study, we examine California’s recent initiative to improve early literacy across the state’s lowest-performing elementary schools. The Early Literacy Support Block Grant (ELSBG) provided teacher professional development grounded in the science of reading as well as aligned supports (e.g., assessments and interventions), new funding (about $1000 per student), spending flexibility within specified guidelines, and expert facilitation and oversight of school-based planning. We find that ELSBG generated significant (and cost-effective) improvements in ELA achievement in its first two years of implementation (0.14 SD) as well as smaller, spillover improvements in math achievement.
