EdWorkingPapers

Methodology, measurement and data

While recent studies have demonstrated the potential of automated feedback to enhance teacher instruction in virtual settings, its efficacy in traditional classrooms remains unexplored. In collaboration with TeachFX, we conducted a pre-registered randomized controlled trial involving 523 Utah mathematics and science teachers to assess the impact of automated feedback in K-12 classrooms. This feedback targeted “focusing questions”: questions that probe students’ thinking by pressing for explanations and reflection. Our findings indicate that automated feedback increased teachers’ use of focusing questions by 20%. However, there was no discernible effect on other teaching practices. Qualitative interviews revealed mixed engagement with the automated feedback: some teachers noticed and appreciated the reflective insights it offered, while others were unaware of it. Teachers also expressed skepticism about the accuracy of the feedback, raised concerns about data security, and/or noted that time constraints prevented their engagement with the feedback. Our findings highlight avenues for future work, including integrating this feedback into existing professional development activities to maximize its effect.

Noncognitive constructs such as self-efficacy, social awareness, and academic engagement are widely acknowledged as critical components of human capital, but systematic data collection on such skills in school systems is complicated by conceptual ambiguities, measurement challenges, and resource constraints. This study addresses this issue by comparing, for the likelihood of high school graduation and postsecondary attainment, the predictive validity of the two most widely used metrics of noncognitive outcomes: observable academic behaviors (e.g., absenteeism, suspensions) and student self-reported social and emotional learning (SEL) skills. Our findings suggest that, conditional on student demographics and achievement, academic behaviors are several-fold more predictive than SEL skills for all long-run outcomes, and adding SEL skills to a model with academic behaviors only minimally improves the model's predictive power. In addition, academic behaviors are particularly strong predictors of low-achieving students' long-run outcomes. Part-day absenteeism (as a result of class skipping) is the largest driver behind the strong predictive power of academic behaviors. Developing more nuanced behavioral measures in existing administrative data systems might be a fruitful strategy for schools whose goal centers on predicting students' educational attainment.

Longitudinal studies can produce biased estimates of learning if children miss tests. In an application to summer learning, we illustrate how missing test scores can create an illusion of large summer learning gaps when true gaps are close to zero. We demonstrate two methods that reduce bias by exploiting the correlations between missing and observed scores on tests taken by the same child at different times. One method, multiple imputation, uses those correlations to fill in missing scores with plausible imputed scores. The other method models the correlations implicitly, using child-level random effects. Widespread adoption of these methods would improve the validity of summer learning studies and other longitudinal research in education.

Teacher preparation programs are increasingly expected to use data on pre-service teacher (PST) skills to drive program improvement and provide targeted supports. Observational ratings are especially vital, but also prone to measurement issues. Scores may be influenced by factors unrelated to PSTs’ instructional skills, including rater standards and mentor teachers’ skills. Yet we know little about how these measurement challenges play out in the PST context. Here we investigate the reliability and sensitivity of two observational measures. We find measures collected during student teaching are especially prone to measurement issues; only 3-4% of variation in scores reflects consistent differences between PSTs, while 9-17% of variation can be attributed to the mentors with whom they work. When high scores stem not from strong instructional skills, but instead from external circumstances, we cannot use them to make consequential decisions about PSTs’ individual needs or readiness for independent teaching.
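
As a rough guide to what these percentages mean, one common formalization (assumed here; the study's actual model may include further facets such as raters or observation occasions) treats the rating of PST p, working with mentor m, on occasion t as a sum of independent random effects, with each share defined as that component's variance over the total:

$$
y_{pmt} = \mu + u_p + v_m + \varepsilon_{pmt}, \qquad
\text{share}_{\text{PST}} = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_v^2 + \sigma_\varepsilon^2}, \qquad
\text{share}_{\text{mentor}} = \frac{\sigma_v^2}{\sigma_u^2 + \sigma_v^2 + \sigma_\varepsilon^2}
$$

Under this reading, a PST share of 3-4% means that most of the variation in scores is not attributable to stable differences between PSTs.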

The recent spike in book challenges has put school libraries at the center of heated political debates. I investigate the relationship between local politics and school library collections using data on books with controversial content in 6,631 public school libraries. Libraries in conservative areas have fewer titles with LGBTQ+, race/racism, or abortion content and more Christian fiction and discontinued Dr. Seuss titles. This is true even though most libraries have at least some controversial content. I also find that state laws that restrict curricular content are negatively related to some kinds of controversial books. Finally, I present descriptive short-term evidence that book challenges in the 2021-22 school year have had “chilling effects” on the acquisition of new LGBTQ+ titles.

Explaining the productivity paradox, the phenomenon in which the introduction of information and communication technology (ICT) does not lead to improvements in labor productivity, is difficult because changes in technology often coincide with adjustments to working hours and the substitution of labor. I conduct a cluster-randomized trial in India to investigate the effects of a program that provides teachers with continuous training and materials, encouraging them to blend their instruction with high-quality videos. Teaching hours, teacher-to-student assignments, and the curriculum are held constant. Eleven months after the program's launch, I document negative effects on mathematics learning among students in grades 9 and 10, and no effects in science. I also find detrimental effects on instructional quality, instructional practices, and students' perceptions of and attitudes toward mathematics and science. These findings suggest that adjustment costs can serve as one explanation for the paradox.

Generally, need-based financial aid improves students’ academic outcomes. However, the largest source of need-based grant aid in the United States, the Federal Pell Grant Program (Pell), has a mixed evaluation record. We assess the minimum Pell Grant in a regression discontinuity framework using Kentucky administrative data. We focus on whether and how year-to-year changes in aid eligibility and interactions with other sources of aid attenuate Pell’s estimated effects on postsecondary outcomes. This evaluation complements past work by assessing explanations for the null or muted impacts found in our analysis and in other Pell evaluations. We also discuss the limitations of using regression discontinuity methods to evaluate Pell, or other interventions with dynamic eligibility criteria, with respect to generalizability and construct validity.

William Delgado.

Does student-teacher match quality exist? Prior work has documented large disparities in teachers' impacts across student types but has not distinguished between sorting and causal effects as the drivers of these disparities. I propose a disparate value-added model and derive a novel measure of teacher quality, revealed comparative advantage, that captures the degree to which teachers affect student outcome gaps. Quasi-experimental changes in teaching staff show that the comparative advantage measure accurately predicts teachers’ disparate impacts: a teacher with a 1 standard deviation higher revealed comparative advantage for black students increases black students' test scores by 1 standard deviation and has no effect on non-black students' test scores. Teacher removal and teacher-to-classroom re-allocation simulations show substantial efficiency and equity gains from considering teachers’ comparative advantage.

We develop a unifying conceptual framework for understanding and predicting teacher shortages at the state, region, district, and school levels. We then generate and test hypotheses about geographic and subject variation in teacher shortages using data on unfilled teaching positions in Tennessee during the fall of 2019. We find that teacher staffing challenges are highly localized, causing shortages and surpluses to coexist. Aggregate descriptions of staffing challenges mask considerable variation between schools and subjects within districts. Schools with fewer local early-career teachers, smaller district salary increases, worse working conditions, and higher historical attrition rates have higher vacancy rates. Our findings illustrate why viewpoints about, and solutions to, shortages depend critically on whether one takes an aggregate or local perspective.

Novice teachers improve substantially in their first years on the job, but we know remarkably little about the nature of this skill development. Using data from Tennessee, we leverage a feature of the classroom observation protocol that asks school administrators to identify an item on which the teacher should focus their improvement efforts. This “area of refinement” overcomes a key measurement challenge endemic to inferring the development of specific teaching skills from classroom observation scores. We show that administrators disproportionately identify two teaching skills when observing novice teachers: classroom management and presenting content. Struggling with classroom management, in particular, is linked to high rates of novice teacher attrition. Among those who remain, we observe subsequent improvement in these skills.