
Dorottya Demszky

Inequality in college has both structural and psychological causes; these include the presence of self-defeating beliefs about the potential for growth and belonging. Such beliefs can be addressed through large-scale interventions in the college transition (Walton & Cohen, 2011; Walton et al., 2023) but are hard to measure. In our pre-registered study, we provide the strongest evidence to date that the belief that belonging challenges are common and tend to improve with time (“a process-oriented perspective”), the primary target of social-belonging interventions, is critical. We did so by developing and applying computational language measures to 25,000 essays written during a randomized trial of this intervention across 22 broadly representative US colleges and universities (Walton et al., 2023). We compare the hypothesized mediator to one of simple optimism, which includes positive expectations without recognizing that challenges are common. Examining the active control condition, we find that socially disadvantaged students are, indeed, significantly less likely to express a process-oriented perspective spontaneously, and more likely to express simple optimism. This matters: Students who convey a process-oriented perspective, in both the control and treatment conditions, are significantly more likely to complete their first year of college enrolled full-time and have higher first-year GPAs, while simple optimism predicts worse academic progress. The social-belonging intervention helped distribute a process-oriented perspective more equitably, though disparities remained. These computational methods enable the scalable and unobtrusive assessment of subtle student beliefs that help or hinder college success.



This study provides the first large-scale quantitative exploration of mathematical language use in upper elementary U.S. classrooms. Our approach employs natural language processing techniques to describe variation in teachers’ and students’ use of mathematical language in 1,657 fourth and fifth grade lessons in 317 classrooms in four districts over three years. Students’ exposure to mathematical language varies substantially across lessons and between teachers. Results suggest that teacher modeling, defined as the density of mathematical terms in teacher talk, does not by itself lead to substantial student uptake of mathematical language, but that teachers may encourage student use of mathematical vocabulary by means other than mere modeling or exposure. At the same time, we find that teachers who use more mathematical language are more effective at raising student test scores.
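
To make the "density of mathematical terms in teacher talk" measure concrete, here is a minimal sketch of how such a density could be computed for one lesson. The term list, transcript format, and function names are hypothetical illustrations, not the study's actual implementation.

```python
# Minimal sketch (not the study's code) of a "density of mathematical terms
# in teacher talk" measure. The term list and transcript format are
# hypothetical placeholders.
import re

MATH_TERMS = {"numerator", "denominator", "fraction", "equivalent",
              "product", "quotient", "perimeter", "area"}  # hypothetical subset

def math_term_density(utterance: str) -> float:
    """Return the fraction of word tokens that are mathematical terms."""
    tokens = re.findall(r"[a-z]+", utterance.lower())
    return sum(t in MATH_TERMS for t in tokens) / len(tokens) if tokens else 0.0

# Average density over one lesson's teacher turns (hypothetical data).
teacher_turns = [
    "What is the denominator of this fraction?",
    "Nice work finding the product of those two numbers.",
]
lesson_density = sum(map(math_term_density, teacher_turns)) / len(teacher_turns)
print(f"Teacher math-term density for this lesson: {lesson_density:.3f}")
```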



Despite well-designed curriculum materials, teachers often face challenges in their implementation due to diverse classroom needs. This paper investigates whether Large Language Models (LLMs) can support middle-school math teachers by helping create high-quality curriculum scaffolds, which we define as the adaptations and supplements teachers employ to ensure all students can access and engage with the curriculum. Through Cognitive Task Analysis with expert teachers, we identify a three-stage process for curriculum scaffolding: observation, strategy formulation, and implementation. We incorporate these insights into three LLM approaches to create warmup tasks that activate background knowledge. The best-performing approach, which provides the model with the original curriculum materials and an expert-informed prompt, generates warmups that are rated significantly higher than warmups created by expert teachers in terms of alignment to learning objectives, accessibility to students working below grade level, and teacher preference. This research demonstrates the potential of LLMs to support teachers in creating effective scaffolds and provides a methodology for developing AI-driven educational tools.
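
As an illustration of the best-performing approach (original curriculum materials plus an expert-informed prompt), the sketch below shows one way such a warmup-generation call might look. The prompt wording, model name, and function are placeholders of our own, not the study's actual prompt or pipeline.

```python
# Hedged sketch of the "curriculum materials + expert-informed prompt"
# approach. The prompt text, model name, and function are placeholders of our
# own, not the study's pipeline. Requires the openai package and an API key.
from openai import OpenAI

client = OpenAI()

EXPERT_INFORMED_PROMPT = (  # hypothetical condensation of the three-stage process
    "You are an experienced middle-school math teacher. First, observe what "
    "prior knowledge the lesson assumes. Then formulate a scaffolding "
    "strategy. Finally, write a short warmup task that activates that "
    "background knowledge and is accessible to students working below grade level."
)

def generate_warmup(curriculum_materials: str) -> str:
    """Generate a warmup task grounded in the original curriculum materials."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model choice
        messages=[
            {"role": "system", "content": EXPERT_INFORMED_PROMPT},
            {"role": "user", "content": curriculum_materials},
        ],
    )
    return response.choices[0].message.content
```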



Providing ample opportunities for students to express their thinking is pivotal to their learning of mathematical concepts. We introduce the Talk Meter, which provides in-the-moment automated feedback on student-teacher talk ratios. We conduct a randomized controlled trial on a virtual math tutoring platform (n=742 tutors) to evaluate the effectiveness of the Talk Meter at increasing student talk. In one treatment arm, we show the Talk Meter only to the tutor, while in the other arm we show it to both the student and the tutor. We find that the Talk Meter increases student talk ratios in both treatment conditions by 13-14%; this trend is driven by the tutor talking less in the tutor-facing condition, whereas in the student-facing condition it is driven by the student expressing significantly more mathematical thinking. Through interviews with tutors, we find the student-facing Talk Meter was more motivating to students, especially those with introverted personalities, and was effective at encouraging joint effort towards balanced talk time. These results demonstrate the promise of in-the-moment joint talk time feedback to both teachers and students as a low cost, engaging, and scalable way to increase students' mathematical reasoning.
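
The underlying quantity the Talk Meter displays, a student-teacher talk ratio, can be sketched in a few lines. The utterance format (speaker plus word count) and the example session below are hypothetical, not the platform's real transcription stream.

```python
# Minimal sketch of the talk-ratio quantity the Talk Meter displays. The
# utterance format (speaker, word count) and example session are hypothetical.
from typing import Iterable, Tuple

def student_talk_ratio(utterances: Iterable[Tuple[str, int]]) -> float:
    """utterances: (speaker, n_words) pairs; returns the student share of words."""
    utterances = list(utterances)
    student = sum(n for speaker, n in utterances if speaker == "student")
    total = sum(n for _, n in utterances)
    return student / total if total else 0.0

session = [("tutor", 120), ("student", 35), ("tutor", 80), ("student", 60)]
print(f"Student talk ratio: {student_talk_ratio(session):.0%}")  # -> 32%
```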



While recent studies have demonstrated the potential of automated feedback to enhance teacher instruction in virtual settings, its efficacy in traditional classrooms remains unexplored. In collaboration with TeachFX, we conducted a pre-registered randomized controlled trial involving 523 Utah mathematics and science teachers to assess the impact of automated feedback in K-12 classrooms. This feedback targeted “focusing questions” – questions that probe students’ thinking by pressing for explanations and reflection. Our findings indicate that automated feedback increased teachers’ use of focusing questions by 20%. However, there was no discernible effect on other teaching practices. Qualitative interviews revealed mixed engagement with the automated feedback: some teachers noticed and appreciated the reflective insights from the feedback, while others had no knowledge of it. Teachers also expressed skepticism about the accuracy of feedback, concerns about data security, and/or noted that time constraints prevented their engagement with the feedback. Our findings highlight avenues for future work, including integrating this feedback into existing professional development activities to maximize its effect.



Teachers’ attitudes and classroom management practices critically affect students’ academic and behavioral outcomes, contributing to the persistent issue of racial disparities in school discipline. Yet, identifying and improving classroom management at scale is challenging, as existing methods require expensive classroom observations by experts. We apply natural language processing methods to elementary math classroom transcripts to computationally measure the frequency of teachers’ classroom management language in instructional dialogue and the degree to which such language is reflective of punitive attitudes. We find that the frequency and punitiveness of classroom management language show strong and systematic correlations with human-rated observational measures of instructional quality, student and teacher perceptions of classroom climate, and student academic outcomes. Our analyses reveal racial disparities and patterns of escalation in classroom management language. We find that classrooms with higher proportions of Black students experience more frequent and more punitive classroom management. The frequency and punitiveness of classroom management language escalate over time during observations, and these escalations occur more severely for classrooms with higher proportions of Black students. Our results demonstrate the potential of automated measures and position everyday classroom management interactions as a critical site of intervention for addressing racial disparities, preventing escalation, and reducing punitive attitudes.
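
As a hedged illustration of the escalation finding, one simple way to quantify within-observation escalation is the slope of per-utterance punitiveness scores over lesson time. The scores, timestamps, and use of a linear fit below are our own assumptions, not the paper's exact method.

```python
# Hedged sketch of one way to quantify within-observation escalation: the
# slope of per-utterance punitiveness scores over lesson time. Scores,
# timestamps, and the linear fit are illustrative assumptions, not the
# paper's exact method.
import numpy as np

minutes = np.array([2.0, 10.0, 18.0, 25.0, 33.0, 41.0, 50.0])          # utterance times
punitiveness = np.array([0.10, 0.15, 0.20, 0.35, 0.45, 0.55, 0.70])    # model scores

slope, intercept = np.polyfit(minutes, punitiveness, deg=1)
print(f"Escalation slope: {slope:.4f} punitiveness units per minute")
```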



Providing consistent, individualized feedback to teachers is essential for improving instruction but can be prohibitively resource-intensive in most educational contexts. We develop M-Powering Teachers, an automated tool based on natural language processing to give teachers feedback on their uptake of student contributions, a high-leverage dialogic teaching practice that makes students feel heard. We conduct a randomized controlled trial in an online computer science course (n=1,136 instructors) to evaluate the effectiveness of our tool. We find that M-Powering Teachers improves instructors’ uptake of student contributions by 13% and present suggestive evidence that it also improves students’ satisfaction with the course and assignment completion. These results demonstrate the promise of M-Powering Teachers to complement existing efforts in teachers’ professional development.



Although learners are being connected 1:1 with instructors at an increasing scale, most of these instructors do not receive effective, consistent feedback to help them improve. We deployed M-Powering Teachers, an automated tool based on natural language processing to give instructors feedback on dialogic instructional practices, including their uptake of student contributions, talk time, and questioning practices, in a 1:1 online learning context. We conducted a randomized controlled trial on Polygence, a research mentorship platform for high schoolers (n=414 mentors) to evaluate the effectiveness of the feedback tool. We find that the intervention improved mentors’ uptake of student contributions by 10%, reduced their talk time by 5%, and improved students’ experience with the program as well as their relative optimism about their academic future. These results corroborate existing evidence that scalable and low-cost automated feedback can improve instruction and learning in online educational contexts.



Classroom discourse is a core medium of instruction: analyzing it can provide a window into teaching and learning and drive the development of new tools for improving instruction. We introduce the largest dataset of mathematics classroom transcripts available to researchers, and demonstrate how this data can help improve instruction. The dataset consists of 1,660 4th and 5th grade elementary mathematics observations, each 45-60 minutes long, collected by the National Center for Teacher Effectiveness (NCTE) between 2010 and 2013. The anonymized transcripts represent data from 317 teachers across 4 school districts that serve largely historically marginalized students. The transcripts come with rich metadata, including turn-level annotations for dialogic discourse moves, classroom observation scores, demographic information, survey responses, and student test scores. We demonstrate that our natural language processing model, trained on our turn-level annotations, can learn to identify dialogic discourse moves, and that these moves are correlated with better classroom observation scores and learning outcomes. This dataset opens up several possibilities for researchers, educators, and policymakers to learn about and improve K-12 instruction.

The data and its terms of use can be accessed here: https://github.com/ddemszky/classroom-transcript-analysis
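
For readers who want to explore turn-level transcript data of this kind, the sketch below shows a possible first pass with pandas. The file name and column names are hypothetical; consult the repository above for the dataset's actual schema and terms of use.

```python
# Hedged sketch of a first pass over turn-level transcript data of this kind
# with pandas. The file name and column names (observation_id, speaker, text,
# discourse_move) are hypothetical; see the repository for the actual schema.
import pandas as pd

turns = pd.read_csv("ncte_turns.csv")  # hypothetical export of turn-level data

# Share of turns by speaker within each observation.
turn_shares = (
    turns.groupby(["observation_id", "speaker"]).size()
         .groupby(level=0)
         .apply(lambda s: s / s.sum())
)

# Rate of one annotated dialogic discourse move per observation.
uptake_rate = (turns["discourse_move"] == "uptake").groupby(turns["observation_id"]).mean()

print(turn_shares.head())
print(uptake_rate.describe())
```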



Responsive teaching is a highly effective strategy that promotes student learning. In math classrooms, teachers might funnel students towards a normative answer or focus students on reflecting on their own thinking, deepening their understanding of math concepts. When teachers focus, they treat students’ contributions as resources for collective sensemaking, and thereby significantly improve students’ achievement and confidence in mathematics. We propose the task of computationally detecting funneling and focusing questions in classroom discourse. We do so by creating and releasing an annotated dataset of 2,348 teacher utterances labeled as funneling, focusing, or neither. We introduce supervised and unsupervised approaches to differentiating these questions. Our best model, a supervised RoBERTa model fine-tuned on our dataset, has a strong linear correlation of .76 with human expert labels and with positive educational outcomes, including math instruction quality and student achievement, showing the model’s potential for use in automated teacher feedback tools. Our unsupervised measures show significant but weaker correlations with human labels and outcomes, and they highlight interesting linguistic patterns of funneling and focusing questions. The high performance of the supervised measure indicates its promise for supporting teachers in their instruction.
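
To give a concrete picture of the supervised approach, the sketch below fine-tunes a RoBERTa classifier on three classes (funneling, focusing, neither) with the Hugging Face transformers library. The example utterances, label assignments, and hyperparameters are placeholders, not the paper's data or settings.

```python
# Hedged sketch of fine-tuning a RoBERTa classifier on funneling / focusing /
# neither labels, in the spirit of the supervised model described above. The
# example utterances, label assignments, and hyperparameters are placeholders.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

LABELS = ["funneling", "focusing", "neither"]
data = Dataset.from_dict({
    "text": [
        "So what do we add to both sides next?",            # hypothetical examples
        "How did you decide to set the problem up that way?",
        "Please open your workbooks to page 12.",
    ],
    "label": [0, 1, 2],
})

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=len(LABELS)
)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=64)

encoded = data.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="funneling-focusing",
                           num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=encoded,
)
trainer.train()
```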
