The Promises and Pitfalls of Using Language Models to Measure Instruction Quality in Education

April 2024

Assessing instruction quality is a fundamental component of any effort to improve the education system. However, traditional manual assessments are expensive, subjective, and heavily dependent on observers’ expertise and idiosyncratic factors, which prevents teachers from getting timely and frequent feedback. Unlike prior research, which focuses on low-inference instructional practices, this paper presents the first study that leverages Natural Language Processing (NLP) techniques to assess multiple high-inference instructional practices in two distinct educational settings: in-person K-12 classrooms and simulated performance tasks for pre-service teachers. It is also the first study to apply NLP to measure a teaching practice that has been demonstrated to be particularly effective for students with special needs. We confront two challenges inherent in NLP-based instructional analysis: noisy, long input data and highly skewed distributions of human ratings. Our results suggest that pretrained Language Models (PLMs) perform comparably to the agreement level of human raters on variables that are more discrete and require lower inference, but their efficacy diminishes for more complex teaching practices. Interestingly, using only teachers’ utterances as input yields strong results for student-centered variables, alleviating common concerns over the difficulty of collecting and transcribing high-quality student speech data in in-person teaching settings. Our findings highlight both the potential and the limitations of current NLP techniques in the education domain, opening avenues for further exploration.
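To make the comparison against "the agreement level of human raters" concrete, here is a minimal sketch of one common way to quantify agreement when ratings are highly skewed. It uses Cohen's kappa, which is an assumption for illustration: the abstract does not state which agreement metric the paper uses. The function name `cohen_kappa` and all ratings below are hypothetical toy values, not data or results from the study.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two equal-length rating lists."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("rating lists must be non-empty and equal in length")
    n = len(ratings_a)
    # Share of items on which the two raters gave the same score.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's marginal distribution.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical, heavily skewed toy ratings on a 1-3 scale (illustration only).
human_1 = [1, 1, 1, 1, 1, 1, 1, 1, 2, 3]
human_2 = [1, 1, 1, 1, 1, 1, 1, 2, 2, 3]
# A degenerate "model" that always predicts the majority score.
model = [1] * 10

human_human = cohen_kappa(human_1, human_2)
model_human = cohen_kappa(human_1, model)
print(f"human vs human kappa: {human_human:.3f}")
print(f"model vs human kappa: {model_human:.3f}")
```

In this toy case the majority-score "model" matches the first rater on 8 of 10 items, yet its kappa is zero because that level of raw agreement is exactly what chance predicts under such a skewed distribution. This is one reason a chance-corrected measure, rather than raw accuracy, is a sensible baseline when model output is compared with human-to-human agreement.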

Education level: K-12 Education

Topics: Methods
