EdWorkingPapers

Jing Liu

This study provides the first large-scale quantitative exploration of mathematical language use in upper elementary U.S. classrooms. Our approach employs natural language processing techniques to describe variation in teachers’ and students’ use of mathematical language in 1,657 fourth and fifth grade lessons in 317 classrooms in four districts over three years. Students’ exposure to mathematical language varies substantially across lessons and between teachers. Results suggest that teacher modeling, defined as the density of mathematical terms in teacher talk, does not substantially cause students to take up mathematical language, but that teachers may encourage student use of mathematical vocabulary by means other than mere modeling or exposure. However, we also find that teachers who use more mathematical language are more effective at raising student test scores. Together, these findings suggest that teachers’ use of mathematical vocabulary is associated with their effectiveness as math teachers, even though modeling alone does not appear to drive students’ own use of it.
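The density measure defined above can be made concrete with a minimal Python sketch. It assumes a hypothetical list of mathematical terms (`MATH_TERMS`) and simple regex tokenization; the abstract does not specify the study’s vocabulary, tokenizer, or normalization, so this is an illustration rather than the authors’ method.

```python
import re

# Hypothetical, deliberately tiny vocabulary of mathematical terms;
# the study's actual term list is not given in the abstract.
MATH_TERMS = {"numerator", "denominator", "fraction", "equivalent",
              "quotient", "product", "sum", "difference", "factor",
              "multiple", "area", "perimeter"}


def math_term_density(utterances, terms=MATH_TERMS):
    """Share of word tokens in a speaker's talk that are mathematical terms."""
    tokens = []
    for text in utterances:
        # Lowercase, then keep only alphabetic word tokens.
        tokens.extend(re.findall(r"[a-z]+", text.lower()))
    if not tokens:
        return 0.0  # no talk recorded: define density as zero
    hits = sum(1 for tok in tokens if tok in terms)
    return hits / len(tokens)


teacher_talk = ["What is the numerator of this fraction?",
                "Find an equivalent fraction."]
print(math_term_density(teacher_talk))  # 4 / 11 ≈ 0.364
```

Matching single words misses multi-word terms such as “common denominator”; a fuller version would match phrases before counting the remaining tokens.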

Assessing instructional quality is a fundamental component of any effort to improve the education system. However, traditional manual assessments are expensive, subjective, and heavily dependent on observers’ expertise and idiosyncratic factors, which prevents teachers from receiving timely and frequent feedback. Unlike prior research, which focuses on low-inference instructional practices, this paper presents the first study that leverages Natural Language Processing (NLP) techniques to assess multiple high-inference instructional practices in two distinct educational settings: in-person K-12 classrooms and simulated performance tasks for pre-service teachers. This is also the first study to apply NLP to measure a teaching practice that has been demonstrated to be particularly effective for students with special needs. We confront two challenges inherent in NLP-based instructional analysis: noisy, long input data and highly skewed distributions of human ratings. Our results suggest that pretrained Language Models (PLMs) perform comparably to the agreement level of human raters on variables that are more discrete and require lower inference, but their efficacy diminishes for more complex teaching practices. Interestingly, using only teachers’ utterances as input yields strong results for student-centered variables, alleviating common concerns about the difficulty of collecting and transcribing high-quality student speech data in in-person teaching settings. Our findings highlight both the potential and the limitations of current NLP techniques in the education domain and open avenues for further exploration.
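One common way to quantify agreement on categorical ratings is Cohen’s kappa, which corrects raw agreement for chance. This matters when ratings are highly skewed, because two raters who both mostly assign the majority label will agree often by chance alone. The sketch below is a minimal, hypothetical illustration; the abstract does not state which agreement statistic the study used to compare PLMs with human raters.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa between two equal-length label sequences."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement implied by each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is
        # undefined here, and returning 1.0 is a convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


human = [1, 1, 2, 2, 3, 3]  # hypothetical human ratings
model = [1, 1, 2, 3, 3, 3]  # hypothetical PLM predictions
print(cohens_kappa(human, model))  # 0.75
```

Here raw agreement is 5/6, but kappa is lower (0.75) because one-third agreement is expected by chance given the two raters’ label frequencies.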

This study provides the first causal analysis of the impact of expanding Computer Science (CS) education in U.S. K-12 schools on students’ choice of college major and early career outcomes. Using rich longitudinal data from Maryland, we exploit variation from the staggered rollout of CS course offerings across high schools. Our findings suggest that taking a CS course increases students’ likelihood of declaring a CS major by 10 percentage points and of earning a bachelor’s degree in CS by 5 percentage points. Additionally, access to CS coursework raises students’ likelihood of employment and their early-career earnings. Notably, female, low-socioeconomic-status, and Black students experience larger benefits in terms of CS degree attainment and earnings. However, these groups’ lower take-up rates for CS courses highlight a pressing need for targeted efforts to increase their participation as policymakers continue to expand CS curricula in K-12 education.

While recent studies have demonstrated the potential of automated feedback to enhance teacher instruction in virtual settings, its efficacy in traditional classrooms remains unexplored. In collaboration with TeachFX, we conducted a pre-registered randomized controlled trial involving 523 Utah mathematics and science teachers to assess the impact of automated feedback in K-12 classrooms. This feedback targeted “focusing questions”: questions that probe students’ thinking by pressing for explanations and reflection. Our findings indicate that automated feedback increased teachers’ use of focusing questions by 20%. However, there was no discernible effect on other teaching practices. Qualitative interviews revealed mixed engagement with the automated feedback: some teachers noticed and appreciated the reflective insights it offered, while others were unaware of it. Teachers also expressed skepticism about the accuracy of the feedback and concerns about data security, and some noted that time constraints prevented them from engaging with it. Our findings highlight avenues for future work, including integrating this feedback into existing professional development activities to maximize its effect.

Noncognitive constructs such as self-efficacy, social awareness, and academic engagement are widely acknowledged as critical components of human capital, but systematic data collection on such skills in school systems is complicated by conceptual ambiguities, measurement challenges, and resource constraints. This study addresses this issue by comparing the predictive validity, for high school graduation and postsecondary attainment, of the two most widely used metrics of noncognitive outcomes: observable academic behaviors (e.g., absenteeism, suspensions) and student self-reported social and emotional learning (SEL) skills. Our findings suggest that, conditional on student demographics and achievement, academic behaviors are several-fold more predictive than SEL skills for all long-run outcomes, and adding SEL skills to a model with academic behaviors improves the model's predictive power only minimally. In addition, academic behaviors are particularly strong predictors of low-achieving students' long-run outcomes. Part-day absenteeism (resulting from class skipping) is the largest driver behind the strong predictive power of academic behaviors. Developing more nuanced behavioral measures in existing administrative data systems might be a fruitful strategy for schools whose goal is to predict students' educational attainment.
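The comparison described above can be written as a set of nested models. In this sketch, which assumes a logistic link and is not necessarily the authors' specification, $Y_i$ is a long-run outcome for student $i$ (e.g., graduation), $\mathbf{X}_i$ holds demographics and prior achievement, $\mathbf{A}_i$ holds academic behaviors, and $\mathbf{S}_i$ holds self-reported SEL skills:

$$
\begin{aligned}
M_0:\quad & \operatorname{logit}\Pr(Y_i = 1) = \alpha + \mathbf{X}_i'\beta \\
M_A:\quad & \operatorname{logit}\Pr(Y_i = 1) = \alpha + \mathbf{X}_i'\beta + \mathbf{A}_i'\gamma \\
M_S:\quad & \operatorname{logit}\Pr(Y_i = 1) = \alpha + \mathbf{X}_i'\beta + \mathbf{S}_i'\delta \\
M_{AS}:\quad & \operatorname{logit}\Pr(Y_i = 1) = \alpha + \mathbf{X}_i'\beta + \mathbf{A}_i'\gamma + \mathbf{S}_i'\delta
\end{aligned}
$$

Under this framing, "several-fold more predictive" means the gain in fit from $M_0$ to $M_A$ is several times the gain from $M_0$ to $M_S$, and the minimal improvement from adding SEL skills is the small gain from $M_A$ to $M_{AS}$. The fit metric (e.g., pseudo-$R^2$ or AUC) is an assumption here.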

Providing consistent, individualized feedback to teachers is essential for improving instruction but can be prohibitively resource-intensive in most educational contexts. We develop M-Powering Teachers, an automated tool based on natural language processing that gives teachers feedback on their uptake of student contributions, a high-leverage dialogic teaching practice that makes students feel heard. We conduct a randomized controlled trial in an online computer science course (n = 1,136 instructors) to evaluate the effectiveness of our tool. We find that M-Powering Teachers improves instructors’ uptake of student contributions by 13%, and we present suggestive evidence that it also improves students’ satisfaction with the course and their assignment completion. These results demonstrate the promise of M-Powering Teachers to complement existing efforts in teachers’ professional development.

Although learners are being connected 1:1 with instructors at an increasing scale, most of these instructors do not receive effective, consistent feedback to help them improve. In a 1:1 online learning context, we deployed M-Powering Teachers, an automated tool based on natural language processing that gives instructors feedback on dialogic instructional practices, including their uptake of student contributions, talk time, and questioning practices. We conducted a randomized controlled trial on Polygence, a research mentorship platform for high schoolers (n = 414 mentors), to evaluate the effectiveness of the feedback tool. We find that the intervention improved mentors’ uptake of student contributions by 10%, reduced their talk time by 5%, and improved students’ experience with the program as well as their relative optimism about their academic future. These results corroborate existing evidence that scalable, low-cost automated feedback can improve instruction and learning in online educational contexts.

Teachers’ sense-making of student behavior determines whether students get in trouble and are formally disciplined. Status categories, such as race, can influence perceptions of student culpability, but the degree to which teachers’ initial identification of student misbehavior exacerbates racial disproportionality in discipline receipt is unknown. This study provides the first systematic documentation of teachers’ use of office discipline referrals (ODRs) in a large, diverse urban school district in California, using data that specify the identity of both the referred and referring individuals in every ODR. We identify teachers exhibiting extensive referring behavior, defined as the top 5 percent of referrers by the number of ODRs they make in a given year, and evaluate their contributions to disciplinary disparities. We find that these “top referrers” effectively double the racial gaps in ODRs for both Black-White and Hispanic-White comparisons. These gaps are driven mainly by the higher numbers of ODRs issued to Black and Hispanic students for interpersonal offenses and defiance, and they also partially translate into racial gaps in suspensions. Both the level and racial composition of the schools where “top referrers” serve, as well as these teachers’ personal traits, appear to explain some of their frequent referring behavior. Targeting supports and interventions to “top referrers” might afford an important opportunity to reduce racial disciplinary gaps.
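The “top referrer” flag described above amounts to ranking teachers by their yearly ODR counts and keeping the top 5 percent. The minimal Python sketch below assumes a hypothetical input of `(school_year, teacher_id)` pairs, one per ODR; it ranks only teachers who made at least one referral and breaks ties at the cutoff by teacher ID, both of which are assumptions rather than the study’s documented rules.

```python
from collections import defaultdict


def top_referrers(referrals, percent=5):
    """Flag the top `percent` of referring teachers by ODR count, per year.

    `referrals` is an iterable of (school_year, teacher_id) pairs, one per
    ODR (hypothetical format). Teachers with zero referrals never appear in
    the input, so they are not part of the ranking pool here.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for year, teacher in referrals:
        counts[year][teacher] += 1

    flagged = {}
    for year, by_teacher in counts.items():
        # Most referrals first; ties broken by teacher ID (arbitrary choice).
        ranked = sorted(by_teacher.items(), key=lambda kv: (-kv[1], kv[0]))
        # Integer ceiling division avoids float rounding; flag at least one.
        k = max(1, -(-percent * len(ranked) // 100))
        flagged[year] = {teacher for teacher, _ in ranked[:k]}
    return flagged


# Hypothetical year: one teacher with 10 ODRs, 19 teachers with 1 ODR each.
odrs = [("2019", "t0")] * 10 + [("2019", f"t{i}") for i in range(1, 20)]
print(top_referrers(odrs))  # {'2019': {'t0'}}
```

Whether the percentile pool should include teachers who made no referrals changes the cutoff substantially, so a real analysis would join against a full teacher roster first.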

Teachers affect a wide range of students’ educational and social outcomes, but how they contribute to students’ involvement in school discipline is less understood. We estimate the impact of teacher demographics and other observed qualifications on students’ likelihood of receiving a disciplinary referral. Using data from a large and diverse urban school district in California that track all disciplinary referrals and the identity of both the referred and referring individuals, we find that students are about 0.2 to 0.5 percentage points (7% to 18%) less likely to receive a disciplinary referral from teachers of the same race or gender than from teachers of different demographic backgrounds. Students are also less likely to be referred by more experienced teachers and by teachers who hold either an English language learner or a special education credential. These results are driven mostly by referrals for defiance and violence infractions and are concentrated among Black and Hispanic male students and middle school students. While it is unclear whether these findings reflect variation in teachers’ effects on actual student behavior, variation in teachers’ proclivities to make disciplinary referrals, or a combination of the two, these results nonetheless suggest that teachers play a central role in the prevalence of, and inequities in, office referrals and subsequent student discipline.
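The percentage-point and percent figures above are linked by the baseline referral rate: a relative change is the absolute change divided by the baseline $p_0$. Assuming, for illustration only, that the lower and upper bounds pair up and share a common baseline (the abstract does not say so), the implied baseline is roughly 2.8 to 2.9 percent:

$$
\text{relative change} = \frac{\Delta p}{p_0}
\quad\Longrightarrow\quad
p_0 = \frac{\Delta p}{\text{relative change}}:
\qquad \frac{0.2}{0.07} \approx 2.9,
\qquad \frac{0.5}{0.18} \approx 2.8
$$

Because referrals are relatively rare events, small absolute differences correspond to sizable relative ones.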

Jing Liu, Monica Lee.

Student absenteeism is often conceptualized and quantified in a static, uniform manner, which provides an incomplete understanding of this important phenomenon. Applying growth curve models to detailed class-attendance data, we document that secondary school students' unexcused absences grow steadily throughout a school year and over grades, while the growth of excused absences remains essentially unchanged. Importantly, students who start the school year with a high number of unexcused absences, Black and Hispanic students, and low-income students accumulate unexcused absences at a significantly faster rate than their counterparts. Lastly, students with higher growth rates in unexcused absences consistently report lower perceptions of all aspects of school culture than their peers do. Interventions that target unexcused absences and/or improve school culture can be crucial to mitigating disengagement.
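For readers unfamiliar with growth curve models, a generic two-level linear specification is sketched below; it is a textbook form, not necessarily the authors' exact model. Here $y_{ti}$ is student $i$'s cumulative unexcused absences at occasion $t$, $\pi_{1i}$ is that student's growth rate, and $W_i$ is a hypothetical student characteristic (e.g., an indicator for low-income status), so $\beta_{11}$ captures how growth rates differ by that characteristic:

$$
\begin{aligned}
\text{Level 1:}\quad & y_{ti} = \pi_{0i} + \pi_{1i}\,\mathit{time}_{ti} + e_{ti} \\
\text{Level 2:}\quad & \pi_{0i} = \beta_{00} + \beta_{01} W_i + r_{0i} \\
& \pi_{1i} = \beta_{10} + \beta_{11} W_i + r_{1i}
\end{aligned}
$$

A positive covariance between $r_{0i}$ and $r_{1i}$ is one way to express the pattern in which students who start the year with more unexcused absences also accumulate them faster. Linearity in time and a continuous outcome are simplifying assumptions; absence counts could instead be modeled with a count distribution.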
