Methods
A Quantitative Study of Mathematical Language in Upper Elementary Classrooms
This study provides the first large-scale quantitative exploration of mathematical language use in upper elementary U.S. classrooms. Our approach employs natural language processing techniques to describe variation in teachers’ and students’ use of mathematical language in 1,657 fourth and fifth…
Scaffolding Middle-School Mathematics Curricula With Large Language Models
Despite well-designed curriculum materials, teachers often face challenges implementing them due to diverse classroom needs. This paper investigates whether large language models (LLMs) can support middle school math teachers by helping create high-quality curriculum scaffolds, which we define as…
STEM teacher workforce in high-need schools resilient despite shrinking supply and increasing demand
The teacher workforce in science, technology, engineering, and math (STEM) has been a perpetual weak spot in public schools’ teaching rosters. Prior reports show the pipeline of new STEM teachers into the profession is weak while demand for instruction in STEM fields continues to grow. This…
The Promises and Pitfalls of Using Language Models to Measure Instruction Quality in Education
Assessing instruction quality is a fundamental component of any improvement efforts in the education system. However, traditional manual assessments are expensive, subjective, and heavily dependent on observers’ expertise and idiosyncratic factors, preventing teachers from getting timely and…
The Effects of In-School Virtual Tutoring on Student Reading Development: Evidence from a Short-Cycle Randomized Controlled Trial
This paper describes a 12-week cluster randomized controlled trial that examined the efficacy of BookNook, a virtual tutoring platform focused on reading. Cohorts of first- through fourth-grade students attending six Rocketship public charter schools in Northern California were randomly assigned…
Does One Plus One Always Equal Two? Examining Complementarities in Educational Interventions
Public policies targeting individuals based on need often impose disproportionate burden on communities that lack the resources to implement these policies effectively. In an elementary school setting, I examine whether community-level interventions focusing on similar needs and providing…
Disentangling Person-Dependent and Item-Dependent Causal Effects: Applications of Item Response Theory to the Estimation of Treatment Effect Heterogeneity
Analyzing heterogeneous treatment effects (HTE) plays a crucial role in understanding the impacts of educational interventions.
Leveraging Item Parameter Drift to Assess Transfer Effects in Vocabulary Learning
Longitudinal models of individual growth typically emphasize between-person predictors of change but ignore how growth may vary within persons because each person contributes only one point at each time to the model. In contrast, modeling growth with multi-item assessments allows evaluation of…
Estimating Learning When Test Scores Are Missing: The Problem and Two Solutions
Longitudinal studies can produce biased estimates of learning if children miss tests. In an application to summer learning, we illustrate how missing test scores can create an illusion of large summer learning gaps when true gaps are close to zero. We demonstrate two methods that reduce bias by…
When Pell Today Doesn’t Mean Pell Tomorrow: Evaluating Aid Programs With Dynamic Eligibility
Generally, need-based financial aid improves students’ academic outcomes. However, the largest source of need-based grant aid in the United States, the Federal Pell Grant Program (Pell), has a mixed evaluation record. We assess the minimum Pell Grant in a regression discontinuity framework,…
Disparate Teacher Effects, Comparative Advantage, and Match Quality
Does student-teacher match quality exist? Prior work has documented large disparities in teachers' impacts across student types but has not distinguished between sorting and causal effects as the drivers of these disparities. I propose a disparate value-added model and derive a novel measure of…
Multiply by 37 (or Divide by 0.027): A Surprisingly Accurate Rule of Thumb for Converting Effect Sizes from Standard Deviations to Percentile Points
Educational researchers often report effect sizes in standard deviation units (SD), but SD effects are hard to interpret. Effects are easier to interpret in percentile points, but converting SDs to percentile points involves a calculation that is not transparent to educational stakeholders. We…
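For readers who want to see the calculation the rule of thumb replaces, here is a minimal sketch. It assumes (our assumption, not necessarily the paper's exact definition) a normally distributed outcome and a student who starts at the control-group median, so an effect of d SD moves that student to percentile 100·Φ(d), a gain of 100·Φ(d) − 50 percentile points. The function names are hypothetical. Because the slope of that curve at d = 0 is 100/√(2π) ≈ 39.9 and the curve flattens as d grows, a constant somewhat below 40 is one plausible way to approximate it over typical effect sizes.

```python
import math


def sd_to_percentile_points_exact(d: float) -> float:
    """Exact percentile-point gain for a student starting at the 50th percentile,
    assuming a normally distributed outcome: 100 * (Phi(d) - 0.5)."""
    # Phi(d) = 0.5 * (1 + erf(d / sqrt(2))), so 100 * (Phi(d) - 0.5) = 50 * erf(d / sqrt(2)).
    return 50.0 * math.erf(d / math.sqrt(2.0))


def sd_to_percentile_points_rule(d: float) -> float:
    """Rule of thumb from the title: multiply the SD effect by 37 (about the same as dividing by 0.027)."""
    return 37.0 * d


if __name__ == "__main__":
    # Compare the exact conversion with the rule of thumb for a few effect sizes.
    for d in (0.05, 0.1, 0.2, 0.3, 0.5):
        exact = sd_to_percentile_points_exact(d)
        approx = sd_to_percentile_points_rule(d)
        print(f"d={d:.2f}  exact={exact:5.2f}  37*d={approx:5.2f}")
```

For example, an effect of 0.2 SD is about 7.9 percentile points under the exact formula and 7.4 under the rule of thumb.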
Experimental education research: clarifying why, how and when to use random assignment
Over the last twenty years, education researchers have increasingly conducted randomised experiments with the goal of informing the decisions of educators and policymakers. Such experiments have generally employed broad, consequential, standardised outcome measures in the hope that this would…
An Improved Method for Estimating School-Level Characteristics from Census Data
We propose a new method for estimating school-level characteristics from publicly available census data. We use a school’s location to impute its catchment area by aggregating the nearest n census block groups such that the number of school-aged children in those n block groups is just over the…
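The nearest-block-group aggregation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: each block group is represented by a single latitude/longitude point, distance is great-circle (haversine) distance, and the target count of children is passed in as a caller-supplied `threshold` because the abstract is truncated where it names that quantity. The `BlockGroup` type, all function names, and the child-weighted averaging step are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class BlockGroup:
    geoid: str                # hypothetical identifier field
    lat: float                # assumed representative point (e.g., a centroid)
    lon: float
    school_age_children: int


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two lat/lon points."""
    r = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def impute_catchment(school_lat: float, school_lon: float,
                     block_groups: list[BlockGroup], threshold: int) -> list[BlockGroup]:
    """Add block groups nearest-first until the cumulative number of
    school-aged children first exceeds the (hypothetical) threshold."""
    ordered = sorted(block_groups,
                     key=lambda bg: haversine_km(school_lat, school_lon, bg.lat, bg.lon))
    chosen: list[BlockGroup] = []
    total = 0
    for bg in ordered:
        chosen.append(bg)
        total += bg.school_age_children
        if total > threshold:
            break
    return chosen


def child_weighted_mean(chosen: list[BlockGroup], values: dict[str, float]) -> float:
    """Assumed aggregation step: average a block-group attribute, weighting by children."""
    total = sum(bg.school_age_children for bg in chosen)
    if total == 0:
        raise ValueError("catchment contains no school-aged children")
    return sum(values[bg.geoid] * bg.school_age_children for bg in chosen) / total
```

If the block groups run out before the threshold is passed, this sketch simply returns all of them; a real implementation would need a policy for that case.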
Implementation Matters: Generalizing Treatment Effects in Education
Targeted instruction is one of the most effective educational interventions in low- and middle-income countries, yet reported impacts vary by an order of magnitude. We study this variation by aggregating evidence from prior randomized trials across five contexts, and use the results to inform a…
A Global Regression Discontinuity Design: Theory and Application to Grade Retention Policies
We use a marginal treatment effect (MTE) representation of a fuzzy regression discontinuity setting to propose a novel estimation approach. The estimator can be thought of as extrapolating a traditional fuzzy regression discontinuity estimate or as an observational study that adjusts for…
Identification of Non-Additive Fixed Effects Models: Is the Return to Teacher Quality Homogeneous?
Panel or grouped data are often used to allow for unobserved individual heterogeneity in econometric models via fixed effects. In this paper, we discuss identification of a panel data model in which the unobserved heterogeneity both enters additively and interacts with treatment variables. We…
How Measurement Affects Causal Inference: Attenuation Bias is (Usually) More Important Than Scoring Weights
When analyzing treatment effects on test scores, researchers face many choices and competing guidance for scoring tests and modeling results. This study examines the impact of scoring choices through simulation and an empirical application. Results show that estimates from multiple methods applied…
Heterogeneity of item-treatment interactions masks complexity and generalizability in randomized controlled trials
Researchers use test outcomes to evaluate the effectiveness of education interventions across numerous randomized controlled trials (RCTs). Aggregate test data (for example, simple measures like the sum of correct responses) are compared across treatment and control groups to determine whether an…
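A short worked identity shows why a sum score can hide item-level variation. This is a generic illustration, not the paper's model: assume a test of I binary items, let Y_{ij} be student j's response to item i, S_j = \sum_i Y_{ij} the sum score, and T_j the treatment indicator. By linearity of expectation, the treatment-control difference in mean sum scores is just the sum of the item-level differences δ_i.

```latex
\begin{align*}
\mathbb{E}[S_j \mid T_j = 1] - \mathbb{E}[S_j \mid T_j = 0]
  &= \sum_{i=1}^{I} \Big( \Pr(Y_{ij} = 1 \mid T_j = 1) - \Pr(Y_{ij} = 1 \mid T_j = 0) \Big) \\
  &= \sum_{i=1}^{I} \delta_i .
\end{align*}
```

For instance, with two items where δ_1 = +0.10 and δ_2 = −0.10, the aggregate difference is zero even though the treatment changed performance on both items.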
Lottery-Based Evaluations of Early Education Programs: Opportunities and Challenges for Building the Next Generation of Evidence
Lottery-based identification strategies offer potential for generating the next generation of evidence on U.S.