Methods
Leveraging Item Parameter Drift to Assess Transfer Effects in Vocabulary Learning
Longitudinal models of individual growth typically emphasize between-person predictors of change but ignore how growth may vary within persons, because each person contributes only one data point per occasion to the model. In contrast, modeling growth with multi-item assessments allows evaluation of…
Estimating Learning When Test Scores Are Missing: The Problem and Two Solutions
Longitudinal studies can produce biased estimates of learning if children miss tests. In an application to summer learning, we illustrate how missing test scores can create an illusion of large summer learning gaps when true gaps are close to zero. We demonstrate two methods that reduce bias by…
When Pell Today Doesn’t Mean Pell Tomorrow: Evaluating Aid Programs With Dynamic Eligibility
Generally, need-based financial aid improves students’ academic outcomes. However, the largest source of need-based grant aid in the United States, the Federal Pell Grant Program (Pell), has a mixed evaluation record. We assess the minimum Pell Grant in a regression discontinuity framework,…
Disparate Teacher Effects, Comparative Advantage, and Match Quality
Does student-teacher match quality exist? Prior work has documented large disparities in teachers' impacts across student types but has not distinguished between sorting and causal effects as the drivers of these disparities. I propose a disparate value-added model and derive a novel measure of…
Multiply by 37 (or Divide by 0.023): A Surprisingly Accurate Rule of Thumb for Converting Effect Sizes from Standard Deviations to Percentile Points
Educational researchers often report effect sizes in standard deviation (SD) units, but SD effects are hard to interpret. Effects are easier to interpret in percentile points, but converting SDs to percentile points involves a calculation that is not transparent to educational stakeholders. We…
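As a quick illustration of the arithmetic involved (a sketch under our own assumptions, not the paper's derivation), the snippet below compares the multiply-by-37 shortcut with two exact normal-curve conversions: the percentile-point movement of a student starting at the median, and the average movement across a normal control distribution. Which exact benchmark the paper favors is not stated in the excerpt above.

```python
# Sketch: compare the "multiply by 37" rule of thumb with two exact
# normal-curve conversions of an effect size d (in SD units) to
# percentile points. The choice of exact benchmark is an assumption,
# not necessarily the conversion used in the paper.
from scipy.stats import norm

def rule_of_thumb(d):
    """Percentile-point effect via the multiply-by-37 shortcut."""
    return 37 * d

def median_student(d):
    """Exact percentile gain for a student starting at the 50th percentile."""
    return 100 * (norm.cdf(d) - 0.5)

def average_student(d):
    """Exact percentile gain averaged over a normal control distribution."""
    # E[Phi(X + d) - Phi(X)] with X ~ N(0, 1) equals Phi(d / sqrt(2)) - 1/2.
    return 100 * (norm.cdf(d / 2 ** 0.5) - 0.5)

for d in (0.05, 0.10, 0.20, 0.50):
    print(f"d = {d:0.2f}: rule {rule_of_thumb(d):5.2f} pp, "
          f"median student {median_student(d):5.2f} pp, "
          f"average student {average_student(d):5.2f} pp")
```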
Experimental education research: clarifying why, how and when to use random assignment
Over the last twenty years, education researchers have increasingly conducted randomised experiments with the goal of informing the decisions of educators and policymakers. Such experiments have generally employed broad, consequential, standardised outcome measures in the hope that this would…
An Improved Method for Estimating School-Level Characteristics from Census Data
We propose a new method for estimating school-level characteristics from publicly available census data. We use a school’s location to impute its catchment area by aggregating the nearest n census block groups such that the number of school-aged children in those n block groups is just over the…
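The aggregation step described above suggests a simple greedy procedure: sort block groups by distance to the school and accumulate until the count of school-aged children first passes a threshold. The sketch below illustrates that idea; the stopping threshold (here, the school's own enrollment) and the field names are hypothetical placeholders, since the excerpt is cut off before the exact stopping rule is stated.

```python
# Sketch of the catchment-area imputation described above: take the nearest
# census block groups until their school-aged-child count first exceeds a
# threshold. The threshold and field names are hypothetical.
from math import hypot

def impute_catchment(school_xy, block_groups, threshold):
    """Return the block groups forming the imputed catchment area.

    block_groups: list of dicts with 'id', 'xy' (centroid), and 'children'.
    """
    nearest_first = sorted(
        block_groups,
        key=lambda bg: hypot(bg["xy"][0] - school_xy[0], bg["xy"][1] - school_xy[1]),
    )
    catchment, total_children = [], 0
    for bg in nearest_first:
        catchment.append(bg)
        total_children += bg["children"]
        if total_children > threshold:   # stop once the count is "just over"
            break
    return catchment

# Toy usage with made-up coordinates and counts.
blocks = [
    {"id": "bg1", "xy": (0.0, 1.0), "children": 150},
    {"id": "bg2", "xy": (2.0, 2.0), "children": 300},
    {"id": "bg3", "xy": (0.5, 0.5), "children": 200},
]
print([bg["id"] for bg in impute_catchment((0.0, 0.0), blocks, threshold=400)])
```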
Implementation Matters: Generalizing Treatment Effects in Education
Targeted instruction is one of the most effective educational interventions in low- and middle-income countries, yet reported impacts vary by an order of magnitude. We study this variation by aggregating evidence from prior randomized trials across five contexts, and use the results to inform a…
A Global Regression Discontinuity Design: Theory and Application to Grade Retention Policies
We use a marginal treatment effect (MTE) representation of a fuzzy regression discontinuity setting to propose a novel estimator. The estimator can be thought of as extrapolating the traditional fuzzy regression discontinuity estimate or as an observational study that adjusts for endogenous…
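As background for what is being extrapolated (standard fuzzy regression discontinuity notation, not the paper's new estimator), the conventional estimand at a cutoff c is the jump in outcomes scaled by the jump in treatment take-up:

```latex
\tau_{\mathrm{FRD}}
  = \frac{\lim_{x \downarrow c} \mathbb{E}[Y \mid X = x] - \lim_{x \uparrow c} \mathbb{E}[Y \mid X = x]}
         {\lim_{x \downarrow c} \mathbb{E}[D \mid X = x] - \lim_{x \uparrow c} \mathbb{E}[D \mid X = x]}
```

where Y is the outcome, D the treatment indicator, and X the running variable.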
Identification of Non-Additive Fixed Effects Models: Is the Return to Teacher Quality Homogeneous?
Panel or grouped data are often used to allow for unobserved individual heterogeneity in econometric models via fixed effects. In this paper, we discuss identification of a panel data model in which the unobserved heterogeneity both enters additively and interacts with treatment variables. We…
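One minimal way to write a specification in which the same unobserved heterogeneity enters additively and also interacts with treatment (an illustrative form on our part, not necessarily the model identified in the paper) is:

```latex
Y_{it} = \alpha_i + \beta D_{it} + \gamma\,\alpha_i D_{it} + X_{it}'\delta + \varepsilon_{it}
```

Here the fixed effect appears on its own and, through the interaction term, lets the return to treatment vary with the unit's unobserved type, for example a return to teacher quality that differs across students.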
How Measurement Affects Causal Inference: Attenuation Bias is (Usually) More Important Than Scoring Weights
When analyzing treatment effects on test scores, researchers face many choices and competing guidance for scoring tests and modeling results. This study examines the impact of scoring choices through simulation and an empirical application. Results show that estimates from multiple methods applied…
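One reason attenuation looms large here (a standard measurement-error result, offered as background rather than a summary of the paper's findings): if the observed score Y = T + e is a noisy measure of the trait T that treatment actually moves, with reliability ρ, then standardizing the treatment effect β by the observed-score SD shrinks it by the square root of the reliability:

```latex
\frac{\beta}{\mathrm{sd}(Y)}
  = \frac{\beta}{\mathrm{sd}(T)} \cdot \frac{\mathrm{sd}(T)}{\mathrm{sd}(Y)}
  = \frac{\beta}{\mathrm{sd}(T)}\,\sqrt{\rho},
\qquad \rho = \frac{\mathrm{Var}(T)}{\mathrm{Var}(Y)}
```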
Heterogeneity of item-treatment interactions masks complexity and generalizability in randomized controlled trials
Researchers use test outcomes to evaluate the effectiveness of education interventions across numerous randomized controlled trials (RCTs). Aggregate test data—for example, simple measures like the sum of correct responses—are compared across treatment and control groups to determine whether an…
Lottery-Based Evaluations of Early Education Programs: Opportunities and Challenges for Building the Next Generation of Evidence
Lottery-based identification strategies offer potential for generating the next generation of evidence on U.S. …
Measuring returns to experience using supervisor ratings of observed performance: The case of classroom teachers
We study the returns to experience in teaching, estimated using supervisor ratings from classroom observations. We describe the assumptions required to interpret changes in observation ratings over time as the causal effect of experience on performance. We compare two difference-in-differences…
The NCTE Transcripts: A Dataset of Elementary Math Classroom Transcripts
Classroom discourse is a core medium of instruction: analyzing it can provide a window into teaching and learning as well as drive the development of new tools for improving instruction. We introduce the largest dataset of mathematics classroom transcripts available to researchers, and…
Estimating Treatment Effects with the Explanatory Item Response Model
This simulation study examines the characteristics of the Explanatory Item Response Model (EIRM) for estimating treatment effects, compared with classical test theory (CTT) sum and mean scores and item response theory (IRT) theta scores. Results show that the EIRM and IRT theta scores…
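For readers unfamiliar with the EIRM, the sketch below simulates the kind of data-generating process it describes, a Rasch model in which treatment shifts latent ability, and computes the naive CTT sum-score contrast it is compared against. Fitting the EIRM itself would require a mixed-effects logistic regression (person random effects with item and treatment terms), which this sketch deliberately omits; all parameter values are made up.

```python
# Sketch of an EIRM-style data-generating process: Rasch item responses
# where treatment shifts latent ability, plus the naive CTT sum-score
# contrast. Parameter values are arbitrary and purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_students, n_items = 2000, 20
true_effect = 0.30                      # treatment effect in latent SD units

treated = rng.integers(0, 2, n_students)                 # random assignment
ability = rng.normal(0.0, 1.0, n_students) + true_effect * treated
difficulty = rng.normal(0.0, 1.0, n_items)

# Rasch response probabilities: P(correct) = logistic(ability - difficulty).
logits = ability[:, None] - difficulty[None, :]
responses = rng.random((n_students, n_items)) < 1 / (1 + np.exp(-logits))

# Classical test theory contrast: standardized difference in sum scores.
sum_scores = responses.sum(axis=1)
ctt_effect = (sum_scores[treated == 1].mean()
              - sum_scores[treated == 0].mean()) / sum_scores.std(ddof=1)
print(f"true latent effect: {true_effect:.2f} SD, "
      f"CTT sum-score effect: {ctt_effect:.2f} SD")
```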
Rethinking Principal Effects on Student Outcomes
School principals are viewed as critical actors for improving student outcomes, but there remain important methodological questions about how to measure principals’ effects. We propose a framework for measuring principals’ contributions to student outcomes and apply it empirically using data from…
Modeling Item-Level Heterogeneous Treatment Effects with the Explanatory Item Response Model: Leveraging Online Formative Assessments to Pinpoint the Impact of Educational Interventions
Analyses that reveal how treatment effects vary allow researchers, practitioners, and policymakers to better understand the efficacy of educational interventions. In practice, however, standard statistical methods for addressing Heterogeneous Treatment Effects (HTE) fail to address the HTE that…
Correspondence Measures for Assessing Replication Success
Given recent evidence challenging the replicability of results in the social and behavioral sciences, critical questions have been raised about appropriate measures for determining replication success in comparing effect estimates across studies. At issue is the fact that…
Racial Category Usage in Education Research: Examining the Publications from AERA Journals
How scholars name different racial groups has powerful salience for understanding what researchers study. We explored how education researchers used racial terminology in recently published, high-profile, peer-reviewed studies. Our sample included all original empirical studies published in the…