Methodology, measurement and data

Matthew A. Lenard, Mikko Silliman.

We study the effects of informal social interactions on academic achievement and behavior using idiosyncratic variation in peer groups stemming from changes in bus routes across elementary, middle, and high school. In early grades, a one standard-deviation change in the value-added of same-grade bus peers corresponds to a 0.01 SD change in academic performance and a 0.03 SD change in behavior; by high school, these magnitudes grow to 0.04 SD and 0.06 SD. These findings suggest that student interactions outside the classroom—especially in adolescence—may be an important factor in the education production function.


Luke Keele, Matthew Lenard, Lindsay Page.

In education settings, treatments are often non-randomly assigned to clusters, such as schools or classrooms, while outcomes are measured for students. This research design is called the clustered observational study (COS). We examine the consequences of common support violations in the COS context. Common support violations occur when the covariate distributions of treated and control units do not overlap. Such violations are likely to occur in a COS, especially with a small number of treated clusters. One common technique for dealing with common support violations is trimming treated units. We demonstrate how this practice can yield nonsensical results in some COSs. More specifically, we show how trimming the data can result in an uninterpretable estimand. We use data on Catholic schools to illustrate concepts throughout.
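The effect of trimming on the estimand can be illustrated with a minimal sketch. Everything here is hypothetical: the function name, the single cluster-level covariate, and the numbers are assumptions for illustration only, not the authors' procedure or their Catholic school data.

```python
def trim_to_common_support(treated, controls):
    """Split treated clusters by whether their covariate lies in the control range.

    `treated` and `controls` are lists of one cluster-level covariate
    (a hypothetical school-level summary, e.g. a share between 0 and 1).
    """
    lo, hi = min(controls), max(controls)
    kept = [x for x in treated if lo <= x <= hi]
    dropped = [x for x in treated if not (lo <= x <= hi)]
    return kept, dropped


# Hypothetical covariate values for four treated and four control schools.
treated_schools = [0.62, 0.70, 0.91, 0.95]
control_schools = [0.20, 0.35, 0.50, 0.72]

kept, dropped = trim_to_common_support(treated_schools, control_schools)
print("kept treated clusters:", kept)
print("dropped treated clusters:", dropped)
```

With only four treated clusters, trimming removes half of them. Any effect estimated afterwards describes the two retained schools rather than the treated schools as a whole, which is one way the estimand can become hard to interpret.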


Michael Gilraine, Jeffrey Penney.

An administrative rule allowed students who failed an exam to retake it shortly after, triggering strong 'teach to the test' incentives to raise these students' test scores for the retake. We develop a model that accounts for truncation and find that these students score 0.14 standard deviations higher on the retest. Using a regression discontinuity design, we estimate that thirty percent of these gains persist to the following year. These results provide evidence that test-focused instruction or 'cramming' raises contemporaneous performance, but that a large portion of these gains fade out. Our findings highlight that persistence should be accounted for when comparing educational interventions.


Ishtiaque Fazlul, Todd R. Jones, Jonathan Smith.

Millions of high school students who take an Advanced Placement (AP) course in one of over 30 subjects can earn college credit by performing well on the corresponding AP exam. Using data from four metro-Atlanta public school districts, we find that 15 percent of students’ AP courses do not result in an AP exam. We predict that up to 32 percent of the AP courses that do not result in an AP exam would result in a score of 3 or higher, which generally commands college credit at colleges and universities across the United States. Next, we examine disparities in AP exam-taking rates by demographics and course-taking patterns. Most immediately policy relevant, we find evidence consistent with the positive impact of school district exam subsidies on AP exam-taking rates. In fact, students on free and reduced-price lunch (FRL) in the districts that provide a higher subsidy to FRL students than non-FRL students are more likely to take an AP exam than their non-FRL counterparts, after controlling for demographic and academic covariates.


Kelli A. Bird, Benjamin L. Castleman, Zachary Mabel, Yifeng Song.

Colleges have increasingly turned to predictive analytics to target at-risk students for additional support. Most of the predictive analytic applications in higher education are proprietary, with private companies offering little transparency about their underlying models. We address this lack of transparency by systematically comparing two important dimensions: (1) different approaches to sample and variable construction and how these affect model accuracy; and (2) how the selection of predictive modeling approaches, ranging from methods many institutional researchers would be familiar with to more complex machine learning methods, impacts model performance and the stability of predicted scores. The relative ranking of students’ predicted probability of completing college varies substantially across modeling approaches. While we observe substantial gains in performance from models trained on a sample structured to represent the typical enrollment spells of students and with a robust set of predictors, we observe similar performance between the simplest and most complex models.


David M. Houston, Michael B. Henderson, Paul E. Peterson, Martin R. West.

Do Americans hold a consistent set of opinions about their public schools and how to improve them? From 2013 to 2018, over 5,000 unique respondents participated in more than one consecutive iteration of the annual, nationally representative Education Next poll, offering an opportunity to examine individual-level attitude stability on education policy issues over a six-year period. The proportion of participants who provide the same response to the same question over multiple consecutive years greatly exceeds the amount expected to occur by chance alone. We also find that teachers offer more consistent responses than their non-teaching peers. By contrast, we do not observe similar differences in attitude stability between parents of school-age children and their counterparts without children.
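The abstract does not say how the chance benchmark is computed. One common convention, assumed here purely for illustration, treats a respondent's answers in two consecutive years as independent draws from each year's observed response distribution. The function names and the six response pairs below are hypothetical, not Education Next data.

```python
from collections import Counter


def observed_agreement(pairs):
    """Share of respondents giving the same answer in both years."""
    return sum(a == b for a, b in pairs) / len(pairs)


def chance_agreement(pairs):
    """Expected agreement if the two years' answers were independent.

    Assumption: each year's answers follow that year's observed marginals.
    """
    n = len(pairs)
    first = Counter(a for a, _ in pairs)
    second = Counter(b for _, b in pairs)
    # Counter returns 0 for categories missing from the second year.
    return sum((first[k] / n) * (second[k] / n) for k in first)


# Hypothetical (year 1, year 2) answers to one question.
pairs = [
    ("support", "support"),
    ("oppose", "oppose"),
    ("support", "support"),
    ("neutral", "oppose"),
    ("oppose", "oppose"),
    ("support", "neutral"),
]

obs = observed_agreement(pairs)
exp = chance_agreement(pairs)
print(f"observed {obs:.3f} vs. chance {exp:.3f}")
```

Under this assumption, attitude stability shows up as observed agreement well above the chance figure.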


Peter Q. Blair, Papia Debroy, Justin Heck.

Over the past four decades, income inequality grew significantly between workers with bachelor’s degrees and those with high school diplomas (often called “unskilled”). We argue that, rather than being unskilled, these workers are STARs because they are skilled through alternative routes, namely their work experience. Using the skill requirements of a worker’s current job as a proxy for their actual skill, we find that though both groups of workers make transitions to occupations requiring similar skills to their previous occupations, workers with bachelor’s degrees have dramatically better access to higher wage occupations where the skill requirements exceed the workers’ observed skill. This measured opportunity gap offers a fresh explanation of income inequality by degree status and reestablishes the important role of on-the-job training in human capital formation.


Dorottya Demszky, Jing Liu, Zid Mancenido, Julie Cohen, Heather C. Hill, Dan Jurafsky, Tatsunori Hashimoto.

In conversation, uptake happens when a speaker builds on the contribution of their interlocutor by, for example, acknowledging, repeating or reformulating what they have said. In education, teachers' uptake of student contributions has been linked to higher student achievement. Yet measuring and improving teachers' uptake at scale is challenging, as existing methods require expensive annotation by experts. We propose a framework for computationally measuring uptake by (1) releasing a dataset of student-teacher exchanges extracted from US math classroom transcripts annotated for uptake by experts; (2) formalizing uptake as pointwise Jensen-Shannon Divergence (pJSD), estimated via next utterance classification; (3) conducting a linguistically motivated comparison of different unsupervised measures; and (4) correlating these measures with educational outcomes. We find that although repetition captures a significant part of uptake, pJSD outperforms repetition-based baselines, as it is capable of identifying a wider range of uptake phenomena like question answering and reformulation. We apply our uptake measure to three different educational datasets with outcome indicators. Unlike baseline measures, pJSD correlates significantly with instruction quality in all three, providing evidence for its generalizability and for its potential to serve as an automated professional development tool for teachers.


Daniel Kreisman, Jonathan Smith, Bondi Arifin.

How do college non-completers list schooling on their resumes? The negative signal of not completing might outweigh the positive signal of attending. If so, job-seekers might hide non-completed schooling on their resumes. To test this, we match resumes from an online jobs board to administrative educational records. We find that fully one in three job-seekers who attended college but did not earn a degree omit their only post-secondary schooling from their resumes. We further show that these are not casual omissions but are strategic decisions systematically related to schooling characteristics, such as selectivity and years of enrollment. We also find evidence of lying, and show which degrees listed on resumes are most likely untrue. Lastly, we discuss implications: the commonly held assumption that employers perfectly observe schooling does not hold, and we can learn which college experiences students believe are most valued by employers.


Sophie Litschwartz, Luke W. Miratrix.

In multisite experiments, we can quantify treatment effect variation with the cross-site treatment effect variance. However, there is no standard method for estimating cross-site treatment effect variance in multisite regression discontinuity designs (RDD). This research fills this gap in the literature by systematically exploring and evaluating methods for estimating the cross-site treatment effect variance in multisite RDDs. Specifically, we formalize a fixed intercepts/random coefficients (FIRC) RDD model and develop a random effects meta-analysis (Meta) RDD model for estimating cross-site treatment effect variance. We find that a restricted FIRC model works best when the running variables' relationship to the outcome is stable across sites but can be biased otherwise. In those instances, we recommend using either the unrestricted FIRC model or the meta-analysis model, with the unrestricted FIRC model generally performing better when the average number of in-bandwidth observations is less than 120 and the meta-analysis model performing better when the average number of in-bandwidth observations is above 120. We apply our models to a high school exit exam policy in Massachusetts that required students who passed the high school exit exam but were still determined to be nonproficient to complete an "Education Proficiency Plan" (EPP). We find that, on average across sites, the EPP policy had a positive local average treatment effect on whether students completed a math course in their senior year, but that the impact varied enough that a third of schools could have had a negative impact.
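As a rough illustration of the random effects meta-analysis idea, the sketch below applies the generic DerSimonian-Laird method-of-moments estimator to site-level effect estimates and their sampling variances. This is a swapped-in textbook estimator, not necessarily the authors' Meta RDD model. The function names and input numbers are hypothetical, and the share of sites with a negative effect assumes normally distributed site effects.

```python
import math


def dersimonian_laird(effects, variances):
    """Return (random-effects mean, cross-site variance tau^2)."""
    w = [1.0 / v for v in variances]  # inverse-variance weights
    sw = sum(w)
    mean_fixed = sum(wi * e for wi, e in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (e - mean_fixed) ** 2 for wi, e in zip(w, effects))
    k = len(effects)
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)  # truncate at zero
    w_re = [1.0 / (v + tau2) for v in variances]
    mean_re = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)
    return mean_re, tau2


def share_negative(mean, tau2):
    """Share of sites with a negative true effect under a normal assumption."""
    if tau2 == 0:
        return 1.0 if mean < 0 else 0.0
    z = -mean / math.sqrt(tau2)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical site-level RDD estimates and their sampling variances.
site_effects = [0.1, 0.3, -0.1, 0.2]
site_variances = [0.01, 0.01, 0.01, 0.01]

mean_re, tau2 = dersimonian_laird(site_effects, site_variances)
print(f"mean effect {mean_re:.3f}, cross-site variance {tau2:.4f}")
print(f"share of sites with negative effect {share_negative(mean_re, tau2):.3f}")
```

The second function shows how a positive average effect can coexist with a sizable share of sites having negative effects when the cross-site variance is large relative to the mean.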
