Methodology, measurement and data
Longitudinal models of individual growth typically emphasize between-person predictors of change but ignore how growth may vary within persons because each person contributes only one data point at each time point to the model. In contrast, modeling growth with multi-item assessments allows evaluation of how relative item performance may shift over time. While traditionally viewed as a nuisance under the label of “item parameter drift” (IPD) in the Item Response Theory literature, we argue that IPD may be of substantive interest if it reflects how learning manifests on different items or subscales at different rates. In this study, we present a novel application of the Explanatory Item Response Model (EIRM) to assess IPD in a causal inference context. Simulation results show that when IPD is not accounted for, both parameter estimates and their standard errors can be affected. We illustrate with an empirical application to the persistence of transfer effects from a content literacy intervention on vocabulary knowledge, revealing how researchers can leverage IPD to achieve a more fine-grained understanding of how vocabulary learning develops over time.
Letters of recommendation from school counselors are required to apply to many selective colleges and universities. Still, relatively little is known about how this non-standardized component may affect equity in admissions. We use cutting-edge natural language processing techniques to algorithmically analyze a national dataset of over 600,000 student applications and counselor recommendation letters submitted via the Common App platform. We examine how the length and topical content of letters (e.g., sentences about Personal Qualities, Athletics, Intellectual Promise, etc.) relate to student self-identified race/ethnicity, sex, and proxies for socioeconomic status. Paired with regression analyses, we explore whether demographic differences in letter characteristics persist when accounting for additional student, school, and counselor characteristics, as well as among letters written by the same counselor and among students with comparably competitive standardized test scores. We ultimately find large and noteworthy naïve differences in letter length and content across nearly all demographic groups, many in alignment with known inequities (e.g., many more sentences about Athletics among White and higher-SES students, longer letters and more sentences on Personal Qualities for private school students). However, these differences vary drastically based on the exact controls and comparison groups included, demonstrating that the ultimate implications of these letter differences for equity hinge on exactly how and when letters are used in admissions processes (e.g., are letters evaluated at face value across all students, or are they mostly compared to other letters from the same high school or counselor?). Findings do not point to a clear recommendation on whether institutions should keep or discard letter requirements, but reflect the importance of reading letters and overall applications in the context of structural opportunity. We discuss additional implications and possible recommendations for college access and admissions policy/practice.
Assessing instruction quality is a fundamental component of any improvement effort in the education system. However, traditional manual assessments are expensive, subjective, and heavily dependent on observers’ expertise and idiosyncratic factors, preventing teachers from getting timely and frequent feedback. Unlike prior research that focuses on low-inference instructional practices, this paper presents the first study that leverages Natural Language Processing (NLP) techniques to assess multiple high-inference instructional practices in two distinct educational settings: in-person K-12 classrooms and simulated performance tasks for pre-service teachers. This is also the first study that applies NLP to measure a teaching practice that has been demonstrated to be particularly effective for students with special needs. We confront two challenges inherent in NLP-based instructional analysis: noisy and long input data, and highly skewed distributions of human ratings. Our results suggest that pretrained Language Models (PLMs) perform comparably to the agreement level of human raters for variables that are more discrete and require lower inference, but their efficacy diminishes with more complex teaching practices. Interestingly, using only teachers’ utterances as input yields strong results for student-centered variables, alleviating common concerns over the difficulty of collecting and transcribing high-quality student speech data in in-person teaching settings. Our findings highlight both the potential and the limitations of current NLP techniques in the education domain, opening avenues for further exploration.
This paper studies the causal impacts of public universities on the outcomes of their marginally admitted students. I use administrative admission records spanning all 35 public universities in Texas, which collectively enroll 10 percent of American public university students, to systematically identify and employ decentralized cutoffs in SAT/ACT scores that generate discontinuities in admission and enrollment. The typical marginally admitted student completes an additional year of education in the four-year sector, is 12 percentage points more likely to earn a bachelor's degree, and eventually earns 5-10 percent more than their marginally rejected but otherwise identical counterpart. Marginally admitted students pay no additional tuition costs thanks to offsetting grant aid; cost-benefit calculations show internal rates of return of 19-23 percent for the marginal students themselves, 10-12 percent for society (which must pay for the additional education), and 3-4 percent for the government budget. Finally, I develop a method to disentangle separate effects for students on the extensive margin of the four-year sector versus those who would fall back to another four-year school if rejected. Substantially larger extensive margin effects drive the results.
While learning outcomes in low- and middle-income countries are generally low, the degree to which students and schools within education systems lag behind grade-level proficiency can vary significantly. A substantial portion of existing literature advocates for aligning curricula closer to the proficiency level of the “median child” within each system. Yet, amidst considerable between-school heterogeneity in learning outcomes, choosing a single instructional level for the entire system may still leave behind those students in schools far from this level. Hence, establishing system-wide curriculum expectations in the presence of significant between-school heterogeneity poses a significant challenge for policymakers, especially as the issue of between-school heterogeneity has been relatively unexplored by researchers so far. This paper addresses the gap by leveraging a unique dataset on foundational literacy and numeracy outcomes, representative of six public educational systems encompassing over 900,000 enrolled children in South Asia and West Africa. With this dataset, we examine the current extent of between-school heterogeneity in learning outcomes, the potential predictors of this heterogeneity, and its potential implications for setting national curricula for different grade levels and subjects. Our findings reveal that between-school heterogeneity can indeed present both a severe pedagogical hindrance and challenges for policymakers, particularly in contexts with relatively higher levels of performance and in the higher grades. In response to meaningful between-system heterogeneity, we also demonstrate through simulation that a more nuanced, data-driven targeting of curricular expectations for different schools within a system could empower policymakers to effectively reach a broader spectrum of students through classroom instruction.
To understand the causes and consequences of polarized demand for government expenditure, we conduct three field experiments in the context of public higher education. The first two experiments study polarization in taxpayer demand. We provide information to shape beliefs about social returns on investment. Our treatments narrow the political partisan gap in ideal policies (a reduction in ideological polarization) by up to 32%, with differences in partisan reasoning as a key mechanism. Providing information also affects how people communicate their ideal policies to elected officials, increasing their propensity to write a (positive) letter to an official of the other party (a reduction in affective polarization). In the third experiment, we send these letters to a randomized subset of elected officials to study how policymakers respond to constituent demand. We find that officials who receive their constituents' demands engage more with higher education issues in our correspondence.
Educational researchers often report effect sizes in standard deviation units (SD), but SD effects are hard to interpret. Effects are easier to interpret in percentile points, but converting SDs to percentile points involves a calculation that is not transparent to educational stakeholders. We show that if the outcome variable is normally distributed, we can approximate the percentile-point effect simply by multiplying the SD effect by 37 (or, equivalently, dividing the SD effect by 0.027). For students in the middle three-fifths of a normal distribution, this rule of thumb is always accurate to within 1.6 percentile points for effect sizes of up to 0.8 SD. Two examples show that the rule can be just as accurate for empirical effects from real studies. Applying the rule to empirical benchmarks, we find that the least effective third of educational interventions raise scores by 0 to 2 percentile points; the middle third raise scores by 2 to 7 percentile points; and the most effective third raise scores by more than 7 percentile points.
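The multiply-by-37 rule described above can be compared against the exact normal-curve conversion. The following is a minimal sketch, not the authors' code: the function names are hypothetical, the outcome is assumed to be exactly normal, and the exact gain depends on the student's starting percentile (set to the median by default).

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, SD 1); the abstract's rule assumes normality.
STD_NORMAL = NormalDist()


def exact_percentile_gain(effect_sd, start_percentile=50.0):
    """Exact percentile-point gain for a student who starts at
    start_percentile and moves up by effect_sd standard deviations.
    (Hypothetical helper for illustration.)"""
    start_z = STD_NORMAL.inv_cdf(start_percentile / 100.0)
    end_percentile = 100.0 * STD_NORMAL.cdf(start_z + effect_sd)
    return end_percentile - start_percentile


def rule_of_thumb_gain(effect_sd):
    """Approximate percentile-point gain: multiply the SD effect by 37."""
    return 37.0 * effect_sd


if __name__ == "__main__":
    # Compare the two conversions for a student starting at the median.
    for d in (0.1, 0.2, 0.4):
        exact = exact_percentile_gain(d)
        approx = rule_of_thumb_gain(d)
        print(f"effect={d:.1f} SD  exact={exact:.2f}  rule={approx:.2f}")
```

Changing `start_percentile` shows how the approximation behaves for students who begin away from the median.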
Children's approaches to learning (AtL) are widely recognized as a critical predictor of educational outcomes, especially in early childhood. Nevertheless, there remains a dearth of understanding regarding the dimensionality of AtL, the reciprocal dynamics between AtL and learning outcomes, and how AtL operates in non-Western contexts. This paper aims to extend the existing AtL literature by both conceptually and empirically investigating the dimensionality of the AtL scale of the International Development and Early Learning Assessment (IDELA) – a globally used measure of early childhood development – based on data from Ghanaian children newly enrolled in formal schooling. Additionally, our research explores reciprocal relationships between AtL subconstructs and academic skills over time. Our analysis identifies two dimensions within the IDELA AtL scale: Self-Regulation (SR) and Motivation. We found that children with higher levels of SR early in schooling demonstrated better literacy and numeracy skills in later grades compared to their peers with low early SR, whereas children's motivation did not predict subsequent literacy and numeracy skills. This study enhances understanding of AtL in non-Western contexts, with implications for culturally appropriate support for children’s engagement in learning.
We study how colleges shape their students' voting habits by linking millions of SAT takers to their college-enrollment and voting histories. To begin, we show that the fraction of students from a particular college who vote varies systematically by the college's attributes (e.g. increasing with selectivity) but also that seemingly similar colleges can have markedly different voting rates. Next, after controlling for students' college application portfolios and pre-college voting behavior, we find that attending a college with a 10 percentage-point higher voting rate increases entrants' probability of voting by 4 percentage points (10 percent). This effect arises during college, persists after college, and is almost entirely driven by higher voting-rate colleges making new voters. College peers' initial voting propensity plays no discernible role.
We use a marginal treatment effect (MTE) representation of a fuzzy regression discontinuity setting to propose a novel estimator. The estimator can be thought of as extrapolating the traditional fuzzy regression discontinuity estimate or as an observational study that adjusts for endogenous selection into treatment using information at the discontinuity. We show in a frequentist framework that it is consistent under weaker assumptions than existing approaches and then discuss conditions in a Bayesian framework under which it can be considered the posterior mean given the observed conditional moments. We then use this approach to examine the effects of early grade retention. We show that the benefits of early grade retention policies are larger for students with lower baseline achievement and smaller for low-performing students who are exempt from retention. These findings imply that (1) the benefits of early grade retention policies are larger than have been estimated using traditional fuzzy regression discontinuity designs but that (2) retaining additional students would have a limited effect on student outcomes.
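For readers unfamiliar with the traditional fuzzy regression discontinuity estimate that the abstract takes as its starting point, the sketch below shows its simplest form: the jump in the mean outcome at the cutoff divided by the jump in the treatment rate. This is not the paper's proposed estimator. It is a difference-in-means (local constant) version within a fixed bandwidth, assumed here for brevity where applied work typically fits local linear regressions, and the function name, arguments, and toy data are all hypothetical.

```python
from statistics import mean


def fuzzy_rd_wald(running, treated, outcome, cutoff, bandwidth):
    """Ratio of the outcome jump to the treatment-rate jump at the cutoff,
    using simple means of observations within `bandwidth` of the cutoff.
    (Illustrative simplification, not a local linear fit.)"""
    below_t, below_y, above_t, above_y = [], [], [], []
    for r, t, y in zip(running, treated, outcome):
        if abs(r - cutoff) > bandwidth:
            continue  # ignore observations far from the cutoff
        if r >= cutoff:
            above_t.append(t)
            above_y.append(y)
        else:
            below_t.append(t)
            below_y.append(y)
    if not below_y or not above_y:
        raise ValueError("need observations on both sides of the cutoff")
    jump_in_outcome = mean(above_y) - mean(below_y)
    jump_in_treatment = mean(above_t) - mean(below_t)
    if jump_in_treatment == 0:
        raise ValueError("no discontinuity in treatment at the cutoff")
    return jump_in_outcome / jump_in_treatment


if __name__ == "__main__":
    # Made-up toy data for illustration only, not results from any study.
    running = [-2, -1, -1, -0.5, 0.5, 1, 1, 2]
    treated = [0, 0, 0, 1, 1, 1, 1, 0]
    outcome = [1, 1, 1, 3, 3, 3, 3, 1]
    print(fuzzy_rd_wald(running, treated, outcome, cutoff=0, bandwidth=3))
```

The ratio identifies an effect only for units whose treatment status changes at the cutoff, which is the limitation that motivates extrapolating away from the discontinuity.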