Methodology, measurement and data

Maya Kaul.

Teachers’ professional identities are the foundation of their practice. Previous scholarship has largely overlooked the extent to which the broader reform culture shapes teachers’ professional identities. In this study, I draw on survey data from 950 teachers across four US states (California, New York, Florida, and Texas) to examine the extent to which teachers’ professional identities are associated with what I term “institutionalized conceptions” of their roles. Across diverse state policy contexts, I find that teachers draw upon a shared set of institutionalized conceptions of their roles, which are associated with their professional identities. The findings suggest that the taken-for-granted ways society frames teaching may be associated with dimensions of teachers’ professional identity, such as self-efficacy and professional commitment.



Brian Holzman, Horace Duffy.

As states incorporate measures of college readiness into their accountability systems, school and district leaders need effective strategies to identify and support students at risk of not enrolling in college. Although there is an abundant literature on early warning indicators for high school dropout, fewer studies focus on indicators for college enrollment, especially those that are simple to calculate and easy for practitioners to use. This study explores three potential indicators of college readiness that educational leaders may consider using as part of an early warning system for college enrollment. Using district administrative data, our analysis shows that an indicator based on attendance, grades, and advanced course-taking is slightly more effective at predicting college enrollment than indicators based on course failures or standardized test scores. However, the performance of these indicators varies across different student demographic and socioeconomic subgroups, highlighting the limitations of these measures and pointing to areas where they may need to be supplemented with contextual information. Through event history analysis, we demonstrate that the ninth grade is a particularly challenging year for students, especially those who are male, Black, Hispanic, or economically disadvantaged. These results suggest that educational leaders ought to consider identifying and targeting students at risk of not attending college with additional resources and support during the freshman year of high school.



Kelli A. Bird, Benjamin L. Castleman, Yifeng Song, Renzhe Yu.

Colleges have increasingly turned to data science applications to improve student outcomes. One prominent application is to predict students’ risk of failing a course. In this paper, we investigate whether incorporating data from learning management systems (LMS), which captures detailed information on students’ engagement in course activities, increases the accuracy of predicting student success beyond using administrative data alone. We use data from the Virginia Community College System to build random forest models based on student type (new versus returning) and data source (administrative-only, LMS-only, or full data). We find that among returning college students, administrative-only models outperform LMS-only models, and combining the two types of data yields only a minimal increase in accuracy. Among new students, LMS-only models outperform administrative-only models, and accuracy is significantly higher when both types of predictors are used. This pattern of results reflects the fact that community college administrative data contains little information about new students. Within the LMS data, we find that data pertaining to students’ engagement during the first part of the course has the most predictive value.



Ariana Audisio, Rebecca Taylor-Perryman, Tim Tasker, Matthew P. Steinberg.

Teachers are the most important school-specific factor in student learning. Yet, little evidence exists linking teacher professional development programs and the strategies or activities that comprise them to student achievement. In this paper, we examine a fellowship model for professional development designed and implemented by Leading Educators, a national nonprofit organization that aims to bridge research and practice to improve instructional quality and accelerate learning across school systems. During the 2015-16 and 2016-17 school years, Leading Educators conducted its fellowship program for two cohorts of instructional leaders, such as department chairs, mentor teachers, instructional coaches, and assistant principals, to provide these educators ongoing, collaborative, job-embedded professional development and to improve student achievement. Relying on quasi-experimental methods, we find that a school’s participation in the fellowship program significantly increased student proficiency rates in English language arts and math on state achievement exams. The positive impact was concentrated in the first cohort and in just one of three regions, and approximately 80 percent of treated schools were charters. Student achievement benefitted from a more sustained duration of participation in the fellowship program, varied depending on the share of a school’s educators who participated in the fellowship, and differed based on whether fellows independently selected into the program or were appointed to participate by their school leaders. Taken together, findings from this paper should inform professional learning organizations, schools, and policymakers on the design, implementation, and impact of educator professional development.



Joshua B. Gilbert, James S. Kim, Luke W. Miratrix.

Longitudinal models of individual growth typically emphasize between-person predictors of change but ignore how growth may vary within persons because each person contributes only one point at each time to the model. In contrast, modeling growth with multi-item assessments allows evaluation of how relative item performance may shift over time. While traditionally viewed as a nuisance under the label of “item parameter drift” (IPD) in the Item Response Theory literature, we argue that IPD may be of substantive interest if it reflects how learning manifests on different items or subscales at different rates. In this study, we present a novel application of the Explanatory Item Response Model (EIRM) to assess IPD in a causal inference context. Simulation results show that when IPD is not accounted for, both parameter estimates and their standard errors can be affected. We illustrate with an empirical application to the persistence of transfer effects from a content literacy intervention on vocabulary knowledge, revealing how researchers can leverage IPD to achieve a more fine-grained understanding of how vocabulary learning develops over time.



Brian Heseung Kim, Julie J. Park, Pearl Lo, Dominique J. Baker, Nancy Wong, Stephanie Breen, Huong Truong, Jia Zheng, Kelly Ochs Rosinger, OiYan Poon.

Letters of recommendation from school counselors are required to apply to many selective colleges and universities. Still, relatively little is known about how this non-standardized component may affect equity in admissions. We use cutting-edge natural language processing techniques to algorithmically analyze a national dataset of over 600,000 student applications and counselor recommendation letters submitted via the Common App platform. We examine how the length and topical content of letters (e.g., sentences about Personal Qualities, Athletics, Intellectual Promise, etc.) relate to student self-identified race/ethnicity, sex, and proxies for socioeconomic status. Paired with regression analyses, we explore whether demographic differences in letter characteristics persist when accounting for additional student, school, and counselor characteristics, as well as among letters written by the same counselor and among students with comparably competitive standardized test scores. We ultimately find large and noteworthy naïve differences in letter length and content across nearly all demographic groups, many in alignment with known inequities (e.g., many more sentences about Athletics among White and higher-SES students, longer letters and more sentences on Personal Qualities for private school students). However, these differences vary drastically based on the exact controls and comparison groups included, demonstrating that the ultimate implications of these letter differences for equity hinge on exactly how and when letters are used in admissions processes (e.g., are letters evaluated at face value across all students, or are they mostly compared to other letters from the same high school or counselor?). Findings do not point to a clear recommendation whether institutions should keep or discard letter requirements, but reflect the importance of reading letters and overall applications in the context of structural opportunity. We discuss additional implications and possible recommendations for college access and admissions policy/practice.



Paiheng Xu, Jing Liu, Nathan Jones, Julie Cohen, Wei Ai.

Assessing instruction quality is a fundamental component of any improvement effort in the education system. However, traditional manual assessments are expensive, subjective, and heavily dependent on observers’ expertise and idiosyncratic factors, preventing teachers from getting timely and frequent feedback. Unlike prior research that focuses on low-inference instructional practices, this paper presents the first study that leverages Natural Language Processing (NLP) techniques to assess multiple high-inference instructional practices in two distinct educational settings: in-person K-12 classrooms and simulated performance tasks for pre-service teachers. This is also the first study that applies NLP to measure a teaching practice that has been demonstrated to be particularly effective for students with special needs. We confront two challenges inherent in NLP-based instructional analysis: noisy and long input data, and highly skewed distributions of human ratings. Our results suggest that pretrained Language Models (PLMs) perform comparably to the agreement level of human raters for variables that are more discrete and require lower inference, but their efficacy diminishes with more complex teaching practices. Interestingly, using only teachers’ utterances as input yields strong results for student-centered variables, alleviating common concerns over the difficulty of collecting and transcribing high-quality student speech data in in-person teaching settings. Our findings highlight both the potential and the limitations of current NLP techniques in the education domain, opening avenues for further exploration.



Jack Mountjoy.

This paper studies the causal impacts of public universities on the outcomes of their marginally admitted students. I use administrative admission records spanning all 35 public universities in Texas, which collectively enroll 10 percent of American public university students, to systematically identify and employ decentralized cutoffs in SAT/ACT scores that generate discontinuities in admission and enrollment. The typical marginally admitted student completes an additional year of education in the four-year sector, is 12 percentage points more likely to earn a bachelor's degree, and eventually earns 5-10 percent more than their marginally rejected but otherwise identical counterpart. Marginally admitted students pay no additional tuition costs thanks to offsetting grant aid; cost-benefit calculations show internal rates of return of 19-23 percent for the marginal students themselves, 10-12 percent for society (which must pay for the additional education), and 3-4 percent for the government budget. Finally, I develop a method to disentangle separate effects for students on the extensive margin of the four-year sector versus those who would fall back to another four-year school if rejected. Substantially larger extensive margin effects drive the results.



Daniel Rodriguez-Segura, Savannah Tierney.

While learning outcomes in low- and middle-income countries are generally at low levels, the degree to which students and schools more broadly within education systems lag behind grade-level proficiency can vary significantly. A substantial portion of existing literature advocates for aligning curricula closer to the proficiency level of the “median child” within each system. Yet, amidst considerable between-school heterogeneity in learning outcomes, choosing a single instructional level for the entire system may still leave behind those students in schools far from this level. Hence, establishing system-wide curriculum expectations in the presence of significant between-school heterogeneity poses a significant challenge for policymakers — especially as the issue of between-school heterogeneity has been relatively unexplored by researchers so far. This paper addresses the gap by leveraging a unique dataset on foundational literacy and numeracy outcomes, representative of six public educational systems encompassing over 900,000 enrolled children in South Asia and West Africa. With this dataset, we examine the current extent of between-school heterogeneity in learning outcomes, the potential predictors of this heterogeneity, and explore its potential implications for setting national curricula for different grade levels and subjects. Our findings reveal that between-school heterogeneity can indeed present both a severe pedagogical hindrance and challenges for policymakers, particularly in contexts with relatively higher levels of performance and in the higher grades. In response to meaningful between-system heterogeneity, we also demonstrate through simulation that a more nuanced, data-driven targeting of curricular expectations for different schools within a system could empower policymakers to effectively reach a broader spectrum of students through classroom instruction.



Reuben Hurst, Andrew Simon, Michael Ricks.

To understand the causes and consequences of polarized demand for government expenditure, we conduct three field experiments in the context of public higher education. The first two experiments study polarization in taxpayer demand. We provide information to shape beliefs about social returns on investment. Our treatments narrow the political partisan gap in ideal policies (a reduction in ideological polarization) by up to 32%, with differences in partisan reasoning as a key mechanism. Providing information also affects how people communicate their ideal policies to elected officials, increasing their propensity to write a (positive) letter to an official of the other party, which amounts to a reduction in affective polarization. In the third experiment, we send these letters to a randomized subset of elected officials to study how policymakers respond to constituent demand. We find that officials who receive their constituents' demands engage more with higher education issues in our correspondence.
