Assessment
ChatGPT vs. Machine Learning: Assessing the Efficacy and Accuracy of Large Language Models for Automated Essay Scoring
Topics: Methods
Automated Essay Scoring (AES) is a critical tool in education that aims to enhance the efficiency and objectivity of educational assessments. Recent advancements in Large Language Models (LLMs), such as ChatGPT, have sparked interest in their potential for AES. However, comprehensive comparisons…
The reliability of classroom observations and student surveys in non-research settings: Evidence from Argentina
Topics: Teacher and Leader Development
There is a growing consensus on the need to measure teaching effectiveness using multiple instruments. Yet, guidance on how to achieve reliable ratings derives largely from formal research in high-income countries. We study the reliability of classroom observations and student surveys conducted…
What is the impact of changing schools on the academic outcomes of elementary and middle school students?
Topics: Student Learning
We study how different kinds of school changes shape achievement in grades 4–8 using data from six California districts (2016–17 through 2019–20). We estimate the effects of structural (promotional), nonstructural summer, and midyear moves and find that structural and summer moves have near-zero…
Beg to DIFfer: Resolving Statistical Complications of Intersectional DIF Analyses
Topics: Methods
Tags: Assessment, Equity
Modern test developers conduct differential item functioning (DIF) analyses to ensure fairness in educational and psychological testing. To address previously unrecognized biases, researchers have recently demonstrated the importance of conducting intersectional DIF analyses that attend to the…
Controlling For Measurement Error in Evaluation Models When Treatment Group Assignment is Based on Noisy Measures: Evaluation of an Achievement Gap-Closing Initiative
Topics: Methods
Tags: Assessment, Equity
This paper develops new models to evaluate the effects of interventions and intervention-by-site heterogeneity when treatment group assignment is based on a fallible variable and the outcome of interest is determined in part by the corresponding true control variables (measured without error).…
Labor supply, learning time, and the efficiency of school spending: Evidence from school finance reforms
Topics: Policy, Politics, and Governance
Does school spending raise achievement? I show that effects, benchmarked by schools’ daily value added, are one-tenth to one-third as large as spending growth. Using school finance reforms for identification, I show that schools did not raise quality measured by value added. Instead, schools…
Parent Perspectives on School Choice: Experimental Evidence from a Nationally Representative Sample
Topics: School Choice
Parental attitudes and perspectives of student “success” will likely drive their educational choices, whether residentially assigned district public schools, alternative public schools, private schools, or homeschooling. However, little research has examined the importance of these attitudes on…
High School Equivalency Credentialing and Post-Secondary Success: Pre-Registered Quasi-Experimental Evidence from the GED® Test
For the over 24 million American adults who do not hold a traditional high school diploma, high school equivalency (HSE) credentials represent the primary “second-chance” pathway to many careers or educational opportunities. This project uses current, representative data to assess whether, how,…
The Sensitivity of Value-Added Estimates to Test Scoring Decisions
Topics: Methods
Tags: Assessment
Value-Added Models (VAMs) are both common and controversial in education policy and accountability research. While the sensitivity of VAMs to model specification and covariate selection is well documented, the extent to which test scoring methods (e.g., mean scores vs. IRT-based scores) may…
Testing Away from One's Own School: Exam Location and Performance in High-Stakes Exams
High-stakes exams are often administered at designated test centers, requiring many students to test in unfamiliar environments. We investigate whether such arrangements impact students' test performance and, by extension, access to educational opportunities.
Bridging Literacy Gaps: The Impact of AI-Driven Personalised Learning on Reading Skills and Educational Equity
Topics: Student Learning
Tags: Assessment, Equity
Persistent literacy skills deficits hinder educational attainment, limit labour market opportunities, and exacerbate socioeconomic inequalities. This paper evaluates the causal effect of an AI-driven Computer-Assisted Learning (CAL) program implemented by the Government of Madrid, which features…
Puzzling Over Declining Academic Achievement
Topics: Student Learning
Tags: Assessment, School reform
Many are concerned about the large decline in K-12 student achievement since 2019. And rightly so, given what it signals about student learning and later life outcomes. Less noted is the pre-pandemic sustained decline in student achievement growth that followed 30 years of increases. We examine…
Item-Level Heterogeneity in Value Added Models: Implications for Reliability, Cross-Study Comparability, and Effect Sizes
Topics: Methods
Tags: Assessment, Efficacy
Value added models (VAMs) attempt to estimate the causal effects of teachers and schools on student test scores. We apply Generalizability Theory to show how estimated VA effects depend upon the selection of test items. Standard VAMs estimate causal effects on the items that are included on the…
Disparate Teacher Effects, Comparative Advantage, and Match Quality
Topics: Methods
Does student-teacher match quality exist? While prior research documents disparities in teachers' impacts across student types, it has not distinguished between sorting and causal effects as the drivers of these disparities. I develop a flexible disparate value-added model (DVA) and introduce a…
The Learning Crisis: Three Years After COVID-19
Tomasz Gajderowicz, Maciej Jakubowski, Alec Kennedy, Christian Christrup Kjeldsen, Harry A. Patrinos, Rolf Strietholt.
Topics: Student Learning
The COVID-19 pandemic caused widespread disruptions to education, with school closures affecting over one billion children. These closures, aimed at reducing virus transmission, resulted in significant learning losses, particularly in mathematics and science. Using data from TIMSS 2023, which…
Unequal Access: How Public Library Closures Affect Educational Performance
Topics: Families and Communities
Local public institutions, such as public libraries, offer access to low-cost educational resources, potentially mitigating human capital investment disparities. However, from 2008 to 2019, 766 public library outlets closed across the US, reducing access to these critical resources. This study…
Combining Early Grade Assessments to Study Literacy Skills: Addressing the Variability in Tests Taken across Schools and Students
Topics: Methods
There is considerable variability in the literacy assessments taken in Kindergarten through second grade, across schools and between multilingual learners and other students, and within students over time. This makes it difficult to study changes in students’ acquisition of ELA skills in these…
Results from NCME Survey on Revisions to the Standards for Educational and Psychological Testing
Doris Zahner, Jeffrey T. Steedle, James G. Soland, Catherine Welch, Qi Qin, Kathryn Thompson, Richard Phelps.
Tags: Assessment
The Standards for Educational and Psychological Testing have served as a cornerstone for best practices in assessment. As the field evolves, so must these standards, with regular revisions ensuring they reflect current knowledge and practice. The National Council on Measurement in Education (…
Addressing Threats to Validity in Supervised Machine Learning: A Framework and Best Practices for Education Researchers
Topics: Methods
Given the rapid adoption of machine learning methods by education researchers, and growing acknowledgement of their inherent risks, there is an urgent need for tailored methodological guidance on how to improve and evaluate the validity of inferences drawn from these methods. Drawing upon an…
Mechanisms of Effect Size Differences Between Researcher Developed and Independently Developed Outcomes: A Meta-Analysis of Item-Level Data
Topics: Methods
Differences in effect sizes between researcher developed (RD) and independently developed (ID) outcome measures are widely documented but poorly understood in education research. We conduct a meta-analysis using item-level outcome data to test potential mechanisms that explain differences in…