
The Sensitivity of Value-Added Estimates to Test Scoring Decisions

Value-Added Models (VAMs) are both common and controversial in education policy and accountability research. While the sensitivity of VAMs to model specification and covariate selection is well documented, the extent to which test scoring methods (e.g., mean scores vs. IRT-based scores) affect VA estimates is less studied. We examine the sensitivity of VA estimates to scoring method using empirical item response data from 23 education datasets. We show that VA estimates are frequently highly sensitive to scoring method, holding students and items constant. Although the test scores produced by different methods are highly correlated, on average the scoring approaches yield VA percentile ranks that differ by more than 20 points, and over 50% of teachers or schools are ranked in more than one quartile of the VA distribution. Dispersion in VA ranks is reduced when item response data are complete and when correlations between baseline and endline scores are more consistent across scoring methods. We conclude that consideration of both measurement error and model uncertainty is necessary for appropriate interpretation of VAMs.
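To illustrate the kind of comparison the abstract describes, the sketch below simulates dichotomous item responses and computes teacher VA percentile ranks under two scoring rules: a mean (proportion-correct) score and a crude logit-transformed score standing in for an IRT-based ability estimate. This is a minimal illustrative simulation, not the authors' analysis; all sample sizes, parameter values, and function names are assumptions, and a real analysis would estimate IRT scores with dedicated software.

```python
# Minimal sketch (illustrative only, not the paper's code): compare teacher VA
# percentile ranks when the same item responses are scored two different ways.
import numpy as np

rng = np.random.default_rng(0)

n_teachers, students_per_teacher, n_items = 50, 30, 20
n_students = n_teachers * students_per_teacher
teacher_id = np.repeat(np.arange(n_teachers), students_per_teacher)

# True quantities: baseline ability, teacher effects, and endline ability
theta_base = rng.normal(0, 1, n_students)
teacher_effect = rng.normal(0, 0.3, n_teachers)
theta_end = theta_base + teacher_effect[teacher_id] + rng.normal(0, 0.5, n_students)

# Rasch-style item difficulties and dichotomous item responses
difficulty = rng.normal(0, 1, n_items)

def simulate_responses(theta):
    p = 1 / (1 + np.exp(-(theta[:, None] - difficulty[None, :])))
    return (rng.random((len(theta), n_items)) < p).astype(float)

resp_base, resp_end = simulate_responses(theta_base), simulate_responses(theta_end)

# Scoring method 1: mean score (proportion correct)
def mean_score(resp):
    return resp.mean(axis=1)

# Scoring method 2: logit of the clipped proportion correct, a crude stand-in
# for an IRT ability estimate (a real analysis would fit a Rasch/2PL model).
def logit_score(resp):
    p = np.clip(resp.mean(axis=1), 1 / (2 * n_items), 1 - 1 / (2 * n_items))
    return np.log(p / (1 - p))

def va_percentile_ranks(score_fn):
    base, end = score_fn(resp_base), score_fn(resp_end)
    # Simple VA model: regress endline on baseline, average residuals by teacher
    slope, intercept = np.polyfit(base, end, 1)
    resid = end - (intercept + slope * base)
    va = np.array([resid[teacher_id == t].mean() for t in range(n_teachers)])
    ranks = va.argsort().argsort()  # 0 = lowest VA
    return 100 * ranks / (n_teachers - 1)

pct_mean = va_percentile_ranks(mean_score)
pct_logit = va_percentile_ranks(logit_score)

print("Correlation of VA ranks across scoring methods:",
      round(np.corrcoef(pct_mean, pct_logit)[0, 1], 3))
print("Mean absolute difference in percentile rank:",
      round(np.abs(pct_mean - pct_logit).mean(), 1))
```

Even in this toy setup, the two scoring rules can reorder teachers' percentile ranks despite highly correlated underlying scores, which is the phenomenon the paper examines with empirical item response data.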

Keywords
value-added model, item response theory, test scoring, reliability, education policy
Digital Object Identifier (DOI)
10.26300/g4gn-s810
EdWorkingPaper suggested citation:
Gilbert, Joshua B., James G. Soland, and Benjamin W. Domingue. (). The Sensitivity of Value-Added Estimates to Test Scoring Decisions. (EdWorkingPaper: -1226). Retrieved from Annenberg Institute at Brown University: https://doi.org/10.26300/g4gn-s810
