The Annenberg Institute at Brown University offers this national working paper series to provide open access to high-quality papers from multiple disciplines and from multiple universities and research organizations on a wide variety of topics related to education. EdWorkingPapers focuses particularly on research with strong implications for education policy. EdWorkingPapers circulates papers prior to publication for comment and discussion; these papers have not gone through a peer review process. Contributors can update papers to provide readers with the most up-to-date findings.
When analyzing treatment effects on test score data, education researchers face many choices for scoring tests and modeling results. This study examines the impact of those choices through Monte Carlo simulation and an empirical application. Results show that estimates from multiple analytic methods applied to the same data will vary because, as predicted by Classical Test Theory, two-step models using sum or IRT-based scores produce downwardly biased standardized treatment effect coefficients compared to latent variable models. This bias dominates any other differences between models or features of the data generating process, such as the variability of item discrimination parameters. An errors-in-variables (EIV) correction successfully removes the bias from two-step models. Model performance does not differ substantially in terms of precision, standard error calibration, false positive rates, or statistical power. An empirical application to data from a randomized controlled trial of a second-grade literacy intervention demonstrates the sensitivity of results to model choice and the tradeoffs between model choice and interpretation. This study shows that the psychometric principles most consequential for causal inference relate to attenuation bias rather than optimal scoring weights.
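The attenuation mechanism the abstract describes can be illustrated with a minimal simulation. This sketch is not the paper's Monte Carlo design; the effect size (0.3 SD) and reliability (0.5) are illustrative assumptions. It shows that a standardized effect estimated on an error-laden observed score shrinks by roughly the square root of the score's reliability, and that a simple EIV-style disattenuation recovers the true effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

treat = rng.integers(0, 2, n)                 # random assignment (RCT)
theta = 0.3 * treat + rng.normal(0.0, 1.0, n) # latent ability; true effect = 0.3 SD
noise = rng.normal(0.0, 1.0, n)               # measurement error
observed = theta + noise                      # observed score; reliability = 1/(1+1) = 0.5

def std_effect(y, t):
    """Treatment-control mean difference in control-group SD units."""
    return (y[t == 1].mean() - y[t == 0].mean()) / y[t == 0].std()

reliability = 0.5  # assumed known here; estimated from the data in practice

latent_est = std_effect(theta, treat)                # ~0.30 (unbiased benchmark)
naive_est = std_effect(observed, treat)              # ~0.30 * sqrt(0.5) ~ 0.21 (attenuated)
eiv_est = naive_est / np.sqrt(reliability)           # disattenuated, ~0.30

print(latent_est, naive_est, eiv_est)
```

In practice the reliability would be estimated (e.g., from IRT-based standard errors of measurement) rather than known, which is what the paper's EIV correction for two-step models addresses.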