When analyzing treatment effects on test scores, researchers face many choices and competing guidance for scoring tests and modeling results. This study examines the impact of scoring choices through simulation and an empirical application. Results show that estimates from multiple methods applied to the same data will vary because two-step models using sum or factor scores yield attenuated standardized treatment effects compared to latent variable models. This bias dominates all other differences between models and features of the data generating process, such as the use of scoring weights. An errors-in-variables (EIV) correction removes the bias from two-step models. An empirical application to data from a randomized controlled trial demonstrates the sensitivity of results to model selection. This study shows that the psychometric principles most consequential for causal inference relate to attenuation bias rather than optimal scoring weights.
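The attenuation mechanism the abstract describes can be illustrated with a minimal simulation sketch. This is an assumed toy setup, not the study's actual simulation design: measurement error inflates the standard deviation of observed scores, shrinking the standardized effect, and dividing by the square root of the (here known-by-construction) reliability recovers it, in the spirit of an EIV correction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
tau = 0.5  # true treatment effect on the latent trait, in SD units of eta

T = rng.integers(0, 2, n)             # random treatment assignment
eta = tau * T + rng.normal(0, 1, n)   # latent trait
score = eta + rng.normal(0, 0.7, n)   # observed score with measurement error

# Naive standardized effect: attenuated because sd(score) > sd(eta)
naive = (score[T == 1].mean() - score[T == 0].mean()) / score[T == 0].std()

# EIV-style correction: rescale by the square root of the reliability
# (known here by construction: var(eta) / var(score) in the control group)
reliability = 1.0 / (1.0 + 0.7**2)
corrected = naive / np.sqrt(reliability)

print(round(naive, 2), round(corrected, 2))
```

With these assumed parameters, the naive standardized effect lands near 0.41 while the corrected estimate recovers the true value of 0.5.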
Keywords: causal inference, latent variable models, factor analysis, psychometrics, educational measurement