Analyzing heterogeneous treatment effects plays a crucial role in understanding the impacts of educational interventions. A standard practice for heterogeneity analysis is to examine interactions between treatment status and pre-intervention participant characteristics, such as pretest scores, to identify how different groups respond to treatment. This study demonstrates that identical observed patterns of heterogeneity on test score outcomes can emerge from entirely distinct data-generating processes. Specifically, we describe scenarios in which treatment effect heterogeneity arises either from variation in treatment effects along a pre-intervention participant characteristic or from correlations between treatment effects and item easiness parameters. We demonstrate analytically and through simulation that these two scenarios cannot be distinguished when the analysis is based on summary scores alone, because such scores are insufficient to identify the underlying data-generating process. We then describe a novel approach that identifies the relevant data-generating process by leveraging item-level data. We apply our approach to a randomized trial of a reading intervention in second grade and show that any apparent heterogeneity by pretest ability is driven by the correlation between treatment effect size and item easiness. Our results highlight the potential of employing measurement principles in causal analysis, beyond their common use in test construction.
causal inference, heterogeneous treatment effects, item response theory, psychometrics, educational measurement
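
To illustrate the identification problem described in the abstract, the following is a minimal sketch, not the authors' simulation. It assumes a Rasch-type logistic response model and uses hypothetical parameter values (a 21-item test, the slopes 0.1 and 0.15, and the function names are all illustrative). It computes the expected treated-minus-control sum score for a low-ability and a high-ability student under two data-generating processes: one where the effect varies with ability, and one where the effect is constant across students but larger on easier items.

```python
import math


def sigmoid(x):
    """Logistic function used for the assumed Rasch-type response model."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical item easiness values for a 21-item test (assumption).
EASINESS = [-2.0 + 0.2 * j for j in range(21)]


def expected_gain(theta, item_effects):
    """Expected treated-minus-control sum score for a student of ability theta.

    item_effects[j] is the logit-scale treatment effect on item j.
    """
    return sum(
        sigmoid(theta + b + tau) - sigmoid(theta + b)
        for b, tau in zip(EASINESS, item_effects)
    )


def effects_varying_by_person(theta):
    """Scenario A: the effect depends on ability and is the same for all items."""
    tau = 0.3 - 0.1 * theta  # hypothetical: larger effect for lower ability
    return [tau] * len(EASINESS)


def effects_varying_by_item():
    """Scenario B: the effect is the same for all students, larger on easier items."""
    return [0.3 + 0.15 * b for b in EASINESS]  # hypothetical slope on easiness


LOW, HIGH = -1.5, 1.5  # two illustrative ability levels

gain_a = {t: expected_gain(t, effects_varying_by_person(t)) for t in (LOW, HIGH)}
gain_b = {t: expected_gain(t, effects_varying_by_item()) for t in (LOW, HIGH)}

for name, gain in (("A (person-driven)", gain_a), ("B (item-driven)", gain_b)):
    print(f"Scenario {name}: low-ability gain = {gain[LOW]:.2f}, "
          f"high-ability gain = {gain[HIGH]:.2f}")
```

Under both sets of assumed parameters the low-ability student shows the larger sum-score gain, so a treatment-by-pretest interaction on sum scores would appear in either case. Only the item-level effects (constant across items in Scenario A, increasing with easiness in Scenario B) separate the two processes.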