TY  - JOUR
AB  - Researchers across many fields have called for greater attention to heterogeneity of treatment effects—shifting focus from the average effect to variation in effects between different treatments, studies, or subgroups. True heterogeneity is important, but many reports of heterogeneity have proved to be false, non-replicable, or exaggerated. In this review, we catalog ways that past researchers fooled themselves about heterogeneity, and recommend ways that we can stop fooling ourselves about heterogeneity in the future. We make 18 specific recommendations and illustrate them with examples from education research. The most common themes are to (1) seek heterogeneity only when the mechanism offers clear motivation and the data offer adequate power, (2) shy away from seeking “no-but” heterogeneity when there is no main effect, (3) separate the noise of estimation error from the signal of true heterogeneity, (4) shrink variation in estimates toward zero, (5) increase p values and widen confidence intervals when conducting multiple tests, (6) estimate interactions rather than subgroup effects, and (7) check whether findings of heterogeneity are sensitive to changes in model or measurement. We also resolve longstanding debates about centering interactions in linear models and estimating interactions in nonlinear models such as logistic, ordinal, and interval regression. If researchers follow these recommendations, the search for heterogeneity should yield more trustworthy results in the future.
AU  - Hippel, Paul von
AU  - Schütze, Brendan A.
PY  - 2025
ST  - How Not to Fool Ourselves About Heterogeneity of Treatment Effects
TI  - How Not to Fool Ourselves About Heterogeneity of Treatment Effects
UR  - http://www.edworkingpapers.com/ai25-1116
ER  -