James Soland

Institution: University of Virginia

EdWorkingPapers

A huge portion of what we know about how humans develop, learn, behave, and interact is based on survey data. Researchers use longitudinal growth modeling to understand the development of students on psychological and social-emotional learning constructs across elementary and middle school. In these designs, students are typically administered a consistent set of self-report survey items across multiple school years, and growth is measured based on either sum scores or scale scores produced using item response theory (IRT) methods. While there is a great deal of guidance on scaling and linking IRT-based large-scale educational assessments to facilitate the estimation of examinee growth, little of this expertise is brought to bear in the scaling of psychological and social-emotional constructs. Through a series of simulation and empirical studies, we produce scores in a single-cohort repeated-measures design using sum scores as well as multiple IRT approaches, and compare the recovery of growth estimates from longitudinal growth models using each set of scores. Results indicate that using scores from multidimensional IRT approaches that account for latent variable covariances over time in growth models leads to better recovery of growth parameters relative to models using sum scores and other IRT approaches.


Survey respondents use different response styles when they use the categories of the Likert scale differently despite having the same true score on the construct of interest.  For example, respondents may be more likely to use the extremes of the response scale independent of their true score.  Research already shows that differing response styles can create a construct-irrelevant source of bias that distorts fundamental inferences made based on survey data.  While some initial studies examine the effect of response styles on survey scores in longitudinal analyses, the issue of how response styles affect estimates of growth is underexamined.  In this study, we conducted empirical and simulation analyses in which we scored surveys using item response theory (IRT) models that do and do not account for response styles, and then used those different scores in growth models and compared results.  Generally, we found that response styles can affect estimates of growth parameters including the slope, but that the effects vary by psychological construct, response style, and model used.


Students’ level of academic skills at school entry is a strong predictor of later academic success, and improving these skills during the preschool years has been a priority over the past ten years. Evidence from two prior nationally representative studies indicated that incoming kindergarteners’ math and literacy skills were higher in 2010 than in 1998, but no national studies have examined trends since 2010. This study examines academic skills at kindergarten entry in 2010 and 2017 using data from over 2 million kindergarten students. Results indicated that kindergarteners in 2017 had slightly lower math and reading skills than those in 2010, but that inequalities at school entry by race/ethnicity and school poverty level decreased during this period.


James Soland.

Research has begun to investigate whether teachers and schools are as effective with certain student subgroups as they are with the overall student population. Most of this research has examined the issue by trying to produce causal estimates of school contributions to short-term student growth (usually using value-added models) and has emphasized rank orderings of schools by subgroup. However, not much is known about whether schools contributing to long-term growth for all students are also contributing to student growth by subgroup in ways that might close achievement gaps. In this study, schools’ contributions to student growth are estimated separately for Black versus White students. Results show that focusing on rank orderings of schools alone can mask troubling trends in relative achievement over time. Options for how policymakers can sensibly hold schools accountable for student growth, including under the Every Student Succeeds Act, are discussed.


Effect sizes in the Cohen’s d family are often used in education to compare estimates across studies, measures, and sample sizes.  For example, effect sizes are used to compare gains in achievement students make over time, either in pre- and post-treatment studies or in the absence of intervention, such as when estimating achievement gaps.  However, despite extensive research dating back to the paired t-test literature showing that such growth effect sizes should account for within-person correlations of scores over time, such achievement gains are often standardized relative to the standard deviation from a single timepoint or two timepoints pooled.  Such a tendency likely occurs in part because there are not many large datasets from which a distribution of student- or school-level gains can be derived.  In this study, we present a novel model for estimating student growth in conjunction with a national dataset to show that effect size estimates for student and school growth are often quite different when standardized relative to a distribution of gains rather than static achievement.  In particular, we provide nationally representative empirical benchmarks for student achievement and gains, including for male-female gaps in those gains, and examine the sensitivity of those effect sizes to how they are standardized.  Our results suggest that effect sizes scaled relative to a distribution of gains are less likely to understate the effects of interventions over time, and that resultant effect sizes often more closely match the estimand of interest for most practice, policy, and evaluation questions.
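The contrast between standardizing a gain by a static standard deviation and standardizing it by the standard deviation of gains can be illustrated with a minimal sketch. It is not the model presented in the paper; it only applies the standard paired-scores identity for the variance of a difference, and the function names and all numbers (means, standard deviations, and the within-person correlation `r`) are hypothetical values chosen for illustration.

```python
import math


def gain_sd(sd_pre, sd_post, r):
    """SD of gain scores (post - pre), given the within-person correlation r.

    Uses Var(post - pre) = sd_pre^2 + sd_post^2 - 2 * r * sd_pre * sd_post.
    """
    return math.sqrt(sd_pre**2 + sd_post**2 - 2 * r * sd_pre * sd_post)


def effect_sizes(mean_pre, mean_post, sd_pre, sd_post, r):
    """Return the same mean gain standardized three different ways."""
    gain = mean_post - mean_pre
    # Pooled SD across the two timepoints (equal weights, for illustration).
    pooled = math.sqrt((sd_pre**2 + sd_post**2) / 2)
    return {
        "d_pre": gain / sd_pre,        # relative to a single timepoint
        "d_pooled": gain / pooled,     # relative to two timepoints pooled
        "d_gain": gain / gain_sd(sd_pre, sd_post, r),  # relative to gains
    }


# Hypothetical inputs: a 5-point mean gain, SD of 10 at each timepoint,
# and a within-person correlation of 0.8 between the two scores.
es = effect_sizes(mean_pre=200.0, mean_post=205.0,
                  sd_pre=10.0, sd_post=10.0, r=0.8)
for name, value in es.items():
    print(f"{name}: {value:.3f}")
```

With these made-up inputs, the single-timepoint and pooled effect sizes are both 0.5, while the gain-standardized effect size is about 0.79. When the two standard deviations are equal, the gain SD falls below the static SD whenever `r` exceeds 0.5, which is one way to see why the choice of denominator matters.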
