Methodology, measurement and data
A common rationale for offering online courses in K-12 schools is that they allow students to take courses not offered at their schools; however, there has been little research on how online courses are used to expand curricular options when operating at scale. We assess the extent to which students and schools use online courses for this purpose by analyzing statewide, student-course level data from high school students in Florida, which has the largest virtual sector in the nation. We introduce a “novel course” framework to address this question. We define a virtual course as “novel” if it is only available to a student virtually, not face-to-face through their own home high school. We find that 7% of high school students enrolled in novel online courses in 2013-14. Novel courses were more commonly used by higher-achieving students, in rural schools, and in schools with relatively few Advanced Placement/International Baccalaureate offerings.
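As a schematic illustration (not drawn from the paper's data or code), the "novel course" definition amounts to a set difference between a student's virtual enrollments and their home school's face-to-face catalog. All names and course codes below are invented.

```python
# Illustrative sketch (not the paper's implementation): a virtual course is
# "novel" for a student if their home high school does not offer it
# face-to-face.
def novel_courses(virtual_enrollments, school_catalog):
    """virtual_enrollments: set of course codes a student takes online;
    school_catalog: set of course codes offered face-to-face at the
    student's home school. Returns the student's novel online courses."""
    return virtual_enrollments - school_catalog

# Hypothetical example: AP Latin is not offered at the home school.
student_online = {"AP_LATIN", "ALG2"}
home_school_offers = {"ALG2", "ENG3", "BIO1"}
print(novel_courses(student_online, home_school_offers))  # {'AP_LATIN'}
```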
Enrollment in higher education has risen dramatically in Latin America, especially in Chile. Yet graduation and persistence rates remain low. One way to improve graduation and persistence is to use data and analytics to identify students at risk of dropout, target interventions, and evaluate interventions’ effectiveness at improving student success. We illustrate the potential of this approach using data from eight Chilean universities. Results show that data available at matriculation are only weakly predictive of persistence, while prediction improves dramatically once data on university grades become available. Some predictors of persistence are under policy control. Financial aid predicts higher persistence, and being denied a first-choice major predicts lower persistence. Student success programs are ineffective at some universities; they are more effective at others, but when effective they often fail to target the highest-risk students. Universities should use data regularly and systematically to identify high-risk students, target them with interventions, and evaluate those interventions’ effectiveness.
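The gap between matriculation-only and grades-informed prediction can be made concrete with a rank-based AUC comparison. The sketch below uses invented risk scores and a hand-rolled AUC, purely to illustrate the logic; it is not the universities' models.

```python
# Hedged sketch: why prediction "improves dramatically" once grades arrive.
# We compare two toy risk scores with a rank-based AUC (all data invented).
def auc(scores, persisted):
    """AUC = P(score of a persister > score of a dropout); ties count 1/2."""
    pos = [s for s, y in zip(scores, persisted) if y == 1]
    neg = [s for s, y in zip(scores, persisted) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

persisted      = [1, 1, 1, 0, 0, 1, 0, 1]                      # 1 = stayed enrolled
entry_score    = [0.6, 0.5, 0.55, 0.5, 0.6, 0.5, 0.55, 0.6]    # matriculation data only
entry_plus_gpa = [0.9, 0.8, 0.7, 0.3, 0.2, 0.85, 0.4, 0.75]    # + first-year grades

print(auc(entry_score, persisted))     # 0.5 -- no better than chance
print(auc(entry_plus_gpa, persisted))  # 1.0 -- perfect separation in this toy
```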
Clustered observational studies (COSs) are a critical analytic tool for educational effectiveness research. We present a design framework for the development and critique of COSs. The framework is built on the counterfactual model for causal inference and promotes the concept of designing COSs that emulate the targeted randomized trial that would have been conducted were it feasible. We emphasize the key role that understanding the assignment mechanism plays in study design. We review methods for statistical adjustment and highlight a recently developed form of matching designed specifically for COSs. We review how regression models can be profitably combined with matching and note best practices for estimating statistical uncertainty. Finally, we review how sensitivity analyses can determine whether conclusions are sensitive to bias from potential unobserved confounders. We demonstrate concepts with an evaluation of a summer school reading intervention in Wake County, North Carolina.
Many interventions in education occur in settings where treatments are applied to groups. For example, a reading intervention may be implemented for all students in some schools and withheld from students in other schools. When such treatments are non-randomly allocated, outcomes across the treated and control groups may differ due to the treatment or due to baseline differences between groups. When this is the case, researchers can use statistical adjustment to make treated and control groups similar in terms of observed characteristics. Recent work in statistics has developed matching methods designed for contexts where treatments are clustered. This form of matching, known as multilevel matching, may be well suited to many education applications where treatments are assigned to schools. In this article, we provide an extensive evaluation of multilevel matching and compare it to multilevel regression modeling. We evaluate multilevel matching methods in two ways. First, we use these matching methods to recover treatment effect estimates from three clustered randomized trials using a within-study comparison design. Second, we conduct a simulation study. We find evidence that generally favors an analytic approach to statistical adjustment that combines multilevel matching with regression adjustment. We conclude with an empirical application.
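The core idea of matching at the cluster level can be sketched in a few lines. The example below pairs each treated school with the control school closest on a school-level baseline mean and averages within-pair outcome differences; it is a deliberately simplified, invented-data illustration of the matching logic, not the multilevel matching algorithm the article evaluates (which also matches students within schools and allows regression adjustment).

```python
# Illustrative sketch of school-level matching (invented data): pair each
# treated school with the control school whose baseline mean is closest,
# then average the within-pair outcome differences. A greedy match that
# permits reuse of controls -- fine for a toy, not for real analysis.
treated = {"A": {"baseline": 50, "outcome": 58},
           "B": {"baseline": 40, "outcome": 47}}
controls = {"C": {"baseline": 49, "outcome": 52},
            "D": {"baseline": 41, "outcome": 42},
            "E": {"baseline": 30, "outcome": 33}}

pairs = {}
for t, tinfo in treated.items():
    # nearest-neighbor match on the school-level baseline mean
    match = min(controls, key=lambda c: abs(controls[c]["baseline"] - tinfo["baseline"]))
    pairs[t] = match

effects = [treated[t]["outcome"] - controls[c]["outcome"] for t, c in pairs.items()]
print(pairs)                        # {'A': 'C', 'B': 'D'}
print(sum(effects) / len(effects))  # average within-pair effect: 5.5
```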
Researchers commonly interpret effect sizes by applying benchmarks proposed by Cohen over a half century ago. However, effects that are small by Cohen’s standards are large relative to the impacts of most field-based interventions. These benchmarks also fail to consider important differences in study features, program costs, and scalability. In this paper, I present five broad guidelines for interpreting effect sizes that are applicable across the social sciences. I then propose a more structured schema with new empirical benchmarks for interpreting a specific class of studies: causal research on education interventions with standardized achievement outcomes. Together, these tools provide a practical approach for incorporating study features, cost, and scalability into the process of interpreting the policy importance of effect sizes.
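For reference, the quantity being interpreted is a standardized mean difference. The snippet below shows the standard pooled-SD arithmetic on hypothetical scores; the interpretive benchmarks themselves are the paper's contribution and are not reproduced here.

```python
# Minimal sketch: computing a standardized effect size (Cohen's d with a
# pooled SD) on hypothetical test scores. The number is mechanical; the
# paper's argument is about how to *interpret* it given study features,
# cost, and scalability.
from statistics import mean, variance

def cohens_d(treatment, control):
    n1, n2 = len(treatment), len(control)
    pooled_var = ((n1 - 1) * variance(treatment) +
                  (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / pooled_var ** 0.5

print(round(cohens_d([52, 55, 51, 54], [50, 53, 49, 52]), 2))  # 1.1
```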
Using rich longitudinal data from one of the largest teacher education programs in Texas, we examine the measurement of pre-service teacher (PST) quality and its relationship with entry into the K–12 public school teacher workforce. Drawing on rubric-based observations of PSTs during clinical teaching, we find that little of the variation in observation scores is attributable to actual differences between PSTs. Instead, differences in scores largely reflect differences in the rating standards of field supervisors. We also find that men and PSTs of color receive systematically lower scores. Finally, higher-scoring PSTs are slightly more likely to enter the teacher workforce and substantially more likely to be hired at the same school as their clinical teaching placement.
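A rough way to see what "differences in scores largely reflect differences in the rating standards of field supervisors" means is to ask how much of the total score variance sits between raters. The toy decomposition below (invented numbers, not the paper's model) computes the share of variance explained by rater means.

```python
# Hedged sketch (toy numbers): how much of the observation-score variance
# sits between field supervisors rather than between PSTs. We compute the
# share of total variance explained by rater means -- a rough, descriptive
# decomposition, not the paper's statistical model.
from statistics import mean, pvariance

# scores_by_rater: each inner list is one supervisor's ratings of their PSTs
scores_by_rater = [[4.8, 4.9, 4.7],   # a lenient supervisor
                   [3.1, 3.0, 3.2],   # a strict supervisor
                   [4.0, 3.9, 4.1]]

all_scores = [s for rater in scores_by_rater for s in rater]
total_var = pvariance(all_scores)
grand = mean(all_scores)
# variance of rater means around the grand mean, weighted by group size
between = sum(len(r) * (mean(r) - grand) ** 2 for r in scores_by_rater) / len(all_scores)
print(round(between / total_var, 2))  # 0.99: nearly all variance is between raters
```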
Important educational policy decisions, like whether to shorten or extend the school year, often require accurate estimates of how much students learn during the year. Yet, related research relies on a mostly untested assumption: that growth in achievement is linear throughout the entire school year. We examine this assumption using a data set containing math and reading test scores for over seven million students in kindergarten through 8th grade across the fall, winter, and spring of the 2016-17 school year. Our results indicate that assuming linear within-year growth is often not justified, particularly in reading. Implications for investments in extending the school year, summer learning loss, and racial/ethnic achievement gaps are discussed.
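With fall, winter, and spring test dates, the linearity assumption has a simple observable implication: per-month gains should be equal in both halves of the year. The check below uses invented scale scores purely to illustrate the comparison.

```python
# Minimal check of the within-year linearity assumption (hypothetical
# scores): linear growth implies equal per-month gains fall-to-winter and
# winter-to-spring.
def monthly_gain(score_start, score_end, months):
    return (score_end - score_start) / months

fall, winter, spring = 200.0, 212.0, 218.0   # toy reading scale scores
fw = monthly_gain(fall, winter, 4)           # fall-winter: 3.0 points/month
ws = monthly_gain(winter, spring, 4)         # winter-spring: 1.5 points/month
print(fw, ws)  # unequal rates: assuming linear growth is not justified here
```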
High school graduation rates have increased dramatically in the past two decades. Some skepticism has arisen, however, because of the confluence of the graduation rise and the start of high-stakes accountability for graduation rates with No Child Left Behind (NCLB). In this study we provide some of the first evidence about the role of accountability versus strategic behavior, especially the degree to which the recent graduation rate rise represents increased human capital. First, using a national difference-in-differences (DD) analysis of within-state, cross-district variation in proximity to state graduation rate thresholds, we confirm that NCLB accountability increased graduation rates. However, we find limited evidence that this is due to strategic behavior. To test for lowering of graduation standards, we examined graduation rates in states with and without graduation exams and trends in GEDs; neither analysis suggests that the graduation rate rise is due to strategic behavior. We also examined the effects of “credit recovery” courses using Louisiana micro data; while our results suggest an increase in credit recovery, consistent with some lowering of standards, the size of the effect is not nearly enough to explain the rise in graduation rates. Finally, we examine other forms of strategic behavior by schools, though these can only explain inflation of school/district-level graduation rates, not national rates. Overall, the evidence suggests that the rise in the national graduation rates reflects some strategic behavior, but also a substantial increase in the nation’s stock of human capital. Graduation accountability was a key contributor.
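The DD logic compares the change over time in districts exposed to accountability pressure with the change in less-exposed districts. The arithmetic below uses invented rates solely to show the comparison; the study's actual design exploits within-state, cross-district proximity to state thresholds.

```python
# Sketch of the difference-in-differences (DD) logic on invented rates:
# the change for districts near a state accountability threshold, minus the
# change for districts far from it, nets out common trends.
near_pre, near_post = 0.72, 0.80   # districts near the threshold (pressured)
far_pre,  far_post  = 0.74, 0.78   # districts far from the threshold

dd = (near_post - near_pre) - (far_post - far_pre)
print(round(dd, 2))  # 0.04: a 4-point rise attributable to accountability
```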
This paper uses meta-analytic techniques to estimate the separate effects of the starting age, program duration, and persistence of impacts of early childhood education programs on children’s cognitive and achievement outcomes. It concentrates on studies published before the wide-scale penetration of state pre-K programs. Specifically, data are drawn from 67 high-quality evaluation studies conducted between 1960 and 2007, which provide 993 effect sizes for analyses. When weighted for differential precision, effect sizes averaged 0.26 SD at the end of these programs. We find larger effect sizes for programs starting in infancy/toddlerhood than in the preschool years and, surprisingly, smaller average effect sizes at the end of longer as opposed to shorter programs. Our findings suggest that, on average, impacts decline geometrically following program completion, losing nearly half of their size within one year after the end of treatment. Taken together, these findings reflect a moderate level of effectiveness across a wide range of center-based programs and underscore the need for innovative intervention strategies to produce larger and more persistent impacts.
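Geometric decline means the effect size is multiplied by a constant retention rate each year after the program ends. The sketch below makes that concrete; the retention rate is illustrative, chosen only to match "losing nearly half of their size within one year," and is not a parameter reported in the abstract.

```python
# Hedged illustration of geometric fade-out: the effect t years after the
# program ends is ES_t = ES_end * r**t for a constant annual retention
# rate r. The r below is illustrative (~45% of the impact lost per year).
def effect_after(es_end, r, years):
    return es_end * r ** years

es_end = 0.26   # average end-of-program effect size from the meta-analysis
r = 0.55        # illustrative annual retention rate (an assumption)
for t in range(4):
    print(t, round(effect_after(es_end, r, t), 3))
```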
A huge portion of what we know about how humans develop, learn, behave, and interact is based on survey data. Researchers use longitudinal growth modeling to understand the development of students on psychological and social-emotional learning constructs across elementary and middle school. In these designs, students are typically administered a consistent set of self-report survey items across multiple school years, and growth is measured either based on sum scores or scale scores produced by item response theory (IRT) methods. While there is a great deal of guidance on scaling and linking IRT-based large-scale educational assessment to facilitate the estimation of examinee growth, little of this expertise is brought to bear in the scaling of psychological and social-emotional constructs. Through a series of simulation and empirical studies, we produce scores in a single-cohort repeated-measures design using sum scores as well as multiple IRT approaches and compare the recovery of growth estimates from longitudinal growth models using each set of scores. Results indicate that using scores from multidimensional IRT approaches that account for latent variable covariances over time in growth models leads to better recovery of growth parameters relative to models using sum scores and other IRT approaches.
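The "recovery of growth parameters" question can be illustrated in miniature: simulate students whose true growth slope is known, add measurement noise to repeated scores, and see whether the estimated average slope matches the truth. The sketch below is far simpler than the paper's IRT models (no items, no latent covariances) and uses invented parameters throughout.

```python
# Toy sketch of growth-parameter recovery (invented data; much simpler than
# the paper's multidimensional IRT models): simulate students with a known
# true slope, add noise to three waves of scores, and check whether
# per-student OLS slopes recover the average growth parameter.
import random

random.seed(7)
true_slope, waves = 2.0, [0, 1, 2]   # one survey wave per school year

def ols_slope(x, y):
    xbar, ybar = sum(x) / len(x), sum(y) / len(y)
    num = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return num / sum((xi - xbar) ** 2 for xi in x)

slopes = []
for _ in range(2000):
    start = random.gauss(50, 5)                     # student's starting level
    scores = [start + true_slope * t + random.gauss(0, 1) for t in waves]
    slopes.append(ols_slope(waves, scores))

print(round(sum(slopes) / len(slopes), 1))  # close to the true slope of 2.0
```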