Methodology, measurement and data
Researchers are rarely satisfied to learn only whether an intervention works; they also want to understand why and under what circumstances interventions produce their intended effects. These questions have led to increasing calls for implementation research to be included in high-quality studies with strong causal claims. Of critical importance is determining whether an intervention can be delivered with adherence to a standardized protocol, and the extent to which an intervention protocol can be replicated across sessions, sites, and studies. When an intervention protocol is highly standardized and delivered through verbal interactions with participants, a set of natural language processing (NLP) techniques termed semantic similarity can be used to provide quantitative summary measures of how closely intervention sessions adhere to a standardized protocol, as well as how consistently the protocol is replicated across sessions. Given the intense methodological, budgetary, and logistical challenges of conducting implementation research, semantic similarity approaches have the benefit of being low-cost, scalable, and context-agnostic. In this paper, we demonstrate how semantic similarity approaches may be used in an experimental evaluation of a coaching protocol on teacher pedagogical skills in a simulated classroom environment. We discuss strengths and limitations of the approach, and the most appropriate contexts for applying this method.
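The two summary measures described above (how closely each session adheres to the protocol, and how consistently the protocol is replicated across sessions) can be illustrated with a minimal sketch. It assumes session transcripts and the protocol script are available as plain text, and it uses cosine similarity over simple word-count vectors as a stand-in representation. The abstract does not specify the representation; semantic similarity work typically uses richer sentence or document embeddings, which could replace `bag_of_words` without changing the rest. All function names are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts: a crude, hypothetical stand-in for a semantic embedding."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def adherence(protocol: str, sessions: list[str]) -> list[float]:
    """One score per session: similarity of its transcript to the standardized protocol."""
    p = bag_of_words(protocol)
    return [cosine(p, bag_of_words(s)) for s in sessions]


def consistency(sessions: list[str]) -> float:
    """Mean pairwise similarity across session transcripts (needs at least two sessions)."""
    vecs = [bag_of_words(s) for s in sessions]
    pairs = list(combinations(vecs, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)
```

For example, `adherence(protocol_script, transcripts)` returns one score per session and `consistency(transcripts)` returns a single number; with this word-count representation, values near 1 indicate sessions whose wording closely matches, so it captures verbatim fidelity better than paraphrased delivery.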
Recent efforts to promote and support replication assume that there is well-established methodological guidance for designing and implementing replication studies. However, no such consensus exists in the methodology literature. This article addresses this gap by describing design-based approaches for planning systematic replication studies. Our general approach is derived from the Causal Replication Framework (CRF), which formalizes the assumptions under which replication success can be expected. The assumptions may be understood broadly as replication design requirements and individual study design requirements. Replication failure occurs when one or more CRF assumptions are violated. In design-based approaches to replication, CRF assumptions are systematically tested to evaluate the replicability of effects, as well as to identify sources of effect variation when replication failure is observed. In direct replication designs, replication failure is evidence of bias or incorrect reporting in individual study estimates, whereas in conceptual replication designs, replication failure reflects effect variation due to differences in treatments, outcomes, settings, and participant characteristics. The paper demonstrates how multiple research designs may be combined in systematic replication studies, as well as how diagnostic measures may be used to assess the extent to which CRF assumptions are met in field settings.
The Community Eligibility Provision (CEP) is a policy change to the federally administered National School Lunch Program that allows schools serving low-income populations to classify all students as eligible for free meals, regardless of individual circumstances. This has implications for the common practice of using free and reduced-price meal (FRM) data to proxy for student disadvantage in education research and policy applications. We document empirically how the CEP has affected the value of FRM eligibility as a proxy for student disadvantage. At the individual student level, we show that there is essentially no effect of the CEP. However, the CEP does meaningfully change the information conveyed by the share of FRM-eligible students in a school. It is this latter measure that is most relevant for policy uses of FRM data.
Note: Portions of this paper were previously circulated under the title “Using Free Meal and Direct Certification Data to Proxy for Student Disadvantage in the Era of the Community Eligibility Provision.” We have since split the original paper into two parts. This is the first part.
International assessments are important for benchmarking the quality of education across countries. However, on low-stakes tests, students may have little incentive to exert their maximum effort. Research stresses that ignoring students' effort when interpreting results from low-stakes assessments can lead to biased interpretations of test performance across groups of examinees. We use data from the Programme for International Student Assessment (PISA), a low-stakes test, to analyze the extent to which student effort helps to explain test-score heterogeneity across countries and by gender. Our results highlight the importance of accounting for differences in student effort to understand cross-country heterogeneity in performance and variations in gender achievement gaps across nations. We find that, once we account for differential student effort across gender groups, the estimated gender achievement gap in favor of boys could be up to 12 and 6 times wider in math and science, respectively, and up to 49 percent narrower in reading. In math and science, the gap widens in most countries, even among some of the 20 most gender-equal countries. Altogether, our effort measures on average explain between 36 and 40 percent of the cross-country variation in test scores.
The worldwide school closures in early 2020 led to learning losses that will not easily be made up, even if schools quickly return to their prior performance levels. These losses will have lasting economic impacts on both the affected students and each nation unless they are effectively remediated.
While the precise learning losses are not yet known, existing research suggests that the students in grades 1-12 affected by the closures might expect some 3 percent lower income over their entire lifetimes. For nations, the lower long-term growth related to such losses might yield an average of 1.5 percent lower annual GDP for the remainder of the century. These economic losses would grow if schools are unable to re-start quickly.
The economic losses will be more deeply felt by disadvantaged students. All indications are that students whose families are less able to support out-of-school learning will face larger learning losses than their more advantaged peers, which in turn will translate into deeper losses of lifetime earnings.
The present value of the economic losses to nations reaches huge proportions. Just returning schools to where they were in 2019 will not avoid such losses. Only making them better can. While a variety of approaches might be attempted, existing research indicates that close attention to the modified re-opening of schools offers strategies that could ameliorate the losses. Specifically, with the expected increase in video-based instruction, matching the skills of the teaching force to the new range of tasks and activities could quickly move schools to heightened performance. Additionally, because the prior disruptions are likely to increase the variation in learning levels within individual classrooms, pivoting to more individualised instruction could leave all students better off as schools resume.
As schools move to re-establish their programmes even as the pandemic continues, it is natural to focus considerable attention on the mechanics and logistics of safe re-opening. But the long-term economic impacts also require serious attention, because the losses already suffered demand more than the best of currently considered re-opening approaches.
State testing programs regularly release previously administered test items to the public. We provide an open-source recipe for state, district, and school assessment coordinators to combine these items flexibly to produce scores linked to established state score scales. Such scores would enable estimation of student score distributions and achievement levels. We discuss how educators can use the resulting scores to estimate achievement distributions at the classroom and school level. We emphasize that any use of such tests should be tertiary, with no stakes for students, educators, and schools, particularly in the context of a crisis like the COVID-19 pandemic. These tests and their results should also be lower in priority than assessments of physical, mental, and social–emotional health, and lower in priority than classroom and district assessments that may already be in place. We encourage state testing programs to release all the ingredients for this recipe to support low-stakes, aggregate-level assessments. This is particularly urgent during a crisis, when scores may be declining and gaps increasing at unknown rates.
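As a rough illustration of how released items might be scored on an existing scale, the sketch below assumes each released item comes with item response theory (IRT) parameters already expressed on the state's ability metric (a three-parameter logistic model with the conventional scaling constant 1.7), plus a published linear transformation from ability to scale scores and published achievement-level cut scores. It computes an expected a posteriori (EAP) ability estimate under a standard normal prior. The parameter values, cut scores, and function names are hypothetical, and the recipe in the paper may use a different estimator or scoring table.

```python
import math


def p_correct(theta, a, b, c=0.0):
    """3PL item response function with D = 1.7; c = 0 reduces it to the 2PL."""
    return c + (1 - c) / (1 + math.exp(-1.7 * a * (theta - b)))


def eap_theta(responses, items, n_points=61, lo=-4.0, hi=4.0):
    """EAP ability estimate under an N(0, 1) prior, using an evenly spaced quadrature grid.

    responses: list of 0/1 scores; items: list of (a, b, c) tuples on the state metric.
    """
    step = (hi - lo) / (n_points - 1)
    num = den = 0.0
    for k in range(n_points):
        t = lo + k * step
        weight = math.exp(-0.5 * t * t)  # unnormalized standard normal prior density
        for u, (a, b, c) in zip(responses, items):
            p = p_correct(t, a, b, c)
            weight *= p if u == 1 else 1 - p  # multiply in each item's likelihood
        num += t * weight
        den += weight
    return num / den


def to_scale_score(theta, slope, intercept):
    """Hypothetical published linear transformation from ability to scale score."""
    return slope * theta + intercept


def achievement_level(score, cuts, labels):
    """Map a scale score to a label; cuts ascending, len(labels) == len(cuts) + 1."""
    level = sum(1 for cut in cuts if score >= cut)
    return labels[level]
```

Individual EAP estimates like these could then be transformed and aggregated to the classroom or school level, consistent with the aggregate, no-stakes uses the abstract emphasizes.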
Evidence on educational returns and the factors that determine the demand for schooling in developing countries is extremely scarce. We use two surveys from Tanzania to estimate both the actual and perceived schooling returns and subsequently examine what factors drive individual misperceptions regarding actual returns. Using ordinary least squares and instrumental variable methods, we find that each additional year of schooling in Tanzania increases earnings, on average, by 9 to 11 percent. We find that on average, individuals underestimate returns to schooling by 74 to 79 percent, and three factors are associated with these misperceptions: income, asset poverty, and educational attainment. Shedding light on what factors relate to individual beliefs about educational returns can inform policy on how to structure effective interventions to correct individuals' misperceptions.
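The earnings estimates above come from regressions of log earnings on years of schooling. A minimal sketch, assuming a bivariate regression with no experience controls or instruments, shows how the slope is estimated, how a log-earnings coefficient converts to an exact percent change, and one plausible reading of the underestimation figure (the shortfall of perceived returns relative to actual returns). The data are illustrative, not drawn from the Tanzanian surveys, and the function names are hypothetical.

```python
import math


def ols_slope(x, y):
    """Bivariate OLS slope: sample cov(x, y) / sample var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def percent_return(beta):
    """Exact proportional earnings change per year implied by a log-earnings coefficient."""
    return math.exp(beta) - 1


def underestimation(perceived, actual):
    """Share by which perceived returns fall short of actual returns."""
    return 1 - perceived / actual
```

For small coefficients the approximation "coefficient equals percent" is close: a coefficient of 0.10 corresponds to an exact earnings gain of about 10.5 percent per year of schooling.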
Numerous studies have considered the important role of cognition in estimating the returns to schooling. How cognitive abilities affect schooling may have important policy implications, especially in developing countries during periods of increasing educational attainment. Using two longitudinal labor surveys that collect direct proxy measures of cognitive skills, we study the importance of specific cognitive domains for the returns to schooling in two samples. We instrument for schooling and find that each additional year of schooling increases earnings by approximately 18 to 20 percent. The estimated effect sizes, based on two-stage least squares estimates, are above the corresponding ordinary least squares estimates. Furthermore, we estimate and demonstrate the importance of specific cognitive domains in the classical Mincer equation. We find that executive functioning skills (i.e., memory and orientation) are important drivers of earnings in the rural sample, whereas higher-order cognitive skills (i.e., numeracy) are more important for determining earnings in the urban sample. Although numeracy is tested in both samples, it is a statistically significant predictor of earnings only in the urban sample. (JEL I21, F63, F66, N37)
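The comparison between two-stage least squares and OLS can be illustrated with a just-identified instrumental-variable (Wald) estimator on simulated data. The sketch assumes a single binary instrument that shifts schooling but affects earnings only through schooling, plus an unobserved ability term that raises both schooling and earnings. The abstract does not name the instruments used, so the instrument and all parameter values here are hypothetical. In this particular simulation the omitted ability term biases OLS upward, whereas the abstract reports 2SLS above OLS; the direction of the gap depends on the data-generating process (measurement error in schooling, for example, attenuates OLS toward zero).

```python
import random


def cov(a, b):
    """Sample covariance."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (n - 1)


def iv_estimate(z, s, y):
    """Just-identified 2SLS (Wald) estimator with one instrument: cov(z, y) / cov(z, s)."""
    return cov(z, y) / cov(z, s)


def ols_estimate(s, y):
    """OLS slope of log earnings on schooling."""
    return cov(s, y) / cov(s, s)


def simulate(n=20_000, beta=0.10, seed=0):
    """Hypothetical data: binary instrument z, schooling s, log earnings y."""
    rng = random.Random(seed)
    z, s, y = [], [], []
    for _ in range(n):
        zi = 1.0 if rng.random() < 0.5 else 0.0  # instrument shifts schooling by 2 years
        ability = rng.gauss(0, 1)                # unobserved confounder
        si = 8 + 2 * zi + ability + rng.gauss(0, 1)
        yi = 1.0 + beta * si + 0.2 * ability + rng.gauss(0, 0.3)
        z.append(zi)
        s.append(si)
        y.append(yi)
    return z, s, y
```

With these settings the IV estimate should land near the true 0.10, while OLS converges to roughly 0.17 because schooling is correlated with the omitted ability term.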
A common rationale for offering online courses in K-12 schools is that they allow students to take courses not offered at their schools; however, there has been little research on how online courses are used to expand curricular options when operating at scale. We assess the extent to which students and schools use online courses for this purpose by analyzing statewide, student-course level data from high school students in Florida, which has the largest virtual sector in the nation. We introduce a “novel course” framework to address this question. We define a virtual course as “novel” if it is available to a student only virtually, not face-to-face through their own home high school. We find that 7% of high school students in 2013-14 enrolled in novel online courses. Novel courses were more commonly taken by higher-achieving students, by students in rural schools, and by students in schools with relatively few Advanced Placement/International Baccalaureate offerings.
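The “novel course” definition reduces to a set-membership test. The sketch assumes enrollment records that flag whether each course was taken virtually, and a mapping from each high school to the courses it offers face-to-face; the record layout and names are hypothetical, and in practice matching course codes across virtual and brick-and-mortar catalogs is the difficult step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Enrollment:
    """Hypothetical student-course record."""
    student_id: str
    home_school: str
    course_code: str
    virtual: bool


def novel_virtual_enrollments(enrollments, face_to_face_offerings):
    """Virtual enrollments whose course is not offered face-to-face at the home school.

    face_to_face_offerings maps school id -> set of course codes taught in person there.
    A school missing from the mapping is treated as offering nothing face-to-face.
    """
    return [
        e for e in enrollments
        if e.virtual and e.course_code not in face_to_face_offerings.get(e.home_school, set())
    ]


def share_of_students_with_novel_course(enrollments, face_to_face_offerings):
    """Fraction of distinct students with at least one novel virtual enrollment."""
    students = {e.student_id for e in enrollments}
    novel = {e.student_id for e in novel_virtual_enrollments(enrollments, face_to_face_offerings)}
    return len(novel) / len(students)
```

Under this definition, a student who takes Algebra II online when the home school also teaches it in person is a virtual enrollee but not a novel one.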
Researchers decompose test score “gaps” and gap-changes into within- and between-school portions to generate evidence on the role that schools play in shaping educational inequality. However, existing decomposition methods (a) assume an equal-interval test scale and (b) are a poor fit to coarsened data such as proficiency categories. We develop two decomposition approaches that overcome these limitations: an extension of V, an ordinal gap statistic (Ho, 2009), and an extension of ordered probit models (Reardon et al., 2017). Simulations show V decompositions have negligible bias with small within-school samples. Ordered probit decompositions have negligible bias with large within-school samples but more serious bias with small within-school samples. These methods are applicable to decomposing any ordinal outcome by any categorical grouping variable.
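The V statistic of Ho (2009) is defined as V = √2 Φ⁻¹(P), where P is the probability that a randomly chosen member of group a outscores a randomly chosen member of group b. Because it depends only on the ordering of scores, it does not require an equal-interval scale. The sketch below computes a simple nonparametric version directly from proficiency-category counts, splitting ties evenly. The tie-splitting rule and function names are assumptions for illustration; estimators designed for coarsened data, and the within- and between-school decomposition the paper develops, are more involved.

```python
import math
from statistics import NormalDist


def prob_superiority(counts_a, counts_b):
    """P(random a-member outranks random b-member), ties counted as 1/2.

    counts_a, counts_b: frequencies over the same ordered categories, lowest first.
    """
    na, nb = sum(counts_a), sum(counts_b)
    greater = ties = 0.0
    below_b = 0  # running count of group-b members in lower categories
    for ca, cb in zip(counts_a, counts_b):
        greater += ca * below_b
        ties += ca * cb
        below_b += cb
    return (greater + 0.5 * ties) / (na * nb)


def v_statistic(counts_a, counts_b):
    """V = sqrt(2) * inverse standard normal CDF of the probability of superiority.

    Undefined (raises) if one group entirely outranks the other (P = 0 or 1).
    """
    return math.sqrt(2) * NormalDist().inv_cdf(prob_superiority(counts_a, counts_b))
```

For instance, with counts [1, 3, 6] for group a and [6, 3, 1] for group b over three ordered categories, the probability of superiority is 0.825 and V is about 1.3, indicating that group a tends to score higher.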