Search EdWorkingPapers by author, title, or keywords.
Methodology, measurement and data
Teacher rating scales (TRS) are often used to make service eligibility decisions for exceptional learners. Although TRS are regularly used to identify student exceptionality, either as part of an informal nomination process or through behavioral rating scales, there is little research documenting the between-teacher variance in teacher ratings or the consequences of such rater dependence. To evaluate the possible benefits and disadvantages of using TRS as part of a gifted identification process, we examined the student-, teacher-, and school-level variance in TRS, controlling for student ability and achievement, to determine the unique information, consistency, and potential bias in TRS. Between 10% and 25% of the variance in a student’s TRS score can be attributed to the teacher doing the rating, and the between-teacher standard deviations represent an effect size of one-third to one-half of a standard deviation. Our results suggest that TRS are not easily comparable across teachers, making it impossible to set a cut score for admission into a program (or for further screening) that functions equitably across teachers.
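A short calculation shows how the two figures in this abstract fit together. The sketch assumes that "10% to 25%" is the teacher-level share of variance (an intraclass correlation). It also assumes the total variance of the standardized score is 1. Under those assumptions, the between-teacher SD in total-SD units is simply the square root of that share. The three variance components passed to `teacher_variance_share` are hypothetical inputs, not the authors' model or estimates.

```python
import math


def teacher_variance_share(var_student: float, var_teacher: float, var_school: float) -> float:
    """Share of the (residual) TRS variance that sits at the teacher level."""
    total = var_student + var_teacher + var_school
    if total <= 0:
        raise ValueError("variance components must sum to a positive number")
    return var_teacher / total


def between_teacher_sd(teacher_share: float) -> float:
    """Between-teacher SD in units of the total SD (total variance scaled to 1)."""
    if not 0.0 <= teacher_share <= 1.0:
        raise ValueError("variance share must lie in [0, 1]")
    return math.sqrt(teacher_share)


# Endpoints of the range reported in the abstract.
for share in (0.10, 0.25):
    print(f"{share:.0%} of variance -> between-teacher SD of {between_teacher_sd(share):.3f}")
```

The loop prints 0.316 and 0.500. Those values are roughly one-third and one-half of a standard deviation, matching the effect sizes quoted above.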
Over the last twenty years, education researchers have increasingly conducted randomised experiments with the goal of informing the decisions of educators and policymakers. Such experiments have generally employed broad, consequential, standardised outcome measures in the hope that these would allow decision-makers to compare the effectiveness of different approaches. However, a combination of small effect sizes, wide confidence intervals, and treatment effect heterogeneity means that researchers have largely failed to achieve this goal. We argue that quasi-experimental methods and multi-site trials will often be superior for informing educators’ decisions because they can achieve greater precision and better address heterogeneity. Experimental research remains valuable in applied education research. However, it should primarily be used to test theoretical models, which can in turn inform educators’ mental models, rather than to inform decision-making directly. Since comparable effect size estimates are not of interest when testing educational theory, researchers can and should improve the power of theory-informing experiments by using more closely aligned (i.e., valid) outcome measures. We argue that this approach would reduce wasteful research spending and make the research that does go ahead more statistically informative, thus improving the return on investment in educational research.
The Core Knowledge curriculum is a K-8 curriculum focused on building students’ general knowledge about the world they live in, which is hypothesized to increase reading comprehension and Reading/English-LA achievement. This study uses an experimental design to evaluate the long-term effects of attending charter schools that teach the Core Knowledge curriculum. Fourteen oversubscribed kindergarten lotteries for enrollment in nine Core Knowledge charter schools had 2,310 student applicants, predominantly from families in middle/high-income school districts. State achievement data were collected in grades 3 through 6 in Reading/English-LA and Mathematics and in grade 5 in Science. A new methodology addresses two previously unidentified sources of bias inherent in kindergarten lotteries that include middle/high-income families. The unbiased confirmatory Reading/English-LA results show statistically significant ITT (0.241***) and TOT (0.473***) effects on achievement in grades 3 through 6, with statistically significant ITT and TOT effects at each grade. Exploratory analyses also showed significant unbiased ITT (0.15*) and TOT (0.300*) effects in grade 5 Science. A Core Knowledge charter school in a low-income school district also had statistically significant, moderate to large unbiased ITT and TOT effects in English Language Arts (ITT = 0.944**; TOT = 1.299**) and Mathematics (ITT = 0.735*; TOT = 0.997*), and positive but statistically insignificant effects in Science (ITT = 0.468; TOT = 0.622); these effects eliminated achievement gaps in all subjects.
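In lottery studies, ITT and TOT estimates are often linked by the Bloom (Wald) adjustment. Under that adjustment, TOT equals ITT divided by the difference in charter take-up between lottery winners and lottery losers. The abstract does not say that this adjustment was used, and the sketch does not reproduce the study's new bias-correcting methodology. It only shows the arithmetic relationship and the take-up difference that each reported ITT/TOT pair would imply under it. The function names are hypothetical.

```python
def bloom_tot(itt: float, take_up_difference: float) -> float:
    """Bloom/Wald adjustment: TOT = ITT / (take-up among lottery winners
    minus take-up among lottery losers). Assumes no defiers and that winning
    the lottery affects outcomes only through attendance."""
    if not 0.0 < take_up_difference <= 1.0:
        raise ValueError("take-up difference must lie in (0, 1]")
    return itt / take_up_difference


def implied_take_up_difference(itt: float, tot: float) -> float:
    """Take-up difference implied by a reported ITT/TOT pair under the same adjustment."""
    if tot == 0:
        raise ValueError("TOT must be nonzero")
    return itt / tot


# ITT/TOT pairs reported in the abstract.
reported = {
    "Reading/ELA, grades 3-6": (0.241, 0.473),
    "Science, grade 5": (0.15, 0.300),
    "Low-income school, ELA": (0.944, 1.299),
    "Low-income school, Math": (0.735, 0.997),
}
for label, (itt, tot) in reported.items():
    print(f"{label}: implied take-up difference {implied_take_up_difference(itt, tot):.2f}")
```

Read this way, the reported pairs imply a take-up difference of about 0.5 for the middle/high-income lotteries and about 0.73 to 0.74 for the low-income school. Those figures hold only if a Bloom-style adjustment was in fact applied.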
We propose a new method for estimating school-level characteristics from publicly available census data. We use a school’s location to impute its catchment area by aggregating the nearest n census block groups, choosing n so that the number of school-aged children in those block groups just exceeds the number of students enrolled in the school. We then weight census data by the number of school-aged children in each block group to estimate school-level measures. We conduct several robustness checks to assess the quality of our estimates and find that our method is broadly successful in replicating known school-level characteristics and in producing unbiased estimates of school-level income. This method expands the set of available school-level variables to the broader and richer set of characteristics measured in the census, which can then be used to conduct descriptive and observational research across a long time horizon.
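The catchment-imputation steps can be sketched as follows. This is a minimal version under several assumptions the abstract does not specify:

- Distance is measured with the haversine formula to each block group's centroid.
- The `BlockGroup` fields and the `median_income` attribute are hypothetical stand-ins for whatever census variables are used.
- Weighting block-group medians gives an approximate school-level figure, not a true median.

```python
import math
from dataclasses import dataclass


@dataclass
class BlockGroup:
    geoid: str
    lat: float  # centroid latitude (assumed reference point)
    lon: float  # centroid longitude
    school_age_children: int
    median_income: float  # hypothetical stand-in for any census measure


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def impute_catchment(school_lat, school_lon, enrollment, block_groups):
    """Add block groups nearest-first until school-aged children just exceed enrollment."""
    ranked = sorted(block_groups, key=lambda bg: haversine_km(school_lat, school_lon, bg.lat, bg.lon))
    catchment, children = [], 0
    for bg in ranked:
        catchment.append(bg)
        children += bg.school_age_children
        if children > enrollment:
            break  # stop at the first n that pushes the count over enrollment
    return catchment  # if enrollment is never exceeded, every block group is returned


def school_level_measure(catchment, attribute):
    """Average a block-group attribute, weighting by school-aged children."""
    total = sum(bg.school_age_children for bg in catchment)
    if total == 0:
        raise ValueError("catchment contains no school-aged children")
    return sum(getattr(bg, attribute) * bg.school_age_children for bg in catchment) / total


# Toy example: three block groups at increasing distance from a school enrolling 200.
bgs = [
    BlockGroup("A", 40.000, -75.000, 100, 50_000),
    BlockGroup("B", 40.010, -75.000, 150, 70_000),
    BlockGroup("C", 40.100, -75.000, 500, 30_000),
]
catchment = impute_catchment(40.0, -75.0, 200, bgs)
print([bg.geoid for bg in catchment], school_level_measure(catchment, "median_income"))
```

In the toy data, block group A alone (100 children) does not exceed enrollment, so B is added and the search stops. The estimate is then the child-weighted average of A and B: 62,000.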
We propose a novel estimator for use in a fuzzy regression discontinuity setting. The estimator can be thought of as extrapolating the traditional fuzzy regression discontinuity estimate or as an observational study that adjusts for endogenous selection into treatment using information at the discontinuity. We show that it can be motivated as being the least complex model consistent with the data or as an estimator that is preferable to both a traditional regression discontinuity design and an observational study. We further show theoretically that no other estimators consistently generate better estimates than our proposed estimator. We then use this approach to examine the effects of early grade retention beyond the compliers around the retention cutoff. We show that the benefits of early grade retention policies are larger for students with lower baseline achievement and smaller for low-performing students who are exempt from retention. These findings imply that (1) the benefits of early grade retention policies are larger than have been estimated using traditional fuzzy regression discontinuity designs and (2) retaining additional students would have a limited effect on student outcomes.
Even though women have steadily caught up with men in educational attainment and labor market participation since the 1970s, the wage gap between men and women still exists universally today. Do female college graduates still earn less than their male counterparts when men’s and women’s “profiles” of observed productivity-related characteristics are statistically adjusted to be equivalent? To answer this research question and better understand the current gender wage gap, I introduce a novel propensity score stratification method for gender wage gap decomposition. This new method overcomes certain limitations of the traditional Blinder-Oaxaca decomposition method. It also provides an example of validly applying propensity score-based methods (mostly used in causal settings) to gender wage gap decomposition, a non-causal setting. Using this new method, I analyze a nationally representative sample from the Baccalaureate and Beyond Longitudinal Study, which represents the 1993 cohort of U.S. college graduates. Through propensity score stratification, the observed productivity-related characteristics of men and women in the sample are statistically adjusted to be equivalent within each propensity score stratum. After “equalizing” these characteristics, the evidence shows that the women-to-men wage ratio in this college-educated population is still 87.4% in the tenth year after graduation. This remaining gender gap cannot be explained by the observed gender differences in productivity-related characteristics, and it is evidence of a discriminatory wage gap that may exist in the labor market. Additionally, the unexplained gender wage gap exists universally, regardless of whether these “profiles” of qualifications and labor market experience are stereotypically female or male.
Although data limitations prevent this research from accounting for all gender differences in productivity, its results add to the empirical evidence measuring the discriminatory wage gap that may exist in the labor market.
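The general idea of propensity score stratification for a wage-gap comparison can be sketched as follows. The sketch is a simplified illustration, not the author's exact procedure, and it rests on these assumptions:

- Each graduate already has an estimated propensity score of being female given productivity-related characteristics. The estimation step, for example a logistic regression, is not shown.
- Strata are equal-sized groups by propensity score rank.
- Within-stratum log-wage gaps are weighted by each stratum's number of women. Other weightings are possible.
- The ratio is the exponentiated mean log-wage gap, which is a ratio of geometric means.

The class and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Graduate:
    female: bool
    log_wage: float
    pscore: float  # estimated P(female | characteristics); estimation not shown


def stratified_wage_ratio(graduates, n_strata=5):
    """Women-to-men wage ratio comparing like with like within propensity score strata."""
    ranked = sorted(graduates, key=lambda g: g.pscore)
    n = len(ranked)
    # Equal-sized strata by propensity score rank (quantile-style cut points).
    strata = [ranked[i * n // n_strata:(i + 1) * n // n_strata] for i in range(n_strata)]
    weighted_gap, women_used = 0.0, 0
    for stratum in strata:
        women = [g.log_wage for g in stratum if g.female]
        men = [g.log_wage for g in stratum if not g.female]
        if not women or not men:
            continue  # no within-stratum comparison possible; dropped for simplicity
        gap = sum(women) / len(women) - sum(men) / len(men)  # within-stratum log-wage gap
        weighted_gap += gap * len(women)  # weight each stratum by its number of women
        women_used += len(women)
    if women_used == 0:
        raise ValueError("no stratum contains both women and men")
    return math.exp(weighted_gap / women_used)


# Toy data: two strata, each with one man and one woman.
sample = [
    Graduate(False, 3.0, 0.1), Graduate(True, 2.9, 0.2),
    Graduate(False, 3.2, 0.8), Graduate(True, 3.0, 0.9),
]
print(round(stratified_wage_ratio(sample, n_strata=2), 3))
```

In the toy data, the within-stratum log-wage gaps are -0.1 and -0.2. The weighted gap is therefore -0.15, which gives a ratio of exp(-0.15), about 0.861.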
Panel or grouped data are often used to allow for unobserved individual heterogeneity in econometric models via fixed effects. In this paper, we discuss identification of a panel data model in which the unobserved heterogeneity both enters additively and interacts with treatment variables. We present identification and estimation methods for parameters of interest in this model under both strict and weak exogeneity assumptions. The key identification insight is that other periods' treatment variables are instruments for the unobserved fixed effects. We apply our proposed estimator to matched student-teacher data used to estimate value-added models of teacher quality. We show that the common assumption that the return to unobserved teacher quality is the same for all students is rejected by the data. We also present evidence that No Child Left Behind-era school accountability increased the effectiveness of teacher quality for lower performing students.
Heightened concerns about the health of the teaching profession highlight the importance of studying the early teacher pipeline. This exploratory, descriptive paper examines preservice teachers' (PST) expressed motivation for pursuing a teaching career and its relationship with PST characteristics and outcomes. Using data from one of the largest teacher education programs in Texas, we use a natural language processing algorithm to sort roughly 2,800 essay responses to the prompt “Explain why you decided to become a teacher.” into topical groups. We identify 11 topics that largely reflect altruistic and intrinsic (though not extrinsic) reasons for teaching. The frequency of motivation topics varied substantially by PST gender, race/ethnicity, and certification area. While the topics collectively explained little of the variance in PST outcomes, we found preliminary evidence that intrinsic enjoyment of teaching and prior experiences with adversity predicted higher performance during clinical teaching and lower attrition as a full-time K–12 teacher.
Fadeout is a pervasive phenomenon: post-test impacts on cognitive skills commonly decrease in the years following an educational intervention. Less is known, although much is theorized, about the persistence of social-emotional skills. The current meta-analysis investigated whether the impacts of educational RCTs on social-emotional skills persisted more than impacts on cognitive skills. It covered 87 interventions involving 59,237 participants and 443 outcomes measured at post-test and at least one follow-up. For post-test impacts of the same magnitude, persistence rates were similar across skill types (43% of the post-test magnitude) for follow-ups occurring 6 to 12 months after post-test. At 1- to 2-year follow-ups, persistence rates were larger for cognitive skills (37%) than for social-emotional skills. Interestingly, smaller post-test impacts persisted at proportionately higher rates than larger impacts, which may benefit interventions measuring social-emotional outcomes given their smaller post-test impacts. Taken as a whole, social-emotional and cognitive skills demonstrated similar patterns of fadeout.
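A persistence rate is a proportion of the post-test impact, not a fixed amount lost. The sketch below applies the reported 43% rate to a single hypothetical 0.20 SD post-test impact to show what that means. The meta-analysis estimates its rates from pooled data using a model the abstract does not describe. The sketch therefore illustrates only the definition, and the function names are hypothetical.

```python
def persistence_rate(post_test_impact: float, follow_up_impact: float) -> float:
    """Follow-up impact expressed as a share of the post-test impact."""
    if post_test_impact == 0:
        raise ValueError("post-test impact must be nonzero")
    return follow_up_impact / post_test_impact


def projected_follow_up(post_test_impact: float, rate: float) -> float:
    """Follow-up impact implied by a post-test impact and a persistence rate."""
    return post_test_impact * rate


# A hypothetical 0.20 SD post-test impact under the 43% rate reported
# for 6- to 12-month follow-ups.
follow_up = projected_follow_up(0.20, 0.43)
print(f"follow-up impact ~ {follow_up:.3f} SD; persistence = {persistence_rate(0.20, follow_up):.0%}")
```

Under this reading, a 0.20 SD post-test impact would shrink to about 0.086 SD at a 6- to 12-month follow-up.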