Assessment
Multiply by 37 (or Divide by 0.027): A Surprisingly Accurate Rule of Thumb for Converting Effect Sizes from Standard Deviations to Percentile Points
Topics: Methods
Tags: Assessment
Educational researchers often report effect sizes in standard deviation units (SD), but SD effects are hard to interpret. Effects are easier to interpret in percentile points, but converting SDs to percentile points involves a calculation that is not transparent to educational stakeholders. We…
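The rule of thumb named in the title above can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the paper: it assumes normally distributed scores and a student who starts at the median (both are assumptions made here), and the function names are hypothetical. It compares the rule (percentile points ≈ 37 × effect in SD) with the gain implied by the normal CDF.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def exact_percentile_gain(effect_sd: float, start_z: float = 0.0) -> float:
    """Percentile-point gain for a student who starts at z-score start_z
    (default: the median, an assumption of this sketch) and moves up by
    effect_sd standard deviations under a normal distribution."""
    return 100.0 * (normal_cdf(start_z + effect_sd) - normal_cdf(start_z))


def rule_of_thumb_gain(effect_sd: float) -> float:
    """Rule of thumb from the title: multiply the SD effect by 37
    (equivalently, divide by about 0.027)."""
    return 37.0 * effect_sd


if __name__ == "__main__":
    for d in (0.05, 0.10, 0.20, 0.30):
        print(f"effect = {d:.2f} SD: "
              f"rule of thumb = {rule_of_thumb_gain(d):.2f}, "
              f"normal CDF from the median = {exact_percentile_gain(d):.2f}")
```

Changing `start_z` shows how the normal-CDF gain depends on where in the distribution a student starts, which is one reason a single multiplier can only be approximate.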
How Much Teacher Is in Teacher Rating Scales?
Tags: Gifted education, Assessment
Teacher rating scales (TRS) are often used to make service eligibility decisions for exceptional learners. Although TRS are regularly used to identify student exceptionalism either as part of an informal nomination process or through behavioral rating scales, there is little research documenting…
A Global Regression Discontinuity Design: Theory and Application to Grade Retention Policies
Topics: Methods
We use a marginal treatment effect (MTE) representation of a fuzzy regression discontinuity setting to propose a novel estimator. The estimator can be thought of as extrapolating the traditional fuzzy regression discontinuity estimate or as an observational study that adjusts for endogenous…
Identification of Non-Additive Fixed Effects Models: Is the Return to Teacher Quality Homogeneous?
Topics: Methods
Panel or grouped data are often used to allow for unobserved individual heterogeneity in econometric models via fixed effects. In this paper, we discuss identification of a panel data model in which the unobserved heterogeneity both enters additively and interacts with treatment variables. We…
Within-School Heterogeneity in Quality: Do Schools Provide Equal Value Added to All Students?
Topics: Families and Communities
Low-socioeconomic status (SES), minority, and male students perform worse than their high-SES, non-minority, and female peers on standardized tests. This paper investigates how within-school differences in school quality contribute to these educational achievement gaps. Using individual-level…
How Measurement Affects Causal Inference: Attenuation Bias is (Usually) More Important Than Scoring Weights
Topics: Methods
Tags: Assessment
When analyzing treatment effects on test scores, researchers face many choices and competing guidance for scoring tests and modeling results. This study examines the impact of scoring choices through simulation and an empirical application. Results show that estimates from multiple methods applied…
Heterogeneity of item-treatment interactions masks complexity and generalizability in randomized controlled trials
Ishita Ahmed, Masha Bertling, Lijin Zhang, Andrew D. Ho, Prashant Loyalka, Hao Xue, Scott Rozelle, Benjamin W. Domingue.
Topics: Methods
Researchers use test outcomes to evaluate the effectiveness of education interventions across numerous randomized controlled trials (RCTs). Aggregate test data (for example, simple measures like the sum of correct responses) are compared across treatment and control groups to determine whether an…
Out of Sight, Out of Mind? The Gap between Students’ Test Performance and Teachers’ Estimations in India and Bangladesh
This is one of the first studies of the mismatch between students’ test scores and teachers’ estimations of those scores in low- and middle-income countries. Prior studies in high-income countries have found strong correlations between these metrics. We leverage data on actual and estimated…
The Impact of Armed Conflict on College Students
Given the spike of homicides in conflict zones of Colombia after the 2016 peace agreement, I study the causal effect of violence on college test scores. Using a difference-in-difference design with heterogeneous effects, I show how this increase in violence had a negative effect on college…
Measuring grading standards at high schools: a methodology and an example
At schools with low grading standards, students receive higher school-awarded grades across multiple courses than students with the same skills receive at schools with high grading standards. A new methodology shows grading standards vary substantially, certainly enough to affect post-secondary…
What We Can Learn About Latin American Educational Systems from International Tests: A Brief Foray
Topics: Student Learning
The Revista del Centro de Estudios Educativos, numero 3, 1971 included an early Carnoy article on the economics of education: “Un enfoque de sistemas para evaluar la educación, ilustrado con datos de Puerto Rico.” The article used a unique data set that had student test scores, students’ family…
Measuring returns to experience using supervisor ratings of observed performance: The case of classroom teachers
Topics: Methods
Tags: Human capital, Assessment
We study the returns to experience in teaching, estimated using supervisor ratings from classroom observations. We describe the assumptions required to interpret changes in observation ratings over time as the causal effect of experience on performance. We compare two difference-in-differences…
Employee evaluation and skill investments: Evidence from public school teachers
When employees expect evaluation and performance incentives will continue (or begin) in the future, the potential future rewards create an incentive to invest in relevant skills today. Because skills benefit job performance, the effects of evaluation can persist after the rewards end or even…
Estimating Treatment Effects with the Explanatory Item Response Model
Topics: Methods
This simulation study examines the characteristics of the Explanatory Item Response Model (EIRM) in estimating treatment effects, compared with classical test theory (CTT) sum and mean scores and item response theory (IRT)-based theta scores. Results show that the EIRM and IRT theta scores…
On the Threshold: Impacts of Barely Passing High-School Exit Exams on Post-Secondary Enrollment and Completion
Many states use high-school exit examinations to assess students’ career and college readiness in core subjects. We find meaningful consequences of barely passing the mathematics examination in Massachusetts, as opposed to just failing it. However, these impacts operate at different educational…
The Other Half of the Story: Does Excluding the Early Grades from School Ratings Matter?
Because high-stakes testing for school accountability does not begin until third grade, accountability ratings for elementary schools do not directly measure students’ academic progress in grades K through 2. While it is possible that children’s test scores in grades 3 and above are highly…
Modeling Item-Level Heterogeneous Treatment Effects with the Explanatory Item Response Model: Leveraging Online Formative Assessments to Pinpoint the Impact of Educational Interventions
Topics: Methods
Tags: Assessment
Analyses that reveal how treatment effects vary allow researchers, practitioners, and policymakers to better understand the efficacy of educational interventions. In practice, however, standard statistical methods for addressing Heterogeneous Treatment Effects (HTE) fail to address the HTE that…
Promises, Pitfalls, and Tradeoffs in Identifying Gifted Learners: Evidence from a Curricular Experiment
Disparities in gifted representation across demographic subgroups represent a large and persistent challenge in U.S. public schools. In this paper, we measure the impacts of a school-wide curricular intervention designed to address such disparities. We implemented Nurturing for a Bright…
Does Teacher Professional Development Improve Student Learning? Evidence from Leading Educators’ Fellowship Model
Topics: Teacher and Leader Development
Teachers are the most important school-specific factor in student learning. Yet, little evidence exists linking teacher professional development programs and the strategies or activities that comprise them to student achievement. In this paper, we examine a fellowship model for professional…
Assessors influence results: Evidence on enumerator effects and educational impact evaluations
Topics: Methods
A significant share of education and development research uses data collected by workers called “enumerators.” It is well-documented that “enumerator effects,” or inconsistent practices between the individual people who administer measurement tools, can be a key source of error in survey data…