Methodology, measurement and data

Mengyuan Liang.

Even though women have continuously caught up with men in educational attainment and labor market participation since the 1970s, the wage gap between men and women still universally exists today. Do female college graduates still earn less than their male counterparts if men’s and women’s “profiles” of observed productivity-related characteristics are statistically adjusted to be equivalent? To answer this research question and better understand the current gender wage gap, I introduce a novel propensity score stratification method for gender wage gap decomposition. This new method overcomes certain limitations of the traditional Blinder-Oaxaca decomposition method, and provides an example of validly applying propensity score-based methods (mostly used in causal settings) to gender wage gap decomposition, a non-causal setting. Making use of this new method, I analyze a nationally representative sample from the Baccalaureate and Beyond Longitudinal Study, which represents the 1993 cohort of U.S. college graduates. Through propensity score stratification, the observed productivity-related characteristics between men and women in the sample are statistically adjusted to be equivalent within each stratum of propensity score. After “equalizing” these characteristics, evidence shows the women-to-men wage ratio among this college-educated population is still 87.4% at the tenth year after they graduated from college. This remaining gender gap cannot be explained by the observed gender differences in productivity-related characteristics, and is evidence of a discriminatory wage gap that possibly exists in the labor market. Additionally, the unexplained gender wage gap universally exists regardless of whether these “profiles” of qualifications and labor market experience are stereotypically female or male. Even acknowledging that this research cannot account for all the gender differences in productivity due to data limitations, the results of this research will add to the empirical evidence on the discriminatory wage gap that possibly exists in the labor market.
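
The stratification step described in this abstract can be illustrated with a minimal sketch. It assumes that propensity scores (the estimated probability of being female given the observed characteristics) have already been computed. The function name `stratified_wage_ratio`, the default of five strata, and the size-weighted averaging of within-stratum log-wage gaps are illustrative assumptions, not the paper's exact procedure.

```python
import math
from statistics import mean


def stratified_wage_ratio(records, n_strata=5):
    """Estimate an adjusted women-to-men wage ratio by propensity score stratification.

    records: list of (is_female, propensity_score, wage) tuples.
    Hypothetical helper: strata are equal-sized bins of the sorted scores, and
    the within-stratum gaps in mean log wage are averaged with stratum-size weights.
    """
    ordered = sorted(records, key=lambda r: r[1])
    n = len(ordered)
    weighted_gap = 0.0
    total_weight = 0
    for s in range(n_strata):
        stratum = ordered[s * n // n_strata:(s + 1) * n // n_strata]
        women = [math.log(wage) for female, _, wage in stratum if female]
        men = [math.log(wage) for female, _, wage in stratum if not female]
        if not women or not men:
            continue  # skip strata that lack one group (no overlap to compare)
        gap = mean(women) - mean(men)  # within-stratum log-wage gap
        weighted_gap += len(stratum) * gap
        total_weight += len(stratum)
    if total_weight == 0:
        raise ValueError("no stratum contains both women and men")
    # Exponentiating the average log gap gives a ratio of geometric mean wages.
    return math.exp(weighted_gap / total_weight)
```

In practice the scores would typically come from a model such as a logistic regression of gender on the productivity-related characteristics. A ratio below 1 after stratification corresponds to the "unexplained" portion of the gap.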

Jinyong Hahn, John D. Singleton, Nese Yildiz.

Panel or grouped data are often used to allow for unobserved individual heterogeneity in econometric models via fixed effects. In this paper, we discuss identification of a panel data model in which the unobserved heterogeneity both enters additively and interacts with treatment variables. We present identification and estimation methods for parameters of interest in this model under both strict and weak exogeneity assumptions. The key identification insight is that other periods' treatment variables are instruments for the unobserved fixed effects. We apply our proposed estimator to matched student-teacher data used to estimate value-added models of teacher quality. We show that the common assumption that the return to unobserved teacher quality is the same for all students is rejected by the data. We also present evidence that No Child Left Behind-era school accountability increased the effectiveness of teacher quality for lower performing students.

Brendan Bartanen, Andrew Kwok, Andrew Avitabile, Brian Heseung Kim.

Heightened concerns about the health of the teaching profession highlight the importance of studying the early teacher pipeline. This exploratory, descriptive paper examines preservice teachers' (PST) expressed motivation for pursuing a teaching career and its relationship with PST characteristics and outcomes. Using data from one of the largest teacher education programs in Texas, we use a natural language processing algorithm to categorize into topical groups roughly 2,800 essay responses to the prompt, "Explain why you decided to become a teacher." We identify 11 topics that largely reflect altruistic and intrinsic (though not extrinsic) reasons for teaching. The frequency of motivation topics varied substantially by PST gender, race/ethnicity, and certification area. While topics collectively explained little of the variance in PST outcomes, we found preliminary evidence that intrinsic enjoyment of teaching and prior experiences with adversity predicted higher performance during clinical teaching and lower attrition as a full-time K–12 teacher.

Emma R. Hart, Drew H. Bailey, Sha Luo, Pritha Sengupta, Tyler W. Watts.

Fadeout is a pervasive phenomenon: post-test impacts on cognitive skills commonly decrease in the years following an educational intervention. Less is known, although much is theorized, about social-emotional skill persistence. The current meta-analysis investigated whether educational RCT impacts on social-emotional skills demonstrated greater persistence than impacts on cognitive skills among 87 interventions involving 59,237 participants and 443 outcomes measured at post-test and at least one follow-up. For post-test impacts of the same magnitude, persistence rates were similar (43% of post-test magnitude) across skill types for follow-ups occurring 6 to 12 months after post-test. At 1- to 2-year follow-ups, persistence rates were larger for cognitive skills (37%) than for social-emotional skills. Interestingly, smaller post-test impacts persisted at proportionately higher rates than larger impacts, which may benefit interventions measuring social-emotional outcomes given their smaller post-test impacts. Considered as a whole, social-emotional and cognitive skills demonstrated similar patterns of fadeout.
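
One way to see why smaller post-test impacts can persist at proportionately higher rates is a simple linear relationship between follow-up and post-test impacts with a positive intercept. The sketch below is an illustration under that assumption only: the function names and the demo numbers are hypothetical, and it does not reproduce the meta-analytic model used in the paper.

```python
def fit_line(post_test, follow_up):
    """Ordinary least squares fit of follow_up = intercept + slope * post_test."""
    n = len(post_test)
    mean_x = sum(post_test) / n
    mean_y = sum(follow_up) / n
    sxx = sum((x - mean_x) ** 2 for x in post_test)
    if sxx == 0:
        raise ValueError("post-test impacts must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(post_test, follow_up))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def persistence_rate(post_test_impact, intercept, slope):
    """Predicted follow-up impact as a fraction of the post-test impact."""
    return (intercept + slope * post_test_impact) / post_test_impact


if __name__ == "__main__":
    # Hypothetical effect sizes (in SD units), invented for illustration only.
    post = [0.10, 0.20, 0.30, 0.40]
    follow = [0.05, 0.08, 0.11, 0.14]
    a, b = fit_line(post, follow)
    for impact in (0.10, 0.40):
        print(impact, persistence_rate(impact, a, b))
```

Because the persistence rate equals `intercept / post_test_impact + slope`, any positive intercept makes the rate fall as the post-test impact grows.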

Matthew Naven.

Low-socioeconomic status (SES), minority, and male students perform worse than their high-SES, non-minority, and female peers on standardized tests. This paper investigates how within-school differences in school quality contribute to these educational achievement gaps. Using individual-level data on the universe of public-school students in California, I estimate school quality using a value added methodology that accounts for the fact that students sort to schools on observable characteristics. I allow for within-school heterogeneity by estimating a distinct value added for each school's low-/high-SES, minority/non-minority, and male/female students. Standard value added models suggest that on average schools provide less value added to their low-SES, minority, and male students, particularly on postsecondary enrollment. However, value added models that control for neighborhood, older-sibling, and peer characteristics suggest that schools provide similar value added to low-/high-SES students and minority/non-minority students but more value added to female students. Within-school heterogeneity accounts for 6% of the test-score achievement gap and 22% of the difference in postsecondary enrollment between men and women.

Meghan McCormick, Cullen MacDowell, Christina Weiland, JoAnn Hsueh, Michelle Maier, Mirjana Pralica, Samuel Maves, Catherine Snow, Jason Sachs.

This study uses implementation fidelity data from PreK to first grade in the Boston Public Schools (BPS) to measure instructional alignment and examine whether stronger alignment is associated with sustained benefits of BPS PreK on children’s language, literacy, and math skills through first grade. The study includes N = 498 students (mean age = 5.47, SD = 0.30 in K fall). Children who experienced strong instructional alignment across grades had faster gains in literacy (SD = .47) and math (SD = .28) skills through the spring of first grade compared with non-BPS PreK attenders. Misalignment predicted faster convergence in literacy skills. Results highlight that instructional alignment may help to sustain the initial benefits of PreK programs through first grade in a subset of outcome domains. Implications for further research measuring alignment in a broader range of settings and implications for practice are discussed.

Wes Austin, Bengie Chen, Dan Goldhaber, Eric A. Hanushek, Kris Holden, Cory Koedel, Helen F. Ladd, Jin Luo, Eric Parsons, Gregory Phelan, Steven G. Rivkin, Tim Sass, Mavzuna Turaeva.

Anecdotal evidence points to the importance of school principals, but the limited existing research has neither provided consistent results nor indicated any set of essential characteristics of effective principals. This paper exploits extensive student-level panel data across six states to investigate both variations in principal performance and the relationship between effectiveness and key certification factors. While principal effectiveness varies widely across states, there is little indication that regulation of the background and training of principals yields consistently effective performance. Having prior teaching or management experience is not related to our estimates of principal value-added.

Edward J. Kim, Luke W. Miratrix.

Greater school choice leads to lower demand for private tutoring according to various international studies, but this has not been explicitly tested for the U.S. context. To estimate the causal effect of charter school appearances on neighboring private tutoring prevalence, we employ a comparative event study model combined with a longitudinal matching strategy to accommodate differing treatment years. In contrast to findings from other countries, we estimate that charter schools increase, rather than decrease, tutoring prevalence in the United States. We further find that the effect varies considerably based on the characteristics of the treated neighborhood: areas with the highest income, educational attainment, and proportion Asian show the greatest treatment impacts, while the areas with the least show null effects. Moreover, methodologically this investigation offers a pipeline for flexibly estimating causal effects with observational, longitudinal, geographically located data.

Ishita Ahmed, Masha Bertling, Lijin Zhang, Andrew D. Ho, Prashant Loyalka, Hao Xue, Scott Rozelle, Benjamin W. Domingue.

Researchers use test outcomes to evaluate the effectiveness of education interventions across numerous randomized controlled trials (RCTs). Aggregate test data—for example, simple measures like the sum of correct responses—are compared across treatment and control groups to determine whether an intervention has had a positive impact on student achievement. We show that item-level data and psychometric analyses can provide information about treatment heterogeneity and improve design of future experiments. We apply techniques typically used in the study of Differential Item Functioning (DIF) to examine variation in the degree to which items show treatment effects. That is, are observed treatment effects due to generalized gains on the aggregate achievement measures or are they due to targeted gains on specific items? Based on our analysis of 7,244,566 item responses (265,732 students responding to 2,119 items) taken from 15 RCTs in low-and-middle-income countries, we find clear evidence for variation in gains across items. DIF analyses identify items that are highly sensitive to the interventions—in one extreme case, a single item drives nearly 40% of the observed treatment effect—as well as items that are insensitive. We also show that the variation of item-level sensitivity can have implications for the precision of effect estimates. Of the RCTs that have significant effect estimates, 41% have patterns of item-level sensitivity to treatment that allow for the possibility of a null effect when this source of uncertainty is considered. Our findings demonstrate how researchers can gain more insight regarding the effects of interventions via additional analysis of item-level test data.
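
The idea that a few items can drive an aggregate treatment effect can be shown with a simple descriptive decomposition. When every student answers every item, the treatment-control difference in mean sum scores equals the sum of the item-level differences in proportion correct, so each item's share of the total follows directly. This is a simplified sketch, not the DIF-based analysis the authors apply; the function name `item_effect_shares` and the data layout are assumptions.

```python
def item_effect_shares(treatment, control):
    """Split a sum-score treatment effect into per-item contributions.

    treatment, control: lists of per-student response lists, where each
    response is 1 (correct) or 0 (incorrect) and items are in the same order.
    Returns (item_effects, shares): the difference in proportion correct for
    each item and that difference as a fraction of the total effect.
    """
    n_items = len(treatment[0])

    def proportion_correct(group, item):
        return sum(student[item] for student in group) / len(group)

    item_effects = [
        proportion_correct(treatment, j) - proportion_correct(control, j)
        for j in range(n_items)
    ]
    total_effect = sum(item_effects)  # equals the difference in mean sum scores
    if total_effect == 0:
        raise ValueError("total effect is zero, so shares are undefined")
    shares = [effect / total_effect for effect in item_effects]
    return item_effects, shares
```

An item with a share near 0.4 would correspond to the extreme case mentioned in the abstract, in which a single item accounts for nearly 40% of the observed effect.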

Anjali Adukia, Alex Eble, Emileigh Harrison, Hakizumwami Birali Runesha, Teodora Szasz.

Books shape how children learn about society and norms, in part through representation of different characters. We introduce new artificial intelligence methods for systematically converting images into data and apply them, along with text analysis methods, to measure the representation of skin color, race, gender, and age in award-winning children’s books widely read in homes, classrooms, and libraries over the last century. We find that more characters with darker skin color appear over time, but the most influential books persistently depict characters with lighter skin color, on average, than other books, even after conditioning on race; we also find that children are depicted with lighter skin than adults on average. Relative to their growing share of the U.S. population, Black and Latinx people are underrepresented in these same books, while White males are overrepresented. Over time, females are increasingly present but appear less often in text than in images, suggesting greater symbolic inclusion in pictures than substantive inclusion in stories. We then present analysis of the supply of, and demand for, books with different levels of representation to better understand the economic behavior that may contribute to these patterns. On the demand side, we show that people consume books that center their own identities. On the supply side, we document higher prices for books that center non-dominant social identities and fewer copies of these books in libraries that serve predominantly White communities. Lastly, we show that the types of children's books purchased in a neighborhood are related to local political beliefs.
