Methodology, measurement and data

Lelys Dinarte-Diaz, Maria Marta Ferreyra, Sergio Urzua, Marina Bassi.

Short-cycle higher education programs (SCPs) can play a central role in skill development and higher education expansion, yet their quality varies greatly within and among countries. In this paper we explore the relationship between programs’ practices and inputs (quality determinants) and student academic and labor market outcomes. We design and conduct a novel survey to collect program-level information on quality determinants and average outcomes for Brazil, Colombia, Dominican Republic, Ecuador, and Peru. Categories of quality determinants include training and curriculum, infrastructure, faculty, link with productive sector, costs and funding, and practices on student admission and institutional governance. We also collect administrative, student-level data on higher education and formal employment for SCP students in Brazil and Ecuador and match it to survey data. Using machine learning methods, we select the quality determinants that predict outcomes at the program and student levels. Estimates indicate that some quality determinants may favor academic and labor market outcomes while others may hinder them. Two practices, teaching numerical competencies and providing job market information, predict improvements in all labor market outcomes in Brazil and Ecuador, and one of them, teaching numerical competencies, additionally predicts improvements in labor market outcomes for all survey countries. Since quality determinants account for 20-40 percent of the explained variation in student-level outcomes, quality determinants might have a role in shrinking program quality gaps. Findings have implications for the design and replication of high-quality SCPs, their regulation, and the development of information systems.

Michael S. Hayes, Jing Liu, Seth Gershenson.

Teachers affect a wide range of students’ educational and social outcomes, but how they contribute to students’ involvement in school discipline is less understood. We estimate the impact of teacher demographics and other observed qualifications on students’ likelihood of receiving a disciplinary referral. Using data that track all disciplinary referrals and the identity of both the referred and referring individuals from a large and diverse urban school district in California, we find students are about 0.2 to 0.5 percentage points (7% to 18%) less likely to receive a disciplinary referral from teachers of the same race or gender than from teachers of different demographic backgrounds. Students are also less likely to be referred by more experienced teachers and by teachers who hold either an English language learners or special education credential. These results are mostly driven by referrals for defiance and violence infractions, Black and Hispanic male students, and middle school students. While it is unclear whether these findings are due to variation in teachers’ effects on actual student behavior, variation in teachers’ proclivities to make disciplinary referrals, or a combination of the two, these results nonetheless suggest that teachers play a central role in the prevalence of, and inequities in, office referrals and subsequent student discipline.

Danielle Victoria Handel, Eric A. Hanushek.

The impact of school resources on student outcomes was first raised in the 1960s and has been controversial since then. This issue enters into the decision making on school finance in both legislatures and the courts. The historical research found little consistent or systematic relationship of spending and achievement, but this research frequently suffers from significant concerns about the underlying estimation strategies. More recent work has re-opened the fundamental resource-achievement relationship with more compelling analyses that offer stronger identification of resource impacts. A thorough review of existing studies, however, leads to similar conclusions as the historical work: how resources are used is key to the outcomes. At the same time, the research has not been successful at identifying mechanisms underlying successful use of resources or for ascertaining when added school investments are likely to be well-used. Direct investigations of alternative input policies (capital spending, reducing class size, or salary incentives for teachers) do not provide clear support for such specific policy initiatives.

Michael Dinerstein, Isaac M. Opper.

What happens when employers screen their employees but only observe a subset of output? We specify a model with heterogeneous employees and show that their response to the screening affects output in both the probationary period and the post-probationary period. The post-probationary impact is due to their heterogeneous responses affecting which individuals are retained and hence the screening efficiency. We show that the impact of the endogenous response on both the unobserved outcome and screening efficiency depends on whether increased effort on one task increases or decreases the marginal cost of effort on the other task. If the response decreases unobserved output in the probationary period then it increases the screening efficiency, and vice versa. We then assess these predictions empirically by studying a change to teacher tenure policy in New York City, which increased the role that a single measure -- test score value-added -- played in tenure decisions. We show that in response to the policy teachers increased test score value-added and decreased output that did not enter the tenure decision. The increase in test score value-added was largest for the teachers with more ability to improve students' untargeted outcomes, increasing their likelihood of getting tenure. We estimate that the endogenous response to the policy announcement reduced the screening efficiency gap -- defined as the reduction of screening efficiency stemming from the partial observability of output -- by 28%, effectively shifting some of the cost of partial observability from the post-tenure period to the pre-tenure period.

Suchitra Akmanchi, Kelli A. Bird, Benjamin L. Castleman.

Prediction algorithms are used across public policy domains to aid in the identification of at-risk individuals and guide service provision or resource allocation. While growing research has investigated concerns of algorithmic bias, much less research has compared algorithmically-driven targeting to the counterfactual: human prediction. We compare algorithmic and human predictions in the context of a national college advising program, focusing in particular on predicting high-achieving, lower-income students’ college enrollment quality. College advisors slightly outperform a prediction algorithm; however, greater advisor accuracy is concentrated among students with whom advisors had more interactions. The algorithm achieved similar accuracy among students lower in the distribution of interactions, despite advisors having substantially more information. We find no evidence that the advisors or algorithm exhibit bias against vulnerable populations. Our results suggest that, especially at scale, algorithms have the potential to provide efficient, accurate, and unbiased predictions to target scarce social services and resources.

Xin Li, Liping Ma, Xiaoyang Ye, Qiong Zhu.

This paper provides some of the first natural-experimental evidence on the consequences of a transition from a college-major (early specialization) to a college-then-major (late specialization) choice mechanism. Specifically, we study a recent reform in China that allows college applicants to apply to a meta-major consisting of different majors and to declare a specialization late in college instead of applying to a specific major. Using administrative data over 18 years on the universe of college applicants in a Chinese province, we examine the impacts of the staggered adoption of the reform across institutions on student composition changes. We find substantial heterogeneous effects across institutions and majors despite the aggregate null effects. The findings carry important policy implications for the design of college admissions mechanisms.

Liping Ma, Xin Li, Qiong Zhu, Xiaoyang Ye.

One of the most important mechanism design policies in college admissions is whether to let students choose a college and major sequentially (college-then-major choice) or jointly (college-major choice). In the context of the Chinese meta-major reforms that transition from college-major choice to college-then-major choice, we provide the first experimental evidence on the information frictions and heterogeneous preferences that students have in their response to the meta-major option. In a randomized experiment with a nationwide sample of 11,424 high school graduates, we find that providing information on the benefits of a meta-major significantly increased students’ willingness to choose the meta-major; however, information about specific majors and assignment mechanisms did not affect student major choice preferences. We also find that information provision mostly affected the preferences of students who were from disadvantaged backgrounds, lacked accurate information, did not have clear major preferences, or were risk-loving.

Dorottya Demszky, Heather C. Hill.

Classroom discourse is a core medium of instruction; analyzing it can provide a window into teaching and learning and can drive the development of new tools for improving instruction. We introduce the largest dataset of mathematics classroom transcripts available to researchers, and demonstrate how this data can help improve instruction. The dataset consists of 1,660 4th and 5th grade elementary mathematics observations, each 45-60 minutes long, collected by the National Center for Teacher Effectiveness (NCTE) between 2010 and 2013. The anonymized transcripts represent data from 317 teachers across 4 school districts that serve largely historically marginalized students. The transcripts come with rich metadata, including turn-level annotations for dialogic discourse moves, classroom observation scores, demographic information, survey responses and student test scores. We demonstrate that our natural language processing model, trained on our turn-level annotations, can learn to identify dialogic discourse moves and that these moves are correlated with better classroom observation scores and learning outcomes. This dataset opens up several possibilities for researchers, educators and policymakers to learn about and improve K-12 instruction.

The data and its terms of use can be accessed here:

Joshua B. Gilbert.

This simulation study examines the characteristics of the Explanatory Item Response Model (EIRM) for estimating treatment effects, compared to classical test theory (CTT) sum and mean scores and item response theory (IRT)-based theta scores. Results show that the EIRM and IRT theta scores provide generally equivalent bias and false positive rates compared to CTT scores and superior calibration of standard errors under model misspecification. Analysis of the statistical power of each method reveals that the EIRM and IRT theta scores provide a marginal benefit to power and are more robust to missing data than other methods when parametric assumptions are met and provide a substantial benefit to power under heteroskedasticity, but their performance is mixed under other conditions. The methods are illustrated with an empirical data application examining the causal effect of an elementary school literacy intervention on reading comprehension test scores, which demonstrates that the EIRM provides a more precise estimate of the average treatment effect than the CTT or IRT theta score approaches. Tradeoffs of model selection and interpretation are discussed.

Seth B. Hunter, Matthew P. Steinberg.

Districts nationwide have revised their educator evaluation systems, increasing the frequency with which administrators observe and evaluate teacher instruction. Yet, limited insight exists on the role of evaluator feedback for instructional improvement. Relying on unique observation-level data, we examine the alignment between evaluator and teacher assessments of teacher instruction and the potential consequences for teacher productivity and mobility. We show that teachers and evaluators typically rate teacher performance similarly during classroom observations, but with significant variability in teacher-evaluator ratings. While teacher performance improves across multiple classroom observations, evaluator ratings likely overstate productivity improvements among the lowest-performing teachers. Evaluators, but not teachers, systematically rate teacher performance lower in classrooms serving higher concentrations of economically disadvantaged students. And while teacher performance improves when evaluators provide more critical feedback about teacher instruction, teachers receiving critical feedback may seek alternative teaching assignments in schools with less critical evaluation settings. We discuss the implications of these findings for the design, implementation and impact of educator evaluation systems.
