Methodology, measurement and data
The NCTE Transcripts: A Dataset of Elementary Math Classroom Transcripts
Classroom discourse is a core medium of instruction; analyzing it can provide a window into teaching and learning and can drive the development of new tools for improving instruction. We introduce the largest dataset of mathematics classroom transcripts available to researchers, and demonstrate how this data can help improve instruction. The dataset consists of 1,660 4th and 5th grade elementary mathematics observations, each 45-60 minutes long, collected by the National Center for Teacher Effectiveness (NCTE) between 2010 and 2013. The anonymized transcripts represent data from 317 teachers across 4 school districts that serve largely historically marginalized students. The transcripts come with rich metadata, including turn-level annotations for dialogic discourse moves, classroom observation scores, demographic information, survey responses and student test scores. We demonstrate that our natural language processing model, trained on our turn-level annotations, can learn to identify dialogic discourse moves and that these moves are correlated with better classroom observation scores and learning outcomes. This dataset opens up several possibilities for researchers, educators and policymakers to learn about and improve K-12 instruction.
The data and its terms of use can be accessed here: https://github.com/ddemszky/classroom-transcript-analysis
Estimating Treatment Effects with the Explanatory Item Response Model
This simulation study examines the characteristics of the Explanatory Item Response Model (EIRM) for estimating treatment effects, compared with classical test theory (CTT) sum and mean scores and item response theory (IRT)-based theta scores. Results show that the EIRM and IRT theta scores provide generally equivalent bias and false positive rates compared to CTT scores and superior calibration of standard errors under model misspecification. Analysis of the statistical power of each method reveals that the EIRM and IRT theta scores provide a marginal benefit to power and are more robust to missing data than other methods when parametric assumptions are met, and provide a substantial benefit to power under heteroskedasticity, but their performance is mixed under other conditions. The methods are illustrated with an empirical data application examining the causal effect of an elementary school literacy intervention on reading comprehension test scores, which demonstrates that the EIRM provides a more precise estimate of the average treatment effect than the CTT or IRT theta score approaches. Tradeoffs of model selection and interpretation are discussed.
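To make the CTT baseline in this comparison concrete, here is a minimal sketch of one simulation replicate. It assumes a Rasch-type data-generating model with a constant treatment shift on the latent trait; the function names, item difficulties, sample size, and effect size are all illustrative and are not taken from the paper. Fitting the EIRM itself would need a mixed-effects logistic regression package and is not shown.

```python
import math
import random


def simulate_responses(n_persons, item_difficulties, effect, rng):
    """Simulate 0/1 item responses under an assumed Rasch model.

    Persons alternate between control (0) and treatment (1); treatment
    shifts the latent ability theta by `effect` logits. Hypothetical setup.
    """
    rows = []
    for i in range(n_persons):
        treated = i % 2
        theta = rng.gauss(0.0, 1.0) + effect * treated
        responses = [
            1 if rng.random() < 1.0 / (1.0 + math.exp(-(theta - b))) else 0
            for b in item_difficulties
        ]
        rows.append((treated, responses))
    return rows


def ctt_mean_score_difference(rows):
    """Treatment-minus-control difference in CTT mean scores (proportion correct)."""
    by_arm = {0: [], 1: []}
    for treated, responses in rows:
        by_arm[treated].append(sum(responses) / len(responses))
    mean = lambda xs: sum(xs) / len(xs)
    return mean(by_arm[1]) - mean(by_arm[0])


rng = random.Random(0)  # fixed seed so the replicate is reproducible
difficulties = [-1.5 + 0.15 * k for k in range(20)]  # 20 illustrative items
rows = simulate_responses(4000, difficulties, effect=0.5, rng=rng)
diff = ctt_mean_score_difference(rows)
print(f"CTT mean-score difference: {diff:.3f}")
```

A full study would repeat this over many replicates and conditions (missing data, heteroskedasticity, misspecification) and record bias, false positive rates, and power for each scoring method.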
Do You Observe What I Observe? The Predictors and Consequences of Discordance in Teacher and Evaluator Ratings of Teacher Performance
Districts nationwide have revised their educator evaluation systems, increasing the frequency with which administrators observe and evaluate teacher instruction. Yet, limited insight exists on the role of evaluator feedback for instructional improvement. Relying on unique observation-level data, we examine the alignment between evaluator and teacher assessments of teacher instruction and the potential consequences for teacher productivity and mobility. We show that teachers and evaluators typically rate teacher performance similarly during classroom observations, but with significant variability in teacher-evaluator ratings. While teacher performance improves across multiple classroom observations, evaluator ratings likely overstate productivity improvements among the lowest-performing teachers. Evaluators, but not teachers, systematically rate teacher performance lower in classrooms serving higher concentrations of economically disadvantaged students. And while teacher performance improves when evaluators provide more critical feedback about teacher instruction, teachers receiving critical feedback may seek alternative teaching assignments in schools with less critical evaluation settings. We discuss the implications of these findings for the design, implementation and impact of educator evaluation systems.
Variation in broadband access among undergraduate populations across the United States
Increasing numbers of students require internet access to pursue their undergraduate degrees, yet broadband access remains inequitable across student populations. Furthermore, surveys that currently show differences in access by student demographics or location typically do so at high levels of aggregation, thereby obscuring important variation between subpopulations within larger groups. Through the dual lenses of quantitative intersectionality and critical race spatial analysis, we use Bayesian multilevel regression and census microdata to model variation in broadband access among undergraduate populations at deeper interactions of identity. We find substantive heterogeneity in student broadband access by gender, race, and place, including between typically aggregated subpopulations. Our findings speak to inequities in students’ geographies of opportunity and suggest a range of policy prescriptions at both the institutional and federal level.
Leading Indicators of Long-Term Success in Community Schools: Evidence from New York City
Community schools are an increasingly popular strategy used to improve the performance of students whose learning may be disrupted by non-academic challenges related to poverty. Community schools partner with community-based organizations (CBOs) to provide integrated supports such as health and social services, family education, and extended learning opportunities. With over 300 community schools, the New York City Community Schools Initiative (NYC-CS) is the largest of these programs in the country. Using a novel method that combines multiple rating regression discontinuity design (MRRDD) with machine learning (ML) techniques, we estimate the causal effect of NYC-CS on elementary and middle school student attendance and academic achievement. We find an immediate reduction in chronic absenteeism of 5.6 percentage points, which persists over the following three years. We also find large improvements in math and ELA test scores (increases of 0.26 and 0.16 standard deviations, respectively, by the third year after implementation), although these effects took longer to manifest than the effects on attendance. Our findings suggest that improved attendance is a leading indicator of success of this model and may be followed by longer-run improvements in academic achievement, which has important implications for how community school programs should be evaluated.
When your bootstraps are not enough: How demand and supply interact to generate learning in settings of extreme poverty
How much does family demand matter for child learning in settings of extreme poverty? In rural Gambia, families with high aspirations for their children’s future education and career, measured before children start school, go on to invest substantially more than other families in the early years of their children’s education. Despite this, essentially no children are literate or numerate three years later. When villages receive a highly impactful, teacher-focused supply-side intervention, however, children of these families are 25 percent more likely to achieve literacy and numeracy than other children in the same village. Furthermore, improved supply enables these children to acquire other higher-level skills necessary for later learning and child development. We also document patterns of substitutability and complementarity between demand and supply in generating learning at varying levels of skill difficulty. Our analysis shows that greater demand can map onto developmentally meaningful learning differences in such settings, but only with adequate complementary inputs on the supply side.
Putting the K in Rank: How Kindergarten Classrooms Impact Short and Long-Run Outcomes
A student's class rank has important short- and long-term effects on educational outcomes. Despite our growing understanding of these rank effects, we still do not know how early in a child's academic career they begin. To address this, I use data from the Tennessee STAR project, which randomly assigned over 6,323 kindergarteners to classroom environments, to study the impact of kindergarten class rank on a host of short- and long-run outcomes. I find a strong, causal relationship between one's kindergarten classroom rank and subsequent test scores, high school achievement and performance on college entrance exams. I also find that having a higher rank in kindergarten causes an increase in study effort, value of school and initiative in the classroom. In addition, I leverage the design of project STAR to test various mechanisms and address several outstanding issues in the rank literature, including the role of tracking, parental effort and teacher-level characteristics in driving the effects of class rank.
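As a rough illustration of what "class rank" can mean, the sketch below computes a within-classroom percentile rank from test scores. The abstract does not state the exact rank definition used, so the normalization here (share of classmates scoring lower, with ties counted as half) is an assumption, and the function name and data are hypothetical.

```python
def class_percentile_ranks(scores_by_student):
    """Map each student to a rank in [0, 1] within one classroom.

    Assumed definition: (classmates scoring lower + 0.5 * classmates tied)
    divided by the number of classmates. A one-student class gets 0.5.
    """
    n = len(scores_by_student)
    if n < 2:
        return {s: 0.5 for s in scores_by_student}
    ranks = {}
    for student, score in scores_by_student.items():
        others = [v for s, v in scores_by_student.items() if s != student]
        below = sum(1 for v in others if v < score)
        tied = sum(1 for v in others if v == score)
        ranks[student] = (below + 0.5 * tied) / (n - 1)
    return ranks


# Two hypothetical classrooms: students "a" and "d" have the same score
# but opposite ranks, which is the variation random assignment provides.
class_one = class_percentile_ranks({"a": 50, "b": 40, "c": 30})
class_two = class_percentile_ranks({"d": 50, "e": 60, "f": 70})
print(class_one["a"], class_two["d"])
```

Because classroom composition was randomly assigned, two students with the same score can end up with different ranks for reasons unrelated to their own ability.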
Patterns, Determinants, and Consequences of Ability Tracking: Evidence from Texas Public Schools
Schools often track students to classes based on ability. Proponents of tracking argue it is a low-cost tool to improve learning since instruction is more effective when students are more homogeneous, while opponents argue it exacerbates initial differences in opportunities without strong evidence of efficacy. In fact, little is known about the pervasiveness or determinants of ability tracking in the US. To fill this gap, we use detailed administrative data from Texas to estimate the extent of tracking within schools for grades 4 through 8 over the years 2011-2019. We find substantial tracking; tracking within schools overwhelms any sorting by ability that takes place across schools. The most important determinant of tracking is heterogeneity in student ability, and schools operationalize tracking through curricular differentiation and the classification of students into categories such as gifted and disabled. When we examine how tracking changes in response to educational policies, we see that schools decrease tracking in response to accountability pressures. Finally, when we explore how exposure to tracking correlates with student mobility in the achievement distribution, we find positive effects on high-achieving students with no negative effects on low-achieving students, suggesting that tracking may increase inequality by raising the ceiling.
At What Cost?: Is Technical Education Worth the Investment?
Career and technical education (CTE) has existed in the United States for over a century, and only in recent years have there been opportunities to assess the causal impact of participating in these programs while in high school. To date, no work has assessed whether the relative costs of these programs meet or exceed the benefits as described in recent evaluations. In this paper, we use available cost data to compare average costs per pupil in standalone high school CTE programs in Connecticut and Massachusetts to the most likely counterfactual schools. Under a variety of conservative assumptions about the monetary value of known educational and social benefits, we find that programs in Massachusetts offer clear positive returns on investment, whereas programs in Connecticut offer smaller, though mostly non-negative expected returns. We also consider the potential cost effectiveness of CTE programs offered in other contexts to address questions of generalizability.
You Are Who You Eat With: Academic Peer Effects from School Lunch Lines
Using daily lunch transaction data from NYC public schools, I determine which students frequently stand next to one another in the lunch line. I use this 'revealed' friendship network to estimate academic peer effects in elementary school classrooms, improving on previous work by defining not only where social connections exist, but the relative strength of these connections. Equally weighting all peers in a reference group assumes that all peers are equally important and may bias estimates by underweighting important peers and overweighting unimportant peers. I find that students who eat together are important influencers of one another's academic performance, with stronger effects in math than in reading. Further exploration of the mechanisms supports my claim that these are friendship networks. I also compare the influence of friends from different periods in the school year and find that connections occurring around standardized testing dates are most influential on test scores.
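The contrast between equal and strength-weighted peer averages can be sketched as follows. This is a toy illustration, not the paper's estimator: it assumes each day's transactions have already been ordered into a line, counts how often each pair stands adjacent, and uses those counts as weights. All names and data are hypothetical.

```python
from collections import Counter


def adjacency_counts(daily_lines):
    """Count how many times each unordered pair of students is adjacent in line."""
    counts = Counter()
    for line in daily_lines:
        for a, b in zip(line, line[1:]):
            if a != b:
                counts[frozenset((a, b))] += 1
    return counts


def weighted_peer_mean(student, counts, scores):
    """Average peer score, weighting each peer by adjacency count with `student`."""
    total_weight = 0
    total = 0.0
    for pair, weight in counts.items():
        if student in pair:
            (peer,) = pair - {student}
            if peer in scores:
                total_weight += weight
                total += weight * scores[peer]
    return total / total_weight if total_weight else None


# Three hypothetical days of lunch-line order for four students.
daily_lines = [
    ["ana", "ben", "cy", "dee"],
    ["ben", "ana", "dee", "cy"],
    ["ana", "ben", "dee", "cy"],
]
scores = {"ana": 70.0, "ben": 80.0, "cy": 60.0, "dee": 90.0}

counts = adjacency_counts(daily_lines)
ana_weighted = weighted_peer_mean("ana", counts, scores)
print(ana_weighted)
```

In this toy data "ana" stands next to "ben" three times and "dee" once, so the weighted peer mean leans toward ben's score, whereas an equal-weight average of the two peers would treat them as equally important.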