Julie Cohen

Arielle Boguslav, Julie Cohen.

Teacher preparation programs are increasingly expected to use data on pre-service teacher (PST) skills to drive program improvement and provide targeted supports. Observational ratings are especially vital, but also prone to measurement issues. Scores may be influenced by factors unrelated to PSTs’ instructional skills, including rater standards and mentor teachers’ skills. Yet we know little about how these measurement challenges play out in the PST context. Here we investigate the reliability and sensitivity of two observational measures. We find measures collected during student teaching are especially prone to measurement issues; only 3-4% of variation in scores reflects consistent differences between PSTs, while 9-17% of variation can be attributed to the mentors with whom they work. When high scores stem not from strong instructional skills, but instead from external circumstances, we cannot use them to make consequential decisions about PSTs’ individual needs or readiness for independent teaching.


Julie Cohen, Anandita Krishnamachari, Vivian C. Wong.

Many novice teachers learn to teach “on-the-job,” leading to burnout and attrition among teachers and negative outcomes for students in the long term. Pre-service teacher education is tasked with optimizing teacher readiness, but there is a lack of causal evidence regarding effective ways for preparing new teachers. In this paper, we use a mixed reality simulation platform to evaluate the causal effects and robustness of an individualized, directive coaching model for candidates enrolled in a university-based teacher education program, as well as for undergraduates considering teaching as a profession. Across five conceptual replication studies, we find that targeted, directive coaching significantly improves candidates’ instructional performance during simulated classroom sessions, and that coaching effects are robust across different teaching tasks, study timing, and modes of delivery. However, coaching effects are smaller for a sub-population of participants not formally enrolled in a teacher preparation program. These participants differed from teacher candidates in multiple ways, including by demographic characteristics, as well as by their prior experiences learning about instructional methods. We highlight implications for research and practice.


Dorottya Demszky, Jing Liu, Zid Mancenido, Julie Cohen, Heather C. Hill, Dan Jurafsky, Tatsunori Hashimoto.

In conversation, uptake happens when a speaker builds on the contribution of their interlocutor by, for example, acknowledging, repeating, or reformulating what they have said. In education, teachers' uptake of student contributions has been linked to higher student achievement. Yet measuring and improving teachers' uptake at scale is challenging, as existing methods require expensive annotation by experts. We propose a framework for computationally measuring uptake, by (1) releasing a dataset of student-teacher exchanges extracted from US math classroom transcripts annotated for uptake by experts; (2) formalizing uptake as pointwise Jensen-Shannon Divergence (pJSD), estimated via next utterance classification; (3) conducting a linguistically motivated comparison of different unsupervised measures; and (4) correlating these measures with educational outcomes. We find that although repetition captures a significant part of uptake, pJSD outperforms repetition-based baselines, as it is capable of identifying a wider range of uptake phenomena, such as question answering and reformulation. We apply our uptake measure to three different educational datasets with outcome indicators. Unlike baseline measures, pJSD correlates significantly with instruction quality in all three, providing evidence for its generalizability and for its potential to serve as an automated professional development tool for teachers.


Jing Liu, Julie Cohen.

Valid and reliable measurements of teaching quality facilitate school-level decision-making and policies pertaining to teachers. Using nearly 1,000 word-to-word transcriptions of 4th- and 5th-grade English language arts classes, we apply novel text-as-data methods to develop automated measures of teaching to complement classroom observations traditionally done by human raters. This approach is free of rater bias and enables the detection of three instructional factors that are well aligned with commonly used observation protocols: classroom management, interactive instruction, and teacher-centered instruction. The teacher-centered instruction factor is a consistent negative predictor of value-added scores, even after controlling for teachers’ average classroom observation scores. The interactive instruction factor predicts positive value-added scores. Our results suggest that the text-as-data approach has the potential to enhance existing classroom observation systems through collecting far more data on teaching with a lower cost, higher speed, and the detection of multifaceted classroom practices.


Jing Liu, Julie Cohen.

Valid and reliable measurements of teaching quality facilitate school-level decision-making and policies pertaining to teachers, but conventional classroom observations are costly, prone to rater bias, and hard to implement at scale. Using nearly 1,000 word-to-word transcriptions of 4th- and 5th-grade English language arts classes, we apply novel text-as-data methods to develop automated, objective measures of teaching to complement classroom observations. This approach is free of rater bias and enables the detection of three instructional factors that are well aligned with commonly used observation protocols: classroom management, interactive instruction, and teacher-centered instruction. The teacher-centered instruction factor is a consistent negative predictor of value-added scores, even after controlling for teachers’ average classroom observation scores. The interactive instruction factor predicts positive value-added scores.


Julie Cohen, Susanna Loeb, Luke C. Miller, James H. Wyckoff.

Ten years ago, the reform of teacher evaluation was touted as a mechanism to improve teacher effectiveness. In response, virtually every state redesigned its teacher evaluation system. Recently, a growing narrative suggests these reforms failed and should be abandoned. This response may be overly simplistic. We explore the variability of New York City principals’ implementation of policies intended to promote teaching effectiveness. Drawing on survey, interview, and administrative data, we analyze whether principals believe they can use teacher evaluation and tenure policies to improve teaching effectiveness, and how such perceptions influence policy implementation. We find that principals with greater perceived agency are more likely to strategically employ tenure and evaluation policies. Results have important implications for principal training and policy implementation.
