
Annaliese Paulson

Classifying Courses at Scale: a Text as Data Approach to Characterizing Student Course-Taking Trends with Administrative Transcripts

Students’ postsecondary course-taking is of interest to researchers, yet it has been difficult to study at scale because administrative transcript data are rarely standardized across institutions or state systems. This paper uses machine learning and natural language processing to standardize college transcripts at scale. We demonstrate the approach’s utility by showing how the disciplinary orientation of students’ courses and majors align and diverge at 18 diverse four-year institutions in the College and Beyond II dataset. Our findings complicate narratives that student participation in the liberal arts is in great decline. Both professional and liberal arts majors enroll in a substantial amount of liberal arts coursework, and in three of the four core liberal arts disciplines, the share of course-taking is meaningfully higher than the share of majors in those fields. To advance the study of student postsecondary pathways, we release the classification models for public use.
