Is Big Data Better? LMS Data and Predictive Analytic Performance in Postsecondary Education

May 2024

Colleges have increasingly turned to data science applications to improve student outcomes. One prominent application is predicting students’ risk of failing a course. In this paper, we investigate whether incorporating data from learning management systems (LMS), which capture detailed information on students’ engagement in course activities, improves the accuracy of predicting student success beyond what administrative data alone can achieve. We use data from the Virginia Community College System to build random forest models for each combination of student type (new versus returning) and data source (administrative-only, LMS-only, or full data). We find that among returning students, administrative-only models outperform LMS-only models, and combining the two types of data yields only a minimal increase in accuracy. Among new students, LMS-only models outperform administrative-only models, and accuracy is significantly higher when both types of predictors are used. This pattern of results reflects the fact that community college administrative data contains little information about new students. Within the LMS data, we find that measures of students’ engagement during the first part of the course have the most predictive value.

Keywords

Data science, college success, predictive analytics, community college, machine learning, LMS data, clickstream data

Education level

Post-secondary education

Topics

Pathways to and Through Postsecondary