Methodology, measurement and data
Standardized assessments are widely used to determine access to educational resources with important consequences for later economic outcomes in life. However, many design features of the tests themselves may lead to psychological reactions influencing performance. In particular, the level of difficulty of the earlier questions in a test may affect performance in later questions. How should we order test questions according to their level of difficulty such that test performance offers an accurate assessment of the test taker's aptitudes and knowledge? We conduct a field experiment with about 19,000 participants in collaboration with an online teaching platform where we randomly assign participants to different orders of difficulty, and we find that ordering the questions from easiest to most difficult yields the lowest probability of abandoning the test, as well as the highest number of correct answers. Consistent results are found exploiting the random variation of difficulty across test booklets in the Programme for International Student Assessment (PISA), a triennial international test, for the years 2009, 2012, and 2015, providing additional external validity. We conclude that the order of the difficulty of the questions in tests should be considered carefully, in particular when comparing performance between test-takers who have faced different orders of questions.
After increasing in the 1970s and 1980s, time to bachelor’s degree has declined since the 1990s. We document this fact using data from three nationally representative surveys. We show that this pattern is occurring across school types and for all student types. Using administrative student records from 11 large universities, we confirm the finding and show that it is robust to alternative sample definitions. We discuss what might explain the decline in time to bachelor’s degree by considering trends in student preparation, state funding, student enrollment, study time, and student employment during college.
Local governments spend over 12 billion dollars annually funding the operation of 15,000 public libraries in the United States. This funding supports widespread library use: more than 50% of Americans visit public libraries each year. But despite extensive public investment in libraries, surprisingly little research quantifies the effects of public libraries on communities and children. We use data on the near-universe of U.S. public libraries to study the effects of capital spending shocks on library resources, patron usage, student achievement, and local housing prices. We use a dynamic difference-in-differences approach to show that library capital investment increases children’s attendance at library events by 18%, children’s checkouts of items by 21%, and total library visits by 21%. Increases in library use translate into improved children’s test scores in nearby school districts: a $1,000 or greater per-student capital investment in local public libraries increases reading test scores by 0.02 standard deviations and has no effect on math test scores. Housing prices do not change after a sharp increase in public library capital investment, suggesting that residents internalize the increased cost and improved quality of their public libraries.
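As a rough illustration of the difference-in-differences idea mentioned in the abstract above, the sketch below computes the simplest two-group, two-period version of the estimate. This is not the paper's dynamic specification. The function name `did_estimate` and all numbers are hypothetical and chosen only to make the arithmetic easy to follow, and the estimate is meaningful only under the usual parallel-trends assumption.

```python
from statistics import mean
from typing import Sequence


def did_estimate(
    treated_pre: Sequence[float],
    treated_post: Sequence[float],
    control_pre: Sequence[float],
    control_post: Sequence[float],
) -> float:
    """Two-group, two-period difference-in-differences estimate.

    Change in the treated group's mean outcome minus the change in the
    control group's mean outcome.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical outcome values (e.g. visits per library); NOT data from the paper.
effect = did_estimate(
    treated_pre=[100.0, 120.0],
    treated_post=[125.0, 145.0],
    control_pre=[90.0, 110.0],
    control_post=[95.0, 115.0],
)
print(effect)  # treated change 25.0 minus control change 5.0 -> 20.0
```

A dynamic design would instead estimate a separate effect for each period relative to the investment, which also allows a check for pre-existing trends.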
Using individual data from PIAAC and aggregate data on GDP and unemployment for the US, Europe, and Spain, we test how macroeconomic conditions experienced at age eighteen affect the following decisions in post-secondary and tertiary education: i) enrollment, ii) dropping out, iii) type of degree completed, iv) area of specialization, and v) time-to-degree. We also analyze how the effects differ by gender and parental background. Our findings are different for each of these geographies, which shows that the impacts of macroeconomic conditions on higher education decisions depend on context, such as labor markets and education systems. By analyzing various components of higher education together, we are able to obtain a clearer picture of how potential mechanisms linked to lower opportunity costs of education and reduced ability to pay during economic downturns interact to determine student selection.
Occupational credentials provide an additional, and at times alternative, path other than traditional academic degrees for individuals to increase productivity and demonstrate their abilities and qualifications to employers. These credentials take the form of licenses and certifications. Although credentials are a critical part of the workforce landscape, the literature on their returns is inadequate, with prior research having limited causal identification, typically relying on OLS regressions which do not sufficiently control for selection. Using questions that identify credential receipt from the 2015 and 2016 Current Population Surveys, we construct an instrumental variable of local peer influence using the within-labor market credential rate of individuals sharing the same sociodemographic characteristics, while controlling for the same group’s average wages and a suite of demographic and geographic controls. We use this instrument in a marginal treatment effects estimator, which allows for estimation of the average treatment effect and determines the direction of selection, and we estimate the effects of credentials on labor market outcomes. We find large, meaningful returns in the form of increased employment, an effect which is concentrated primarily among women. The effect of having a credential on log wages is higher for those in the sub-baccalaureate labor market, suggesting the potential role of occupational credentials as an alternative path to marketable human capital and a signal of skills in the absence of a bachelor’s degree.
Employers may favor applicants who played college sports if athletics participation contributes to leadership, conscientiousness, discipline, and other traits that are desirable for labor-market productivity. We conduct a resume audit to estimate the causal effect of listing collegiate athletics on employer callbacks and test for subgroup effects by ethnicity, gender, and sport type. We applied to more than 450 jobs on a large, well-known job board. For each job listing we submitted two fictitious resumes, one of which was randomly assigned to include collegiate varsity athletics. Overall, listing a college sport does not produce a statistically significant change in the likelihood of receiving a callback or interview request. However, among non-white applicants, athletes are 3.2 percentage points less likely to receive an interview request (p = .04) relative to non-athletes. We find no statistically significant differences among males or females.
Recent interest in promoting and supporting replication efforts assumes that there is well-established methodological guidance for designing and implementing these studies. However, no such consensus exists in the methodology literature. This article addresses this gap by describing design-based approaches for planning systematic replication studies. Our general approach is derived from the Causal Replication Framework (CRF), which formalizes the assumptions under which replication success can be expected. The assumptions may be understood broadly as replication design requirements and individual study design requirements. Replication failure occurs when one or more CRF assumptions are violated. In design-based approaches to replication, CRF assumptions are systematically tested to evaluate the replicability of effects, as well as to identify sources of effect variation when replication failure is observed. In direct replication designs, replication failure is evidence of bias or incorrect reporting in individual study estimates, while in conceptual replication designs, replication failure occurs because of effect variation due to differences in treatments, outcomes, settings, and participant characteristics. The paper demonstrates how multiple research designs may be combined in systematic replication studies, as well as how diagnostic measures may be used to assess the extent to which CRF assumptions are met in field settings.
Valid and reliable measurements of teaching quality facilitate school-level decision-making and policies pertaining to teachers. Using nearly 1,000 word-for-word transcriptions of 4th- and 5th-grade English language arts classes, we apply novel text-as-data methods to develop automated measures of teaching to complement classroom observations traditionally done by human raters. This approach is free of rater bias and enables the detection of three instructional factors that are well aligned with commonly used observation protocols: classroom management, interactive instruction, and teacher-centered instruction. The teacher-centered instruction factor is a consistent negative predictor of value-added scores, even after controlling for teachers’ average classroom observation scores. The interactive instruction factor predicts positive value-added scores. Our results suggest that the text-as-data approach has the potential to enhance existing classroom observation systems by collecting far more data on teaching at lower cost and higher speed, and by detecting multifaceted classroom practices.
There is no national consensus on how school districts calculate high school achievement disparities between students who experience homelessness and those who do not. Using administrative student-level data from a mid-sized public school district in the Southern United States, we show that commonly used ways of defining which students are considered homeless can yield markedly different estimates of the high school graduation gap between homeless and housed students. The key distinctions among homelessness definitions relate to how to classify homeless students who become housed and how to consider students who transfer out of the district or drop out of school. Eliminating housing insecurity-related achievement disparities necessitates understanding the link between homelessness and educational achievement; how districts quantify homelessness affects measured gaps.
We show that fade out biases value-added estimates at the teacher level. To do so, we use administrative data from North Carolina and show that a teacher's value-added depends on the quality of the teacher who preceded them. Value-added estimators that control for fade out feature no such teacher-level bias. Under a benchmark policy that releases teachers in the bottom five percent of the value-added distribution, fifteen percent of teachers released using traditional techniques are not released once fade out is accounted for. Our results highlight the importance of incorporating dynamic features of education production into the estimation of teacher quality.
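The benchmark policy comparison in the abstract above can be sketched as a small set calculation: take the bottom five percent of teachers under each estimator and ask what share of those released under the traditional estimate would no longer be released under the adjusted one. The function names, teacher identifiers, and scores below are hypothetical and are not the paper's data or estimators. The sketch only shows how such a share could be computed once two sets of value-added estimates exist.

```python
from typing import Dict, Set


def released_teachers(value_added: Dict[str, float], fraction: float = 0.05) -> Set[str]:
    """Return the teachers in the bottom `fraction` of the value-added distribution."""
    n_release = max(1, int(len(value_added) * fraction))
    ranked = sorted(value_added, key=value_added.get)  # lowest estimate first
    return set(ranked[:n_release])


def share_no_longer_released(
    traditional: Dict[str, float],
    adjusted: Dict[str, float],
    fraction: float = 0.05,
) -> float:
    """Share of teachers released under `traditional` who are kept under `adjusted`."""
    before = released_teachers(traditional, fraction)
    after = released_teachers(adjusted, fraction)
    return len(before - after) / len(before)


# Hypothetical estimates for 40 teachers; NOT data from the paper.
traditional = {f"t{i}": float(i) for i in range(40)}
adjusted = dict(traditional)
adjusted["t1"] = 5.5  # suppose the adjustment raises this teacher's estimate

print(sorted(released_teachers(traditional)))  # ['t0', 't1']
print(sorted(released_teachers(adjusted)))     # ['t0', 't2']
print(share_no_longer_released(traditional, adjusted))  # 0.5
```

With 40 teachers and a five percent cutoff, two teachers are released under each estimator, and one of the two changes, so the share is 0.5 in this toy example.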