Replication is essential for building confidence in research studies, yet it remains the exception rather than the rule. That is not necessarily because funding is unavailable; it is because the current system casts original authors and replicators as antagonists. Focusing on economics, political science, sociology, and psychology, fields in which ready access to raw data and software code is crucial to replication efforts, we survey deficiencies in the current system.
To see how often posted data and code could readily reproduce original results, we attempted to recreate the tables and figures of published papers using the code and data provided by their authors. Of 415 articles published in 9 leading economics journals in May 2016, 203 were empirical papers that did not rely on proprietary or otherwise restricted data. We were able to replicate only a small minority of these papers. Overall, 76% of the 203 studies provided at least one of the four files required for replication: the raw data used in the study (32%); the final estimation data set produced after data cleaning and variable manipulation (60%); the data-manipulation code used to convert the raw data into the estimation data (42%, though only 16% provided both raw data and manipulation code that ran); and the estimation code used to produce the final tables and figures (72%). The estimation code was the file most frequently provided, but it ran in only 40% of the cases in which it was supplied. We were able to produce the final tables and figures from the estimation data in only 37% of the studies analyzed, and in only 14% of the 203 studies could we do the same starting from the raw data.
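The four files correspond to a two-stage pipeline: data-manipulation code turns the raw data into the estimation data set, and estimation code turns the estimation data into the published tables and figures. The Python sketch below illustrates that structure only; the file names, variables, and regression are hypothetical stand-ins, not taken from any of the studies surveyed.

```python
# Minimal sketch of a two-stage replication package.
# All file names, variables, and the regression are hypothetical examples;
# the point is the structure: raw data -> data-manipulation code ->
# estimation data -> estimation code -> tables and figures.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# --- Stage 1: data-manipulation code (raw data -> estimation data) ---
raw = pd.read_csv("raw_data.csv")                      # file 1: raw data
estimation = (
    raw.dropna(subset=["wage", "education"])           # example cleaning step
       .assign(log_wage=lambda d: np.log(d["wage"]))   # example variable construction
)
estimation.to_csv("estimation_data.csv", index=False)  # file 2: estimation data set

# --- Stage 2: estimation code (estimation data -> tables and figures) ---
estimation = pd.read_csv("estimation_data.csv")
model = smf.ols("log_wage ~ education", data=estimation).fit()
with open("table1.txt", "w") as f:
    f.write(model.summary().as_text())                 # example published table
```

A replication attempt that starts from the raw data must run both stages successfully, whereas one that starts from the estimation data needs only the second stage, which is why the two success rates reported above (14% versus 37%) differ.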
We propose reforms that can both encourage and reinforce better behavior — a system in which authors feel that replication of software code is both probable and fair, and in which less time and effort is required for replication.