The most recent case of scientific fraud, that of Dutch social psychologist Diederik Stapel, recalls the 2010 case at Harvard University against Marc Hauser, a well-respected researcher in human and animal cognition. In both cases, the focus was on access to data and irregularities in its handling. Stapel retained full control of the raw data, never allowing his students or colleagues to have access to data files. In the case of Hauser, the scientific misconduct investigation found missing data files and unsupported scientific inference at the center of the accusations against him. Outright data fraud by Stapel and sloppy data management and inappropriate data use by Hauser underscore the critical role data transparency plays in preventing scientific misconduct.
Recent developments at the National Science Foundation (and earlier this decade at the National Institutes of Health) suggest a solution — data-sharing requirements for all grant-funded projects and by all scientific journals. Such a requirement could prevent this type of fraud by quickly opening up research data to scrutiny by a wider community of scientists.
Stapel’s case is an extreme example, and such cases are more likely in disciplines with substantially limited imperatives for data sharing and secondary data use. The research traditions of psychology suggest that collecting your own data is the only sound scientific practice. This tradition, less widely shared in other social sciences, encourages researchers to protect data from outsiders. The potential for abuse is clear.
According to published reports about Hauser, there were three instances in which the original data used in published articles could not be found. While Hauser repeated two of those experiments and produced data that supported his papers, his poor handling of data cast a significant shadow of uncertainty and suspicion over his work.
Hauser’s behavior is rare, but not unheard of. In 2008, the latest year for which data are available, the Office of Research Integrity at the U.S. Department of Health and Human Services reported 17 closed institutional cases that included data falsification or fabrication. These cases involved research funded by the federal government, and included the manipulation or misinterpretation of research data rather than the violation of scientific ethics or institutional oversight.
In both Hauser's and Stapel's cases, graduate students were the first to alert authorities to irregularities. Rather than relying on other members of a researcher’s lab to come forward (an action that requires a great deal of personal and professional courage), the new data sharing requirements at NSF and NIH have the potential to introduce long-term cultural changes in the conduct of science that may reduce the likelihood of misconduct based on data fabrication or falsification. The requirements were given teeth at NSF by the inclusion of new data management plans in the scored portion of the grant application.
NIH has since 2003 required all projects requesting more than $500,000 per year to include a data-sharing plan, and the NSF announced in January 2011 that it would require all grant requests to include data management plans. The NSF has an opportunity to reshape scientists' behavior by ensuring that the data-management plans are part of the peer review process and are evaluated for scientific merit. Peer review is essential for data-management plans for two reasons. First and foremost, it creates an incentive for scientists to actually share data. The NIH initiatives have offered the carrot for data sharing — the NSF provides the stick. The second reason is that the plans will reflect the traditions, rules, and constraints of the relevant scientific fields.
Past attempts to force scientists to share data have met with substantial resistance because the legislation did not acknowledge the substantial differences in the structure, use, and nature of data across the social, behavioral and natural sciences, or the costs of preparing data. Data sharing legislation has often been code for, "We don’t like your results," or political cover for previously highly controversial issues such as global warming or the health effects of secondhand smoke. The peer review process, on the other hand, forces consistent standards for data sharing, which are now largely absent, and allows scientists to build and judge those standards. "Witch hunts" disguised as data sharing would disappear.
The intent of the data sharing initiatives at the NIH and currently at NSF has very little to do with controlling or policing scientific misconduct. These initiatives are meant both to advance science more rapidly and to make the funding of science more efficient. Nevertheless, there is a very real side benefit of explicit data sharing requirements: reducing both the incidence of true fraud and the likelihood that data errors would be misinterpreted as fraud.
The requirement to make one’s data available in a timely and accessible manner will change incentives and behavior. First, if the data sets are made available promptly to researchers outside the immediate research team, other scientists can begin to scrutinize and replicate findings immediately. A community of scientists is the best police force one can possibly imagine. Second, those who contemplate fraud will be faced with the prospect of having to create and share fraudulent data as well as fraudulent findings.
As scientists, we often find it easier to imagine where we want to go than how to get there. Proponents of data sharing are often viewed as naïve scientific idealists, yet data sharing seems an efficient and elegant solution to the many ongoing struggles to maintain the scientific infrastructure and the public’s trust in federally funded research. Every case of scientific fraud, particularly on such controversial issues as the biological source of morality (which is part of Hauser’s research) or the sources of racial prejudice (in the case of Stapel), allows those suspicious of science and governments’ commitment to funding science to build a case in the public arena. Advances in technology have given the scientific community the opportunity to share data in a broad and scientifically valid manner, and in a way that would effectively counter those critics.
NIH and NSF have led the way toward more open access to scientific data. It is now imperative that other grant funding agencies and scientific journals redouble their own efforts to force data, the raw materials of science, into the light of day well before problems arise.
Felicia B. LeClere is a principal research scientist in the Public Health Department of NORC at the University of Chicago, where she works as research coordinator on multiple projects, including the National Immunization Survey and the National Children's Study.