The most recent case of scientific fraud, by Dutch social psychologist Diederik Stapel, recalls the 2010 case at Harvard University against Marc Hauser, a well-respected researcher in human and animal cognition. In both cases, the focus was on access to data and irregularities in how data were handled. Stapel retained full control of the raw data, never allowing his students or colleagues access to data files. In Hauser's case, the scientific misconduct investigation found missing data files and unsupported scientific inference at the center of the accusations against him. Outright data fraud by Stapel, and sloppy data management and inappropriate data use by Hauser, underscore the critical role data transparency plays in preventing scientific misconduct.
Recent developments at the National Science Foundation (and earlier this decade at the National Institutes of Health) suggest a solution — data-sharing requirements for all grant-funded projects and by all scientific journals. Such a requirement could prevent this type of fraud by quickly opening up research data to scrutiny by a wider community of scientists.
Stapel’s case is an extreme example, and such fraud is more likely in disciplines with substantially limited imperatives for data sharing and secondary data use. The research traditions of psychology hold that collecting your own data is the only sound scientific practice. This tradition, less widely shared in other social sciences, encourages researchers to protect data from outsiders. The potential for abuse is clear.
According to published reports about Hauser, there were three instances in which the original data used in published articles could not be found. While Hauser repeated two of those experiments and produced data that supported his papers, his poor handling of data cast a significant shadow of uncertainty and suspicion over his work.
Hauser’s behavior is rare, but not unheard of. In 2008, the latest year for which data are available, the Office of Research Integrity at the U.S. Department of Health and Human Services reported 17 closed institutional cases that included data falsification or fabrication. These cases involved research funded by the federal government, and included the manipulation or misinterpretation of research data rather than the violation of scientific ethics or institutional oversight.
In both Hauser's and Stapel's cases, graduate students were the first to alert authorities to irregularities. Rather than relying on other members of a researcher’s lab to come forward (an action that requires a great deal of personal and professional courage), the new data-sharing requirements at NSF and NIH have the potential to introduce long-term cultural changes in the conduct of science that may reduce the likelihood of misconduct based on data fabrication or falsification. The requirements were given teeth at NSF by the inclusion of new data management plans in the scored portion of the grant application.
Since 2003, NIH has required all projects requesting more than $500,000 per year to include a data-sharing plan, and NSF announced in January 2011 that it would require all grant requests to include data management plans. NSF has an opportunity to reshape scientists' behavior by ensuring that data-management plans are part of the peer review process and are evaluated for scientific merit. Peer review is essential for data-management plans for two reasons. First and foremost, it creates an incentive for scientists to actually share data. The NIH initiatives have offered the carrot for data sharing; NSF provides the stick. Second, the plans will reflect the traditions, rules, and constraints of the relevant scientific fields.
Past attempts to force scientists to share data have met with substantial resistance because the legislation did not acknowledge the substantial differences in the structure, use, and nature of data across the social, behavioral, and natural sciences, or the costs of preparing data. Data-sharing legislation has often been code for "We don’t like your results," or political cover on highly controversial issues such as global warming or the health effects of secondhand smoke. The peer review process, on the other hand, forces consistent standards for data sharing, which are now largely absent, and allows scientists to build and judge those standards. "Witch hunts" disguised as data sharing would disappear.
The intent of the data-sharing initiatives at NIH and, more recently, at NSF has very little to do with controlling or policing scientific misconduct. These initiatives are meant both to advance science more rapidly and to make the funding of science more efficient. Nevertheless, there is a very real side benefit of explicit data-sharing requirements: reducing the incidence of true fraud and the likelihood that data errors will be misinterpreted as fraud.
The requirement to make one’s data available in a timely and accessible manner will change incentives and behavior. First, of course, if data sets are made available promptly to researchers outside the immediate research team, other scientists can begin to scrutinize and replicate findings immediately. A community of scientists is the best police force one can possibly imagine. Second, those who contemplate fraud will be faced with the prospect of having to create and share fraudulent data as well as fraudulent findings.
As scientists, it is often easier for us to imagine where we want to go than how to get there. Proponents of data sharing are often viewed as naïve scientific idealists, yet data sharing seems an efficient and elegant solution to the many ongoing struggles to maintain the scientific infrastructure and the public’s trust in federally funded research. Every case of scientific fraud, particularly on controversial issues such as the biological source of morality (part of Hauser’s research) or the sources of racial prejudice (in Stapel's case), allows those suspicious of science and of governments’ commitment to funding science to build a case in the public arena. Advances in technology have given the scientific community the opportunity to share data in a broad and scientifically valid manner, and in a way that would effectively counter those critics.
NIH and NSF have led the way toward more open access to scientific data. It is now imperative that other grant funding agencies and scientific journals redouble their own efforts to force data, the raw materials of science, into the light of day well before problems arise.
Felicia B. LeClere is a principal research scientist in the Public Health Department of NORC at the University of Chicago, where she works as research coordinator on multiple projects, including the National Immunization Survey and the National Children's Study.
An organic chemist I know tells her doctors that she is a professor of Southern literature whenever she is in the hospital. That’s because organic chemistry has come to symbolize all the irrelevant science hoops that premedical and medical students jump through on the way to becoming physicians. Today, we are told, medical students should be learning “people skills,” placing medicine in the context of the community and learning how individuals make choices related to their health. These preferences are reflected in the revised medical admissions test rolled out earlier this year, with its newly added questions related to sociology, psychology and the humanities. This summer, as interviews begin at medical schools around the country, candidates who want to make the final cut are sometimes playing down their science credentials in favor of their relational skills.
This seems to me to be a false dichotomy. To be sure, I want my physician to understand how to deal with me as an individual and as a member of my social group. But I also want her to appreciate the underlying molecular nature of disease and to know how to evaluate scientific and statistical evidence about clinical trials and treatments.
The movement away from science springs from a misunderstanding that is not limited to the premed curriculum. Many people have experienced science taught as a series of isolated facts to be memorized. All physicians recall memorizing biochemical pathways for which they have no use past the final exam in a given course. If there were ever a time when memorization had a place, that time is gone. Facts are cheap and readily available on every smartphone and computer.
The truth is that science is about so much more than memorizing a set of facts. Practitioners with a solid scientific grounding are able to analyze data and put that data in context, rely on what is known from previous studies and extrapolate to the future, and understand how changing environmental conditions are reflected in bodily conditions.
I have taught biochemistry to medical and undergraduate students for over 30 years. Premedical students usually come into my classes expecting to memorize structures, nomenclature, and pathways, and are a bit taken aback at the idea that there is anything else to learn. By examining experimental data and case studies, they become familiar with the core of biochemistry and are able to go far beyond rote learning. Unfortunately, I hear back from them once they are in professional schools that "it was great that you taught us about concepts, but you should have had us memorize more, since that is what we have to do here." As long as the health professions emphasize the acquisition of facts rather than their application, science will be seen as dry, uncreative, and mostly irrelevant to the “real” world.
Along with colleagues at Wellesley -- Lee Cuba and Alexandra Day -- I recently published a study of science majors at liberal arts colleges. Our major finding was that science majors who took many courses outside of the sciences were better able to make connections among disciplines. Some medical schools -- Mount Sinai in New York is a prominent example -- have begun recruiting humanities majors to their classes, requiring fewer science courses than for the typical applicant because they are thought to bring different strengths to the profession. This move is well intended, but it misses the point.
Privileging humanities majors in medical school admissions may inadvertently reinforce the opposition between the “soft skills” associated with humanists and the technical capabilities associated with scientists. Long before the health sciences became deeply specialized, renowned physicians such as Hippocrates, Maimonides, John Locke and John Keats were as much philosophers and poets as scientists. Although that kind of Renaissance career may no longer be practical, today a strong liberal arts education in both the arts and sciences provides the most effective preparation for the medical profession.
Medical schools would do better to recruit broadly educated science students who bring the complementary strengths of integration among disciplines and a deep grounding in the process of scientific discovery and analysis to their study and practice of medicine. If we want knowledgeable and competent doctors who are also well-rounded and compassionate individuals, we must stop treating the arts and sciences as mutually exclusive. We must help our students see the connections between what they are learning in the classroom and what they will practice in the “real world,” to see that organic chemistry and Southern literature are not irreparably separate, but that each may have a role in a medical education.
Adele Wolfson is Nan Walsh Schow and Howard B. Schow Professor of Physical and Natural Sciences and interim dean of students at Wellesley College.