Speakers at Council of Graduate Schools meeting warn about pitfalls of big data-driven research

WASHINGTON -- The rise of big data has been a tremendous boon to researchers, but it has also revealed shortcomings in how higher education collects and analyzes data and judges the impact of research on human subjects.

Speakers during the annual meeting of the Council of Graduate Schools, a membership organization for graduate deans, presented that argument on Friday during a session on the ethical implications of big data-driven research.

The speakers, Anna L. Harvey, Xiao-Li Meng and William F. Tate -- deans of graduate schools of arts and sciences at New York University, Harvard University and Washington University in St. Louis, respectively -- each discussed the pros and cons of certain types of data, as well as the challenges researchers have encountered in their fields. Harvey is in political science, Meng in statistics, Tate in education.

Their anecdotes eventually coalesced around the same themes: researchers should focus on the quality, not the quantity, of data, and higher education needs to reconsider whether its existing procedures for approving and monitoring research -- such as institutional review boards -- have kept up with the times.

“Technology is now allowing social scientists to conduct not only small lab experiments but big experiments, field experiments,” Harvey, professor of politics at NYU, said. “We’re out there in the world experimenting on people. It’s not clear the internal procedures we have for worrying about the ethics of experimentation on human subjects are appropriately servicing issues that might arise in these field experiments -- and [it is] consequently putting both our grad students and our institutions at risk.”

The call for academe to step up and assert itself in the debate about how data should and should not be used has grown louder over the past few years as data-driven decision making has grown more common in the adviser’s office, the classroom and, more broadly, in everything from online entertainment to shopping. Some efforts to tackle those questions have already been made. In the summers of 2014 and 2016, for example, researchers, ed-tech company executives and leaders of philanthropic foundations gathered to debate the issue at the Asilomar conventions, each time producing a set of guidelines that stressed caution and openness when using data about students for research purposes.

It is uncertain how much traction those guidelines have gained, however. None of the speakers who presented during the session said they were familiar with them. But since the guidelines largely deal with ethical concerns surrounding data generated by students and learners who sign up for free online courses, they don’t necessarily apply to the topics discussed here.

For example, Meng’s talk, titled “Big Data, Big Desire & Big Danger,” used last month’s presidential election to show how even minor sampling errors can cause major polling mistakes. Tate, vice provost for graduate education, noted the increase in research that is based on data collected for administrative purposes, and said the U.S. should consider responsibly expanding its collection of that type of data or risk falling further behind countries in Northern Europe.

Harvey presented the case that has since come to be known as the “Montana mailer experiment” as one example of a project that highlighted flaws in how IRBs evaluate experiments on human subjects.

In the 2014 experiment, researchers at Stanford University and Dartmouth College sent mailers to more than 100,000 registered voters in Montana with information about how candidates running for the state’s Supreme Court compared ideologically to former presidential candidates Barack Obama and Mitt Romney. The goal of the experiment was to see if voters who received the mailers were more likely to vote than those who did not -- in other words, if giving voters information increased turnout.

But the backlash to the experiment was severe. There were accusations of election interference, an accidental violation of Montana election law, $50,000 worth of apology notes -- as well as unusable research funding and worthless results.

Those issues could have been avoided if the ethical implications of the experiment had been discussed in advance, Harvey said. Since the researchers were not collecting private data from human test subjects, however -- the data they looked at are publicly available -- the experiment was exempted from IRB review.

But even if the experiment had been reviewed by an IRB, it is not certain that the board would have flagged the issues that eventually arose. Harvey said she therefore supports rethinking IRB procedures and expanding them to consider whether people not directly involved in an experiment may be harmed -- perhaps by creating panels that look specifically at ethical concerns that may arise from field experiments.

“The harm, in this case, is the spillover -- the externalities that weren’t subjects in this experiment,” Harvey said. “The presidents of the institutions, the researchers and others in the field had nothing to fall back on to say, ‘We went through this review process, and these ethical issues were raised and discussed,’ because there wasn’t a venue for that.”