AI can lessen peer-review woes, researchers say

AI and Peer Review: Enemies or Allies?

Amid bans and restrictions on their use, artificial intelligence tools are drawing interest from those who see them as a solution to systemic peer-review woes.

Some researchers are suggesting using artificial intelligence systems as a tool in peer reviewing scholarly papers.

Photo illustration by Justin Morrison/Inside Higher Ed | Getty Images

Debate over the use of artificial intelligence, already touching everything from admissions to grading, has reached peer reviewing, as academics balance technological uncertainty and ethical concerns with potential solutions for persistent peer-review problems.

“We’re seeing the human peer-review system is really stressed,” said James Zou, an assistant professor of biomedical data science at Stanford University. “The number of papers have increased by several-fold over the last few years, and it’s challenging to find a lot of high-quality reviewers who have the time and expertise to review the paper.”

Zou and Laurie Schintler, an associate professor at the Schar School of Policy and Government at George Mason University, each tackled the potential use of AI in peer review in recent papers. The verdict on the usefulness of the technology: the overlap between human and AI feedback is “comparable,” according to Zou, especially when it comes to less “mature” papers, either those in the early stages or ones that were never published.

AI Studies

Zou started looking into the implications of using AI for peer review after seeing it get “harder and harder to get high-quality feedback” in a timely manner, he said.

This month, after half a year of work, Zou and his co-researchers published “Can large language models provide useful feedback on research papers? A large-scale empirical analysis.”

They took two sets of research papers that had been reviewed by humans and fed them into OpenAI’s ChatGPT, comparing the technology’s comments to those of the human reviewers.

“I think we were a bit surprised and interested to see the overlap with human feedback is quite high,” he said.

Just weeks earlier in September, George Mason’s Schintler co-published “A Critical Examination of the Ethics of AI-Mediated Peer Review.”

Schintler said she’s considered the potential implications of AI for close to a decade. She, like Zou, pointed to the long waits for peer reviews, saying that, anecdotally, she has heard of some people waiting up to two years to get reviews.

Schintler found that the guidelines used by the scientific community at large also need to apply to AI. She listed “guiding principles” such as human autonomy, prevention of harm and the promotion of fairness, explicability and accountability.

“Making sure these applications align with the norms and values of the scientific community—those are embedded in the ethos of science,” she said. “We need to think of it systematically and how the AI system in peer review and scholarly communication align and don’t align with those norms.”

Woes and Worries

Peer review, already strained by limited time and resources, faces an even greater struggle after the COVID-19 pandemic.

“Every single person said, ‘No, we would love to help out in this sort of thing, but we just don’t have the time,’” Tanya Joosten, co-director of the National Research Center for Distance Education and Technological Advancements, said last year. “It’s delaying the dissemination of scientific knowledge across the board. It’s like our knowledge creation as a society is slowing down and speeding up at the same time.”

The initial response to using AI in peer review has been guarded. Several journals and academic groups have already explicitly stated that the use of AI should be limited or banned in submissions.

The National Institutes of Health banned the use of online generative AI tools like ChatGPT “for analyzing and formulating peer-review critiques.” The JAMA Network, which includes titles published by the American Medical Association, requires authors to disclose their use of AI and banned listing artificial intelligence generators as authors.

The family of journals produced by Science does not allow text, figures, images or data generated by AI to be used without editors’ permission. And Nature banned images and videos generated by AI; it also requires the use of language models to be disclosed.

An October statement by multiple editors of bioethics and humanities journals urged transparency and cautioned against relying too heavily on the technology. Following other journals’ leads, it also recommended against listing AI as an author.

“We do not pretend to have resolved the many social questions that we think generative AI raises for scholarly publishing, but in the interest of fostering a wider conversation about these questions, we have developed a preliminary set of recommendations about generative AI in scholarly publishing,” the statement read.

Concerns to Consider

There’s an overall caution that comes with suggesting the use of AI. Issues of bias and hallucinations—the term for AI making up false claims and citing them as fact—have long been cited as reasons to avoid the technology altogether.

Zou found in his research that ChatGPT struggled with providing in-depth feedback on the methodology of the reports it assessed. With this and other concerns in mind, he repeatedly stated that AI was to be used as a tool, not a replacement, for humans.

“Researchers should not solely rely on GPT; they should reach out to others to get feedback, and they should be aware there are limitations to the feedback,” he said.

Zou and Schintler both acknowledged that one of the oft-cited complaints about AI is its inherent bias. However, they noted that humans have their own biases, and regardless of whether the bias comes from a computer or a person, it needs to be addressed.

“We have serious issues with peer review related to bias and discrimination; AI could add insult to injury if we don’t attend to those implications,” Schintler said. “We can exploit AI in a positive way. We have to think about how to do that strategically, without having the negative implication like algorithm bias and discrimination. It’s basically a way to support our various activities, not to replace it.”

Daniel Schiff, co-director of Purdue University’s Governance and AI Lab, also cited concerns with bias, specifically against women and minorities. He said AI could be used for simple peer-review fixes, like alerting a researcher if a picture was not embedded or if a researcher forgot to include the paper’s word count. But when it comes to more nuanced thoughts, Schiff said he believes human reviewers are still the best for the task.

“The most advanced, subtle connections—those are the things I would like humans to keep doing,” he said. “Picking up the nuance of, ‘This is old,’ ‘This isn’t the cutting edge’ … that is one of the last places I’d want to use AI right now.”

He added that the use of AI could be seen as a simple Band-Aid for a wider issue in the peer-review world.

“Sometimes we can use AI as a shortcut to avoid making deeper systematic changes when you need to do a deeper examination,” he said, adding that, on the day of his interview with Inside Higher Ed, he received five requests for peer reviews.

Schintler agreed, stating it will take a systemwide approach to tackle the issue.

“I’m a proponent of using [AI] for peer review as long as we’re transparent about its use,” she said. “And that we take steps to make sure downsides are mitigated or completely eliminated. It’ll take some work, but we need to think systematically about it—which will take bringing the whole ecosystem together: publishers, researchers, funding agencies.”