AI writes summaries of preprints in bioRxiv trial



AI approaches are increasingly being applied to help researchers digest the scientific literature. Credit: Михаил Руденко/Getty

Earlier this month, Erik van Nimwegen and Pascal Grobecker, computational biologists at the University of Basel in Switzerland, posted a preprint1 on the bioRxiv server describing a new tool for deciphering gene-expression patterns in individual cells. Van Nimwegen was excited by the work and crafted a long summary on the social media site X, formerly Twitter.

So he was surprised — and troubled — to read a competing précis produced by an artificial intelligence (AI) tool similar to ChatGPT. The summary stood alongside the preprint on bioRxiv. The first sentence was gibberish, van Nimwegen says, and it only got worse from there. “I’d rather have no summaries than this hash,” he fumed on X.

The summary is part of a bioRxiv pilot, announced on 8 November, that uses text-generating neural networks called large language models (LLMs) to give an outline of all new preprints on the site. The service creates three short summaries aimed at different reading levels, from general to expert.

“The motivation was really to improve accessibility. Scientific papers can be incredibly arcane,” says Richard Sever, bioRxiv’s co-founder and assistant director of Cold Spring Harbor Laboratory Press in New York. “There will be some nonsense, inevitably.”

The bioRxiv pilot is part of a broader trend of using LLMs to help researchers — and the public — keep afloat in the tsunami of scientific literature. The physics-focused preprint server arXiv uses AI to generate audio summaries of some papers, and publishers and funders are starting to roll out features that allow users to ‘talk to a paper’ through a chatbot.

Decision-making tool

BioRxiv’s summary pilot is being run with ScienceCast, a Towson, Maryland-based start-up with close links to arXiv. The company developed the LLM that produces the arXiv audio summaries, a service already available on the company’s website for some bioRxiv preprints.

The written summaries on bioRxiv’s website are based on the entire paper — not just the abstract — and Sever expects that many scientists will use them to decide whether to read the full paper. Currently, a notice beside the summaries indicates that they were written by an AI tool and not approved by the authors.

Before rolling out the service, Sever and his colleagues evaluated several dozen summaries produced by the tool. Most were pretty good, he says, and some were even better than the abstracts scientists had written. Others included clear falsehoods. “We know there are going to be errors in these things,” Sever says.

Robert Seder, a vaccine scientist at the US National Institute of Allergy and Infectious Diseases in Bethesda, Maryland, found several errors in the summaries of his team’s preprint2 testing an inhaled COVID-19 vaccine in monkeys. Of the three AI-generated précis, the one aimed at a general audience was the best and the mid-level synopsis was the least accurate, Seder says. With a few key edits, the summaries would accurately reflect the work, he adds.

Van Nimwegen says the general summary of his paper was okay, and his real gripe was with the highest-level synopsis.

If the pilot becomes a fully fledged service, bioRxiv might look at routinely involving authors in proofreading and approving the content, Sever says. For now, to minimize the consequences of errors, the pilot is not being rolled out to medRxiv, a sister medical-research preprint server run by Cold Spring Harbor Laboratory Press, the London-based publisher BMJ and Yale University in New Haven, Connecticut. MedRxiv studies typically have clinical relevance, and an erroneous summary could misguide patients. By limiting the pilot to bioRxiv, says Sever, “the consequences of being wrong are more that somebody might feel misled or misunderstand a fairly arcane study in cell biology”.

The next AI tool

Aaron Tay, a librarian at Singapore Management University, has experimented with using LLMs to generate summaries of papers published by researchers at his university. “Most of them are impressed. It’s not perfect but it gets the main factual points, usually,” he says.

As LLMs become more advanced — and models are developed to focus on specific fields of science — their ability to summarize research is likely to improve, Tay adds.

Victor Galitski, ScienceCast’s chief technical officer and a quantum physicist at the University of Maryland in College Park, says the company is working on developing field-specific models.

Moreover, Galitski thinks that automated summaries and other LLM-generated insights will be powerful tools for keeping up with the flood of scientific literature: he has calculated that it would take 150 years to read all the COVID-19 papers published in the past 4 years. “There may be gems that are missing,” he says.

BioRxiv is already working on its next AI-powered feature. ScienceCast’s website hosts a feature that allows users to have a ‘conversation’ with a subset of arXiv preprints to ask, for example, about a paper’s key findings. The company is fine-tuning this feature for biological data and will soon release a comparable version for bioRxiv, says Galitski. Last week, the US National Cancer Institute in Bethesda released a similar app called NanCI, which lets users ask questions such as “What is the key hypothesis tested?” of a subset of papers, and get referenced answers.

Tay expects more publishers to harness the power of LLMs. “I am cautiously optimistic these features will be helpful,” he says. “We really need more formal studies that study the accuracy of such summarizations over scientific literature.”
