Reproducibility trial: 246 biologists get different results from same data sets


Scientists who ran separate analyses on a single data set about the effect of grass cover on Eucalyptus seedlings arrived at vastly different answers. Credit: Laurence Dutton/Getty

In a massive exercise to examine reproducibility, more than 200 biologists analysed the same sets of ecological data — and got widely divergent results. The first sweeping study1 of its kind in ecology demonstrates how much results in the field can vary, not because of differences in the environment, but because of scientists’ analytical choices.

“There can be a tendency to treat individual papers’ findings as definitive,” says Hannah Fraser, an ecology meta-researcher at the University of Melbourne in Australia and a co-author of the study. But the results show that “we really can’t be relying on any individual result or any individual study to tell us the whole story”.

Variation in results might not be surprising, but quantifying that variation in a formal study could catalyse a larger movement to improve reproducibility, says Brian Nosek, executive director of the Center for Open Science in Charlottesville, Virginia, who has driven discussions about reproducibility in the social sciences.

“This paper may help to consolidate what is a relatively small, reform-minded community in ecology and evolutionary biology into a much bigger movement, in the same way as the reproducibility project that we did in psychology,” he says. It would be hard “for many in this field to not recognize the profound implications of this result for their work”.

The study was published as a preprint on 4 October. The results have not yet been peer reviewed.

Replication studies’ roots

The ‘many analysts’ method was pioneered by psychologists and social scientists in the mid-2010s, as they grew increasingly aware of results in the field that could not be replicated. Such work gives multiple researchers the same data and the same research questions. The authors can then compare how decisions made after data collection affect the types of result that eventually make it into publications.
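To make the idea concrete, here is a minimal Python sketch, using simulated data and invented variable names, of how two equally defensible post-collection choices — modelling raw counts versus log-transformed counts — can yield different effect estimates from the same data set. It illustrates the general principle only; it is not the preprint’s actual analysis pipeline.

```python
# Hypothetical illustration: two analysts, one data set, two defensible choices.
# The data are simulated and the variable names are invented for this example.
import numpy as np

rng = np.random.default_rng(42)

# Simulated data: grass cover (%) and seedling counts with a weak negative trend
grass_cover = rng.uniform(0, 100, size=200)
seedlings = rng.poisson(np.clip(5 - 0.02 * grass_cover, 0.1, None))

def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    return np.polyfit(x, y, deg=1)[0]

# Analyst A models the raw counts directly
effect_raw = slope(grass_cover, seedlings)

# Analyst B log-transforms the counts first, a common choice for count data
effect_log = slope(grass_cover, np.log1p(seedlings))

print(f"raw-count slope: {effect_raw:+.4f} seedlings per % cover")
print(f"log-count slope: {effect_log:+.4f} log-seedlings per % cover")
```

The two estimates sit on different scales and can point to different conclusions about the strength of the effect, even though both analysts started from identical data — the kind of divergence the many-analyst design is built to expose.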

The study by Fraser and her colleagues brings the many-analyst method to ecology. The researchers gave scientist-participants one of two data sets and an accompanying research question: either “To what extent is the growth of nestling blue tits (Cyanistes caeruleus) influenced by competition with siblings?” or “How does grass cover influence Eucalyptus spp. seedling recruitment?”

Most participants who examined the blue-tit data found that sibling competition negatively affects nestling growth. But they disagreed substantially on the size of the effect.

Conclusions about how strongly grass cover affects numbers of Eucalyptus seedlings showed an even wider spread. The study’s authors averaged the effect sizes for these data and found no statistically significant relationship. Most results showed only weak negative or positive effects, but there were outliers: some participants found that grass cover strongly decreased the number of seedlings, whereas others concluded that it sharply increased it.

The authors also simulated the peer-review process by getting another group of scientists to review the participants’ results. The peer reviewers gave poor ratings to the most extreme results in the Eucalyptus analysis but not in the blue-tit one. Even after the authors excluded the analyses rated poorly by peer reviewers, the collective results still showed vast variation, says Elliot Gould, an ecological modeller at the University of Melbourne and a co-author of the study.

Right versus wrong

Despite the wide range of results, none of the answers are wrong, Fraser says. Rather, the spread reflects factors such as participants’ training and how they set sample sizes.

So, “how do you know what is the true result?” Gould asks. Part of the solution could be asking a paper’s authors to lay out the analytical decisions that they made, and the potential caveats of those choices, Gould says.

Nosek says ecologists could also use practices common in other fields to show the breadth of potential results for a paper. For example, robustness tests, which are common in economics, require researchers to analyse their data in several ways and assess the amount of variation in the results.
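As a rough illustration of what such a check might look like, the hypothetical Python sketch below fits the same simulated seedling data under several specifications — with and without an extra covariate, with extreme counts dropped, and with a log transform — and prints how much the estimated effect of grass cover moves. The data, the covariate name and the list of specifications are assumptions made for this example, not details drawn from the preprint or from any particular field’s practice.

```python
# Hypothetical robustness-style check: same data, several analysis specifications.
import numpy as np

rng = np.random.default_rng(0)
n = 150
grass_cover = rng.uniform(0, 100, size=n)
shade = rng.uniform(0, 1, size=n)  # hypothetical extra covariate
seedlings = rng.poisson(np.clip(4 - 0.015 * grass_cover + 2 * shade, 0.1, None)).astype(float)

def cover_effect(y, predictors):
    """Least-squares coefficient on grass cover (the first predictor), with an intercept."""
    design = np.column_stack([np.ones(len(y)), predictors])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs[1]

keep = seedlings < np.quantile(seedlings, 0.95)  # one way to drop extreme counts

specifications = {
    "cover only":                   cover_effect(seedlings, grass_cover),
    "cover + shade covariate":      cover_effect(seedlings, np.column_stack([grass_cover, shade])),
    "cover only, outliers dropped": cover_effect(seedlings[keep], grass_cover[keep]),
    # note: the log-count estimate is on a different scale from the others
    "log-transformed counts":       cover_effect(np.log1p(seedlings), grass_cover),
}

for name, estimate in specifications.items():
    print(f"{name:30s} estimated effect of cover: {estimate:+.4f}")
```

Reporting the full set of estimates side by side, rather than a single preferred model, is one way a paper could show readers how sensitive its headline result is to analytical choices.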

But understanding how analytical variation sways results is especially difficult for ecologists because of a complication baked into their discipline. “The foundations of this field are observational,” says Nicole Nelson, an ethnographer at the University of Wisconsin–Madison. “It’s about sitting back and watching what the natural world throws at you — which is a lot of variation.”

