‘Doing good science is hard’: retraction of high-profile reproducibility study prompts soul-searching

A retracted paper’s backstory illustrates the challenges of the technique called preregistration. Credit: Getty

The retraction of a high-profile paper¹ that tested ways to improve the soundness of scientific studies has highlighted the challenges of such ‘reproducibility’ research. The retracted paper’s authors include some of the titans in the field.

In the study, published in Nature Human Behaviour last November, the authors described a rigorous research protocol, with features such as large sample sizes, designed to ensure the soundness of psychological experiments. The authors applied their protocol to dozens of research projects. They reported that, as a result, 86% of replication attempts confirmed the expected results, one of the highest “replication rates” ever recorded in such studies. But the journal’s editors retracted the paper on 23 September, stating in the retraction notice² that they “no longer have confidence in the reliability of the findings and conclusions”.

The authors agree with only one of the journal’s concerns, which they attribute to an innocent oversight. One of the authors, Jonathan Schooler, a psychologist at the University of California, Santa Barbara, told Nature that the group is working on a new version of the manuscript for resubmission.

Researchers following the case say it highlights the problems with an open-science tenet called preregistration: the practice of writing down a study’s details, including its hypothesis and planned analyses, before the research is performed, in a bid to stamp out data manipulation and the selective reporting of results.

“What this shows is that doing good science is hard, much harder than most people seem to think,” says Sam Schwarzkopf, a visual neuroscientist at the University of Auckland in New Zealand. “More often than not, preregistration makes people realize that their thoughtful plans don’t pan out when faced with the cold hard reality of data collection.”

Four teams and 64 replication efforts

The paper described a complex and sprawling effort: four research teams each performed preregistered pilot studies in the social-behavioural sciences. One of the studies, for example, examined whether time pressure skewed decision-making³. If a pilot study found an effect, the team tried to confirm the result in a sample of at least 1,500 people. All four teams then attempted to replicate the experiments selected for confirmation, to see whether they would get the same results. Each team tried to replicate four of its own experiments and four from each of the three other teams, for 16 replication attempts per team.

Of the 64 replication efforts, 86% were successful, meaning that they yielded the expected results and that those results were statistically significant. By contrast, other replication studies in the social-behavioural sciences have reported replication rates averaging 50%.

The authors of the retracted study attributed their high replication rate to “rigour-enhancing practices” such as large sample sizes, preregistration and transparency about methods. Adoption of such practices could help to make studies more reliable, the authors wrote.

Shortly after the paper’s publication, Joseph Bak-Coleman, a social scientist at the University of Konstanz in Germany, and Berna Devezer, who studies marketing at the University of Idaho in Moscow, questioned its validity in a preprint⁴ that was uploaded to the PsyArXiv server. They noted that the authors had not preregistered some of the paper’s elements, including its central question: would the authors’ protocol increase reproducibility? Separately, Bak-Coleman sent pages of analysis to the editors of Nature Human Behaviour, who began an investigation that ultimately led to the retraction.

In a commentary⁵ accompanying the retraction, Bak-Coleman and Devezer wrote that “replicability was not the original outcome of interest in the project, and analyses associated with replicability were not preregistered as claimed”. The retraction notice echoed those statements. (Nature Human Behaviour is published by Springer Nature, which also publishes Nature. Nature’s news team is editorially independent of its publisher.)

An authorial admission

On the day of the retraction, six of the Nature Human Behaviour authors published their account of the story⁶. In it, they admit that some of the study’s analyses were not preregistered. But they call other statements in the retraction notice “inaccurate”, such as the journal’s finding that the authors had knowledge of the data when performing the analyses. The journal disagrees that the retraction notice contains inaccuracies.

Brian Nosek, the executive director of the Center for Open Science in Charlottesville, Virginia, and a co-author of the retracted study, says that it was shocking to find that the erroneous preregistration claims had slipped through the team’s project-management processes. “I don’t know how many times I read that paper with these erroneous claims about everything being preregistered and missed it. It was just a screw up,” he says.

Nosek, who is considered a pioneer in preregistration, also says that, from the outset, the purpose of the project was replicability, contrary to Bak-Coleman and Devezer’s critique.

Preregistration challenges

The saga illustrates the shortcomings of preregistration, says Yoel Inbar, a psychologist at the University of Toronto in Canada. “I’ve seen a lot of preregistrations that were vague, that weren’t followed exactly, or where the final paper kind of mixed together the preregistered and non-preregistered analysis,” he says.

Inbar is increasingly convinced that a better option is the preregistration format called registered reports, in which researchers submit their study protocol, including their rationale and methods, to a journal for peer review before collecting data. Editors decide whether to accept the study on the basis of the importance of the research question and rigour of the methods, and commit to publish the results if the work is performed as described.

Others say that the journal is part of the problem. Anne Scheel, a metascientist at Utrecht University in the Netherlands, says that although the authors erred, the editors should have noticed the missing preregistration. Peer reviewers don’t always check preregistration, and big journals such as Nature Human Behaviour “need processes to actually review preregistration”, she says.

A spokesperson for the journal says it is investigating changes to its practices. “The journal is looking into ways to improve transparency, standardization and reporting requirements for preregistration in the social and behavioural sciences, which will strengthen efforts to monitor compliance with preregistration,” the spokesperson adds.

Time sink for all

Sprawling projects in which several research groups attempt the same experiments are difficult to manage, says Olavo Amaral, a reproducibility researcher at the Federal University of Rio de Janeiro in Brazil. He speaks from experience: he runs the Brazilian Reproducibility Project, an attempt to reproduce the results of scores of biomedical studies performed in laboratories in the country. “We keep finding mistakes,” he says.

He says that the criticism of the retracted paper must be addressed, but the problems do not shake his opinion of the work. “The results look pretty replicable,” he says. “I don’t think the preregistration criticism changes my mind a lot about the paper.”