How OpenAI’s text-to-video tool Sora could change science


An animated sequence from a video generated by OpenAI's Sora of a young man reading a book while sitting on a cloud.

Sora is one of several AI tools that generate video from text prompts. Credit: OpenAI

The release of OpenAI’s Sora text-to-video AI tool last month was met with a mix of excitement and trepidation from researchers, some of whom are concerned about misuse of the technology. The California-based company showcased Sora’s ability to create photorealistic videos from a few short text prompts, with examples including clips of a woman walking down a neon-lit street in Tokyo and a dog jumping between two windowsills.

Tracy Harwood, a digital-culture specialist at De Montfort University in Leicester, UK, says she is “shocked” by the speed at which text-to-video artificial intelligence (AI) has developed. A year ago, people were laughing at an AI-produced video of the US actor Will Smith eating spaghetti. Now some researchers are worried that the technology could upend global politics in 2024.

OpenAI, which also developed ChatGPT and the text-to-image technology DALL·E, debuted Sora on 15 February, announcing that it was making the technology “available to red teamers to assess critical areas for harms or risks”. ‘Red teaming’ refers to the process of conducting simulated attacks or exploitation of a technology to see how it would cope with nefarious activity, such as the creation of misinformation and hateful content, in the real world.

Sora isn’t the first example of text-to-video technology; others include Gen-2, produced by Runway in New York City and released last year, and the Google-led Lumiere, announced in January. Harwood says she has been “underwhelmed” by some of these other offerings. “They are becoming more and more vanilla in what they present to you,” she says, adding that the programs require very specific prompts to get them to produce compelling content.

Misinformation is a major challenge for these text-to-video technologies, Harwood adds. “We’re going to very quickly reach a point in which we are swamped with a barrage of really compelling-looking information. That’s really worrying.”

Election fears

That poses particular problems with upcoming elections, including the US presidential election in November and an impending general election in the United Kingdom. “There will be colossal numbers of fake videos and fake audio circulating,” says Dominic Lees, who researches generative AI and filmmaking at the University of Reading, UK. Fake audio of the leader of the UK Labour Party, Keir Starmer, was released in October 2023, and fake audio of US President Joe Biden encouraging Democrats not to vote circulated in January.

One solution might be to require text-to-video AI to use watermarks, either in the form of a visible mark on the video, labelling it as AI, or as a telltale artificial signature in the video’s metadata, but Lees isn’t sure this will be successful. “At the moment watermarks can be removed,” he says, and the inclusion of a watermark in a video’s metadata relies on people actively researching whether a video they’ve watched is real or not. “I don’t think we can honestly ask audiences across the world to do that on every video they’re looking at,” says Lees.

There are potential benefits to the technology, too. Harwood suggests it could be used to present difficult text, such as an academic paper, in a format that is easier to understand. “One of the biggest things it could be used for is to communicate findings to a lay audience,” she says. “It can visualize pretty complex concepts.”

Another potential use might be in health care, with text-to-video AI able to talk to patients in place of a human doctor. “Some people might find it disconcerting,” says Claire Malone, a consultant science communicator in the United Kingdom. “Others might find it extremely convenient if they want to ask a medical professional questions multiple times a day.”

Data management

Text-to-video AI tools such as Sora could help researchers to wade through huge data sets, such as those produced by the European particle-physics laboratory CERN near Geneva in Switzerland and other large scientific projects, says Malone. Generative AI could “sift out code and do the mundane tasks of research”, she adds, but also do “much more sophisticated work [such as] giving it data and asking it to make predictions”.

Concerns have also been raised by people working in creative industries. The US actor Tom Hanks suggested last year that AI could enable him to continue appearing in films “from now until kingdom come” after his death. “If you were a young ambitious actor thinking about their future, and you were told ‘I’m sorry, Tom Hanks is always going to play the leading roles’, would you plan a future in that?” says Lees.

Text-to-video AI will throw up broad issues for society to face. “We’re going to have to learn to evaluate the content we see in ways we haven’t in the past,” says Harwood. “These tools put the opportunity to be a media content creator in the hands of everybody,” she says. “We’re going to be dealing with the consequences of that. It’s a fundamental shift in the way material will be consumed.”

