In February, Google released an upgraded version of its Gemini artificial intelligence model. It quickly became a publicity disaster, as people discovered that requests for images of Vikings generated tough-looking Africans while pictures of Nazi soldiers included Asian women. Building in a demand for ethnic diversity had produced absurd inaccuracies.
Academic historians were baffled and appalled. “They obviously didn’t consult historians,” says Benjamin Breen, a historian at the University of California, Santa Cruz. “Every person who cares about the past is just like, ‘What the hell’s going on?’”
Rewriting the past to conform with contemporary political fashions is not at all what historians have in mind for artificial intelligence. Machine learning, large language models (LLMs), machine vision, and other AI tools instead offer a chance to develop a richer, more accurate view of history. AI can decipher damaged manuscripts, translate foreign languages, uncover previously unrecognized patterns, make new connections, and speed up historical research. As teaching tools, AI systems can help students grasp how people in other eras lived and thought.
Historians, Breen argues, are particularly well-suited to take advantage of AI. They’re used to working with texts, including large bodies of work not bound by copyright, and they know not to believe everything they read. “The main thing is being radically skeptical about the source text,” Breen says. When using AI, he says, “I think that’s partly why the history students I’ve worked with are from the get-go more sophisticated than random famous people I’ve seen on Twitter.” Historians scrutinize the results for errors, just as they would check the claims in a 19th century biography.
Last spring Breen created a custom version of ChatGPT to use in his medieval history class.
Writing detailed system prompts, he built chatbots that let students interact with three characters living through the bubonic plague outbreak of 1348: a traveler passing through Damascus, a disreputable apothecary in Paris, and an upstanding city councilor in Pistoia. The simulation worked like a vastly more sophisticated version of a text-based adventure game, the great-great-great-great-grandchild of the 1970s classic Oregon Trail.
Each student picked a character—say, the Parisian apothecary—and received a description of their environment, followed by a question. The apothecary looks out the window and sees a group of penitents flagellating themselves with leather straps. What does he do? The student could either choose one of a list of options or improvise a unique answer. Building on the response, the chatbot continued the narrative.
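For readers curious about the mechanics, the pattern is simple: a fixed system prompt establishes the setting and the rules, and the running chat history carries the story forward turn by turn. Here is a minimal sketch assuming the OpenAI Python client; the prompt text and model name are illustrative, not Breen's actual setup.

```python
# Sketch of a system-prompted historical character chatbot.
# The prompt below is illustrative, not Breen's actual prompt.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = """You are a game master simulating Paris in 1348,
during the Black Death. The player is a disreputable apothecary.
Stay historically plausible for the period: no anachronistic knowledge
of germ theory. After each player action, describe the consequences in
two or three sentences, then offer the player a new choice."""

history = [{"role": "system", "content": SYSTEM_PROMPT}]

def play_turn(player_action: str) -> str:
    """Send the player's action and return the game master's next beat."""
    history.append({"role": "user", "content": player_action})
    reply = client.chat.completions.create(model="gpt-4o", messages=history)
    text = reply.choices[0].message.content
    history.append({"role": "assistant", "content": text})
    return text

print(play_turn("I watch the penitents from my window, then open my shop."))
```

Because the full history is resent on every turn, the character remembers what the student has already done, which is what makes the exercise feel like a narrative rather than a quiz.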
After the game, Breen assigned students to write papers in which they analyzed how accurately their simulation had depicted the historical setting. The combined exercise immersed students in medieval life while also teaching them to beware of AI hallucinations.
It was a pedagogical triumph. Students responded with remarkable creativity. One “made heroic efforts as an Italian physician named Guilbert to stop the spread of plague with perfume,” Breen writes on his Substack newsletter, while another “fled to the forest and became an itinerant hermit.” Others “became leaders of both successful and unsuccessful peasant revolts.” Students who usually sat in the back of the class looking bored threw themselves enthusiastically into the game. Engagement, Breen writes, “was unlike anything I’ve seen.”
For historical research, ChatGPT and similar LLMs can be powerful tools. They translate old texts better than specialized software like Google Translate because their training data supply historical and cultural context along with the languages themselves. As a test, Breen asked GPT-4, Bing in its creative mode, and Anthropic’s Claude to translate and summarize a passage from a 1599 book on demonology. Written primarily in “a highly erudite form of Latin,” the passage included bits of Hebrew and ancient Greek. The results were mixed, but Breen found that “Claude did a remarkable job.”
He then gave Claude a big chunk of the same book and asked it to produce a chart listing types of demons, what they were believed to do, and the page numbers where they were mentioned. The chart wasn’t perfect, largely because of hard-to-read page numbers, but it was useful. Such charts, Breen writes, “are what will end up being a game changer for anyone who does research in multiple languages. It’s not about getting the AI to replace you. Instead, it’s asking the AI to act as a kind of polymathic research assistant to supply you with leads.”
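In practice, that kind of polymathic assistance amounts to a single prompt wrapped around the source text. A minimal sketch, assuming Anthropic's Python client; the model alias, file name, and prompt wording are assumptions for illustration:

```python
# Hedged sketch of a translate-and-tabulate request to Claude.
# The excerpt file and model alias are illustrative assumptions.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY

passage = open("demonology_1599_excerpt.txt", encoding="utf-8").read()

message = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=2000,
    messages=[{
        "role": "user",
        "content": (
            "The following passage is early modern scholarly Latin with "
            "bits of Hebrew and ancient Greek. Translate it into English, "
            "then produce a table listing each type of demon mentioned, "
            "what it was believed to do, and the page number where it "
            f"appears.\n\n{passage}"
        ),
    }],
)
print(message.content[0].text)
```

The output is a set of leads, not a finished result; as with Breen's chart, the page numbers and attributions still need checking against the original.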
LLMs can read and summarize articles. They can read old patents and explain technical diagrams. They find useful nuggets in long dull texts, identifying, say, each time a diarist traveled. “It will not get it all right, but it will do a pretty decent job of that kind of historical research, when it’s narrowly enough focused, when you give it the document to work on,” says Steven Lubar, a historian at Brown University. “That I’m finding very useful.”
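The same pattern handles Lubar's narrowly focused pass: hand the model the document, ask for structured output, then verify it. A minimal sketch, again assuming Anthropic's Python client; the diary file and the JSON field names are hypothetical:

```python
# Sketch of focused extraction: every travel mention in a diary.
# Diary file, field names, and model alias are hypothetical.
import json
import anthropic

client = anthropic.Anthropic()
diary = open("diary_1874.txt", encoding="utf-8").read()

resp = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=1500,
    messages=[{
        "role": "user",
        "content": "From the diary below, list every passage where the "
                   "writer travels. Return only JSON: a list of objects "
                   "with 'date', 'destination', and 'quote' fields.\n\n"
                   + diary,
    }],
)

# Assumes the model returned bare JSON; a real pipeline would validate
# the parse and spot-check each quote against the diary itself.
for trip in json.loads(resp.content[0].text):
    print(trip["date"], "->", trip["destination"])
```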
Unfortunately, LLMs still can’t decipher old handwriting. They’re bad at finding sources on their own. They aren’t good at summarizing debates among historians, even when they have the relevant literature at hand. They can’t translate their impressive patent explanations into credible illustrations. When Lubar asked for a picture of the loose-leaf binder described in a 19th century patent, he got instead a briefcase opening to reveal a steampunk mechanism for writing out musical scores. “It’s a beautiful picture,” he says, “but it has nothing to do with the patent which it did such a good job of explaining.”
In short, historians still have to know what they’re doing, and they have to check the answers. “They’re tools, not machines,” says Lubar, whose research includes the history of tools. A machine runs by itself while a tool extends human capacities. “You don’t just push a button and get a result.”
Simply knowing that such new tools are possible can unlock historical resources, permitting new questions and methods. Take maps. Thousands of serial maps exist, documenting the environment at regular intervals, and many have been digitized. They show not only topography but buildings, railways, roads, even fences. Maps of the same places can be compared over time, and in recent years historians have begun to use big data from maps.
Katherine McDonough, a historian now at Lancaster University in the United Kingdom, wrote her dissertation on road construction in 18th century France. Drawn to digital tools, she was frustrated with their inability to address her research questions. Map data came mostly from 19th and 20th century series in the U.S. and United Kingdom. Someone interested in old French maps was out of luck. McDonough wanted to find new methods that could work with a broader range of maps.
In March 2019, she joined a project at The Alan Turing Institute, the U.K.’s national center for data science and AI. Knowing that the National Library of Scotland had a huge collection of digitized maps, McDonough suggested looking at them. “What could we do with access to thousands of digitized maps?” she wondered. Collaborating with computer vision scientists, the team developed software called MapReader, which McDonough describes as “a way to ask maps questions.”
Combining maps with census data, she and her colleagues have examined the relationship between railways and class-based residential segregation. “The real power of maps is not necessarily looking at them on their own, but in being able to connect them with other historical datasets,” she says. Historians have long known that higher-class Britons lived closer to passenger train stations and farther from rail yards. With their noise and smoke, rail yards seemed like obvious nuisances whose lower-class neighbors lacked better options. Matching maps with census data on occupations and addresses showed a more subtle effect. The people who lived near rail yards were likely to work in them. They weren’t just saving on rent but decreasing their commuting times.
MapReader doesn’t require extreme geographical precision. Drawing on techniques used in biomedical imaging, it instead divides maps into squares called patches. “When historians look at maps and we want to answer questions, we want to know things like, how many times does something like a building appear on this map? I don’t need to know the exact pixel location of every single building,” says McDonough. Aside from streamlining the computation, the patchwork method encourages people to remember that “maps are just maps. They are not the landscape itself.”
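The patch idea is easy to picture in code. What follows is a generic illustration of the approach, not MapReader's actual API; the classifier is a trivial stand-in for the trained computer-vision model such a pipeline would use, and the map file is hypothetical.

```python
# Generic illustration of the patch method behind tools like MapReader.
# Not MapReader's API; classify_patch is a toy stand-in for a trained model.
import numpy as np
from PIL import Image

def patchify(path, patch_px=300):
    """Slice a scanned map sheet into patch_px-square tiles."""
    sheet = Image.open(path)
    w, h = sheet.size
    for top in range(0, h - patch_px + 1, patch_px):
        for left in range(0, w - patch_px + 1, patch_px):
            yield sheet.crop((left, top, left + patch_px, top + patch_px))

def classify_patch(patch):
    # Stand-in so the sketch runs end to end: a crude darkness heuristic.
    # A real model would be trained on labeled patches ("railspace",
    # "buildings", "open land", ...).
    gray = np.asarray(patch.convert("L"))
    return "railspace" if gray.mean() < 90 else "other"

# "How many times does something like a rail yard appear on this map?"
count = sum(classify_patch(p) == "railspace"
            for p in patchify("os_sheet_1890s.png"))
print(count, "patches flagged as rail infrastructure")
```

Counting labeled patches, rather than tracing exact pixel outlines, is what makes the method cheap enough to run across thousands of sheets and then join to census tables.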
That, in a nutshell, is what historians can teach us about the answers we get from AI. Even the best responses have their limits. “Historians know how to deal with uncertainty,” says McDonough. “We know that most of the past is not there anymore.”
Everyday images are scarce before photography. Journalism doesn’t exist before printing. Lives go unrecorded on paper, business records get shredded, courthouses burn down, books get lost. Conquerors destroy the chronicles of the conquered. Natural disasters strike. But tantalizing traces remain. AI tools can help recover new pieces of the lost past—including a treasure trove of ancient writing.
When Mount Vesuvius erupted in 79 C.E., it buried the seaside resort of Herculaneum, near modern-day Naples, along with the larger city of Pompeii. Rediscovered in the 18th century, the town’s wonders include a magnificent villa thought to have been owned by the father-in-law of Julius Caesar. There, early excavators found more than 1,000 papyrus scrolls, the largest such collection surviving from the classical world. Archaeologists think thousands more may remain in still-buried portions of the villa. “If those texts are discovered, and if even a small fraction can still be read,” writes historian Garrett Ryan, “they will transform our knowledge of classical life and literature on a scale not seen since the Renaissance.”
Unfortunately, the Herculaneum scrolls were carbonized by the volcanic heat, and many were damaged in early attempts to read them. Only about 600 of the initial discoveries remain intact, looking like lumps of charcoal or burnt logs. In February, researchers began to read one of those scrolls, a work unseen for nearly 2,000 years.
That milestone represented the triumph of machine learning, computer vision, international collaboration, and the age-old lure of riches and glory. The quest started in 2015, when researchers led by Brent Seales at the University of Kentucky figured out how to use X-ray tomography and computer vision to virtually “unwrap” an ancient scroll. The technique created computer images of what the unrolled pages would look like. But distinguishing letters from carbonized papyrus and dirt required further advances.
In March 2023, Seales, along with startup investors Nat Friedman and Daniel Gross, announced the Vesuvius Challenge, offering big money prizes for critical steps toward reading the Herculaneum scrolls. A magnet for international talent, the challenge succeeded almost immediately. By the end of the year, the team of students Youssef Nader, Luke Farritor, and Julian Schilliger had deciphered more than enough of the first scroll—about 2,000 characters—to claim the grand prize of $700,000. “We couldn’t have done this without the tech guys,” an excited Richard Janko, a professor of classical studies at the University of Michigan, told The Wall Street Journal.
Although only about 5 percent of the text has so far been read, it’s enough for scholars to identify the scroll’s perspective and subject. “Epicureanism says hi, with a text full of music, food, senses, and pleasure!” exulted Federica Nicolardi, a papyrologist at the University of Naples Federico II. This year the project promises a prize of $100,000 to the first team to decipher 90 percent of four different scrolls. Reclaiming the lost scrolls of Herculaneum is the most dramatic example of how AI—the technology of the future—promises to enhance our understanding of the past.