Urban digital twins are high-fidelity computational models of cities (‘replicas’) that represent most of their functional units explicitly over 3D space and time1,2,3,4. Present enthusiasm for urban digital twins is a direct result of growth in computational power and visualization, and of the ability to rapidly acquire and represent large datasets on most elements of cities at high resolution4. The growing importance and complexity of cities in human societies5 provide further motivation, especially when expressed in technological terms, such as via concepts of smart cities6,7.

While recent developments are exciting for their level of detail and verisimilitude, they are not fundamentally new. Each era has employed its best data and modeling capabilities to map and better understand its cities5. Current digital twins introduce new elements to this development, especially much higher spatial resolution, real-time data, and bi-directionality between computational models and the real city8. Here, I briefly summarize where we stand and what has been achieved recently, but also what major problems remain. The discussion is guided by concepts of computational science and how they connect to the possibilities and limitations emerging from urban science and applications to urban planning.

Assembling the digital city

A useful metaphor for an urban digital twin is the representation of a complex game, like chess. There are several aspects to this. First, we must represent every unit (piece) and its position within the space that defines its dynamics (board): this is the easy part. Then, we must be able to play the game well, which is much more difficult. To do so, we must understand the game in order to generate and evaluate sequences of moves in terms of some measure of value (an objective function) and choose between strategies (sets of moves) that maximize it. As we play, we must also update the board’s configuration with observations of other players’ moves. Present urban digital twins can create exquisite representations of a city (pieces and board), including some of its observed dynamics over time, but they are still relatively primitive tools for strategic planning. Some of these difficulties are practical, while others are conceptual. In both cases, the size, complexity, and open-ended character of cities present fundamental challenges to computational science, and provide a proving ground for its value in society.
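
To make the game metaphor concrete, the sketch below enumerates candidate strategies (sequences of moves), scores each with an objective function, and returns the best one. This is a toy illustration, not a real planning engine: the move names and the value function are invented, and the exhaustive search already hints at the combinatorial problem discussed later.

```python
from itertools import product

# Hypothetical planning 'moves'; a real twin would draw these from policy options.
MOVES = ["add_bus_route", "rezone_block", "build_park", "do_nothing"]

def objective(strategy: tuple) -> float:
    """Toy value function: rewards variety of interventions, penalizes length.
    In practice, scoring would require simulating the city's response."""
    return len(set(strategy)) - 0.1 * len(strategy)

def best_strategy(horizon: int) -> tuple:
    # Exhaustive search over len(MOVES)**horizon candidates:
    # feasible only for toy horizons.
    return max(product(MOVES, repeat=horizon), key=objective)

print(best_strategy(horizon=3))  # ('add_bus_route', 'rezone_block', 'build_park')
```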

To better appreciate why, I start with a brief overview of the present explosion in urban data, which, because of its size, scope, and heterogeneity, is a new feature of urban digital twins. Recently, there has been tremendous development of precise digital maps of cities, resulting from progress in computational geographic information systems (GIS), (submeter) aerial and remote sensing (multispectral and LiDAR), crowdsourced platforms (principally OpenStreetMap), and mobile devices. This allows the detailed representation of every street, public space, and building. Levels of detail for each of these elements continue to improve so that, for example, we now have digital representations of building shapes, external textures, materials, and internal spaces. Each of these elements is also increasingly tagged with an address and functional (land-use) attributes: a building may be residential, associated with a specific household or real estate unit, together with its value and description; or it may be a business, characterized by its goods and services, jobs, hours, customer ratings, real-time occupancy, financial performance, and so forth. Likewise, infrastructure elements such as streets and roads are tagged with long lists of attributes, from street names and types to digital street views, walkability scores, and current levels of activity or congestion. Ambient physical attributes such as local temperature, humidity, measures of air quality, and noise are also increasingly well characterized via progress in distributed sensing4. Other infrastructure, such as electrical cables, water and sanitation pipes, subway lines, power stations and transformers, underground spaces, and so forth, is also being mapped in detail and characterized by the status of its functions, though such information is often considered critical infrastructure and remains outside the public domain.
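
As a concrete illustration of this kind of layered tagging, a single building might be represented as a GeoJSON-like feature whose properties accumulate attributes from many sources. The field names and values below are invented for illustration and do not follow any particular city’s schema.

```python
# One urban element with fused attribute layers (all values hypothetical).
building = {
    "type": "Feature",
    "geometry": {  # footprint from GIS / aerial imagery
        "type": "Polygon",
        "coordinates": [[[0.0, 0.0], [0.0, 10.0], [8.0, 10.0], [8.0, 0.0], [0.0, 0.0]]],
    },
    "properties": {
        "address": "100 Example St",         # cadastral record
        "land_use": "residential",           # functional tag
        "height_m": 12.5,                    # from LiDAR
        "assessed_value_usd": 450_000,       # real estate data
        "walk_score": 87,                    # derived street-level attribute
        "ambient": {"temp_c": 21.3, "pm2_5": 8.1, "noise_db": 52},  # sensor readings
    },
}
print(building["properties"]["land_use"])
```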

Besides infrastructure and physical spaces, each tree can now be precisely located and tagged with information about species, age, and size, and its health can be tracked through observation of its greenness by multispectral imaging. Cars and other vehicles are identified and located as they move through cities using mobile devices and traffic cameras. Likewise, mobile devices, wireless infrastructure, surveillance cameras, digital records from purchases and public services, building access readers, and more locate individual people and characterize their behavior, socioeconomic activity, and even some aspects of their mood, for instance via face recognition. This pervasive power of observation raises concerns about surveillance and privacy, which are presently the focus of intense discussions in various cities and nations2,4.
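
Greenness tracking of this kind typically relies on vegetation indices computed from multispectral bands; the most common is the normalized difference vegetation index, NDVI = (NIR - Red) / (NIR + Red). A minimal sketch, assuming two reflectance bands given as NumPy arrays with made-up values:

```python
import numpy as np

def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """Normalized Difference Vegetation Index, in [-1, 1]; higher = greener."""
    denom = nir + red
    # Guard against division by zero over dark pixels (water, shadow).
    return np.divide(nir - red, denom, out=np.zeros_like(denom), where=denom != 0)

# Tiny illustrative reflectance patches:
nir = np.array([[0.60, 0.50], [0.20, 0.00]])
red = np.array([[0.10, 0.20], [0.20, 0.00]])
print(ndvi(nir, red))  # healthy canopy pixels score near 0.7
```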

Urban digital twins have recently been developed to organize and visualize all this information, increasingly in real time1,2,3. Achieving the massive levels of data storage, fusion, and representation necessary is a major achievement in computational science, with many challenges still outstanding3,4. Progress has relied on new and improved methods for distributed computation across myriad devices, data protocols and standardization, data fusion and storage, GIS computation, object identification and classification in images and video (increasingly via machine learning), and (Bayesian) real-time data assimilation and visualization, among other techniques. Rapid progress in all these areas is likely to continue, placing urban computing at the forefront of applied computational science1,2,4.
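
At the core of (Bayesian) real-time data assimilation is the repeated blending of a model forecast with incoming noisy measurements, weighted by their respective uncertainties. The scalar Kalman-filter update below is a minimal sketch of this idea; operational twins assimilate high-dimensional state, typically with ensemble variants, and the traffic numbers here are invented.

```python
def kalman_update(forecast: float, forecast_var: float,
                  observation: float, obs_var: float) -> tuple:
    """One Bayesian update step: returns the posterior mean and variance."""
    gain = forecast_var / (forecast_var + obs_var)  # how much to trust the sensor
    mean = forecast + gain * (observation - forecast)
    var = (1.0 - gain) * forecast_var
    return mean, var

# Model predicts 1,200 vehicles/hour on a segment; a road sensor reads 1,500.
mean, var = kalman_update(1200.0, 200.0**2, 1500.0, 100.0**2)
print(round(mean), round(var**0.5))  # posterior pulled toward the sensor: 1440, ~89
```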



What then are the applications of this astonishing power? The overall achievement so far is to provide stakeholders with an unprecedented level of ‘situational awareness’ within extremely complex and heterogeneous urban environments. This enhanced capacity is becoming important for detailed record-keeping, public accountability, maintenance, and real-time assessment and control of recurrent urban services. Thus, urban digital twins are especially compelling to city governments and their various agencies. They are also an increasingly good business for software developers and digital technology corporations.

The city in motion

These practical achievements notwithstanding, there is a growing tension between the advent of better and larger urban digital twins, driven by data and computation, and our fundamental understanding of cities5. A litmus question for digital twins is whether they are ‘fit for purpose’. Defining such purposes clarifies the necessary ‘levels of fidelity’ for computational models, and the associated effort and cost8. This question is especially difficult to answer for cities, where potential purposes range from mundane tasks to far-reaching social aspirations.

In this light we can ask: are digital twins merely detailed representations of cities? Or do they provide good, predictive explanations of how cities work, now and into the future? These may seem philosophical questions, but they determine whether, how, and when urban digital twins should be used in practice, thus defining their value.

My current estimation is that urban digital twins remain, at this point, shallow explanations of urban processes despite their great level of representational detail and growing data-assimilation power. They may, nevertheless, serve as excellent aids for decision support, especially in adaptive (sequential) processes of planning and assessment, helping inform and coordinate many urban stakeholders2.

To understand these limitations, observe that while digital twins disaggregate, cities aggregate. More explicitly, while the construction of urban digital models has followed a strategy of increasing detail and fidelity, cities cannot be understood except in light of some statistical aggregation. This is because their most important properties are emergent, arising from many interactions over time, which are extremely difficult to compute from the bottom up.

We can illustrate this point more directly through the concept of computational complexity6. The enormous computational complexity of urban environments makes optimal decision-making and planning provably impossible for all stakeholders — people, businesses, and government: I call this the ‘planner’s problem’, echoing a similar concept in economics9.

Let us suppose that we do create a dynamical model of cities by prescribing a space of possible strategies (choices), over space, time, and social relations, for each urban agent. For a person, this is easy to imagine: you wake up at home, have breakfast, take the kids to school, go to work, go to the grocery store, pick up the kids, go home, eat, sleep; repeat with variation. This is, of course, too simple: myriad other events can take place, including other activities, meetings with many other people, and more details pertaining to each step. Thus, even if we could set up a schedule over a set of choices for each agent, the size of the space of possibilities for the entire city (with millions of people) would grow extremely fast over time, typically faster than exponentially. This defines the computational complexity of simulating cities, which I have estimated explicitly elsewhere6. It has several undesirable consequences for simulation and planning: any actual set of simulations becomes a vanishing, unrepresentative fraction of all possibilities as time goes by. Consequently, without additional constraints or averaging, it becomes impossible to search the space of possibilities to identify the best choices associated with desirable urban trajectories. This feature of all complex systems is at the heart of why their dynamics must remain evolutionary (in a broad sense), and why strict optimization is impossible: ‘premature optimization’ (typical of traditional urban planning, but also pervasive in engineering) here too ‘is the root of all evil’.
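
The growth involved is easy to make explicit. If each of N agents chooses among k options at each of T time steps, the joint space of city trajectories contains k^(N·T) configurations. This back-of-the-envelope form is only an illustration (reference 6 develops the estimate properly), but even toy values overwhelm any conceivable simulation budget:

```python
import math

def log10_trajectories(k: int, n_agents: int, t_steps: int) -> float:
    """log10 of k**(n_agents * t_steps), the size of the joint choice space."""
    return n_agents * t_steps * math.log10(k)

# One person choosing among 10 options each hour of a day:
print(log10_trajectories(k=10, n_agents=1, t_steps=24))        # 10**24 schedules
# A modest city of 100,000 people over the same day:
print(log10_trajectories(k=10, n_agents=100_000, t_steps=24))  # 10**2,400,000
```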

Nevertheless, detailed computational models of cities, when applied in focused ways, have a range of compelling applications, which have become better understood over the last couple of decades. The first high-fidelity computational models of cities were developed for transportation studies. While early precursors dealt with aggregate flows of people and vehicles between coarse ‘origin and destination’ areas, around the mid-1990s models like TRANSIMS10 took the bold step of representing each person and each residential and work location, along with likely movements between these places, displayed in rich visualizations. Such computational models required large supercomputers and ran more slowly than the city itself. Because input data were sparse, these models depended exclusively on forward simulation to make their predictions. Data assimilation in real time was all but impossible, so approximations (and errors) relative to real-world events could not be corrected during simulation and would typically amplify over time. Despite these limitations, large agent-based simulations became so much more compelling than earlier aggregated models that they were quickly adopted by several cities, at great effort and cost, as each was one-of-a-kind.

These models quickly found applications in urban planning10 but also in other fields such as public health11,12 and emergency response13. An important new application was the development of detailed and spatially explicit agent-based models for the spread of contagious diseases such as influenza and, more recently, the COVID-19 pandemic11,12,14. Because of their spatiotemporal specificity, representing the trajectory and contacts of every person, such models could explore very local scenarios and population heterogeneities related to original case introductions and spread through social networks, informing social-distancing strategies and attribution. They could also evaluate the merits of targeted interventions such as school closures, which quickly became policy options. However, empirical tests resulting from real epidemic outbreaks, such as the H1N1 influenza outbreak in 2009 and the COVID-19 pandemic, cast some doubt on the advantages of very detailed simulations, at least when compared to aggregate population models and real-time data-assimilation methods14,15. This is because large agent-based models are very brittle: they require the specification of so many parameters that they are hard to adapt to new situations when behaviors and policy change quickly, such as during a novel epidemic outbreak14.
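
For contrast, the aggregate population models mentioned above can be remarkably compact. A discrete-time SIR model, for instance, needs only a transmission rate and a recovery rate, which makes it far easier to refit as behavior and policy change mid-epidemic. A minimal sketch, with illustrative parameter values:

```python
def sir(population: float, i0: float, beta: float, gamma: float, days: int) -> list:
    """Susceptible-Infectious-Recovered dynamics under well-mixed contacts."""
    s, i, r = population - i0, i0, 0.0
    history = []
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s = s - new_infections
        i = i + new_infections - new_recoveries
        r = r + new_recoveries
        history.append((s, i, r))
    return history

# Illustrative run: R0 = beta/gamma = 3 in a city of one million.
trajectory = sir(population=1e6, i0=10, beta=0.3, gamma=0.1, days=300)
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][1])
print(peak_day, round(trajectory[peak_day][1]))  # day and size of the infectious peak
```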

Other important uses of early urban digital twins were the protection of critical infrastructure from natural disasters or terrorism, and the planning of emergency evacuations, which combined detailed transportation models with some behavioral choices under exceptional circumstances13. Today, similar approaches aid the preparation of coordinated emergency responses to extreme events, including those linked to climate change. Other applications deal with evaluating proposed built-environment designs, such as new buildings or public spaces, in terms of their 3D appearance, but also of their environmental impacts such as shading, air flow, traffic, and pedestrian mobility.

Modeling future cities

What all these applications have in common is a relatively short time horizon, the absence of significant behavioral change, and relatively well-posed objectives. Such conditions are not typical of cities: they are untenable assumptions over long times, or when considerable changes in behavior, knowledge, or technology are at play. They are also fundamentally different from the conditions of traditional urban planning, which targets time horizons of years, or even decades.

This brings us to our final and most challenging point. Are urban digital twins useful in general circumstances for urban planning and policy? Do they predict observable emergent properties of cities — such as patterns of socio-economic development — not only in the present but also in the future, as other factors change?

At present, these issues remain research questions. Digital twins are not the only urban models in wide use that struggle to capture fundamental properties of cities. Older models, driven by practical considerations and planning needs, such as for transportation or regional economic development, tend to fare relatively poorly when compared to observations and to each other. This has led to a re-evaluation of early normative assumptions underlying models of urban planning, and to the development of a new generation of approaches based on complex systems, networks, more sophisticated statistics, and comparative analyses across different scales of organization and behavior in cities.

These approaches have helped develop better explanatory models of cities and elucidate when their apparent complexity leads to predictive patterns. For example, it should be evident that people and organizations thrive in cities precisely because these environments are predictable, at least in a statistical sense5. Cities provide many types of predictable resources and opportunities to people, for example in terms of the availability of food, water, and other essential goods, as well as jobs and social services. Businesses thrive in cities — are incubated, innovate, and specialize — because of predictable demand and supply. Likewise, local governments depend on a predictable need for urban services, law enforcement and social protections, and revenue from utilities and real estate taxes. Observe the kind of predictability involved: it is statistical and systemic; it does not necessarily involve particular people or businesses, only the emergent properties of aggregate urban activities5.

Recent developments in urban science formalize these insights into statistical models and theory, capturing and predicting many interconnected, and sometimes surprising, properties of cities5. These approaches are anchored in essential (general) features of cities, regardless of context, such as their population size, the extent of their built environments, mobility costs, and rates of socioeconomic interaction. Consequences of integrating these social and infrastructural systems include quantitative (statistical) predictions for rates of innovation, economic productivity, extent and types of infrastructure, patterns of the division of labor, rates of social interaction, aspects of mental health and behavior, and the speed of contagious disease spread. The proximate pathways for these predictions are the properties of the interaction structures between urban agents, which show emergent collective effects in terms of so-called network (agglomeration) effects in socioeconomic rates, and economies of scale in material infrastructure5. Processes that change the knowledge base of a city, such as creativity, migration, and technological change, are harder to predict in detail, but are also increasingly being understood statistically as endogenous properties of cities as open, heterogeneous, and selective socioeconomic networks.
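
The signature quantitative form of these predictions is the urban scaling relation Y = Y0 N^β between a city attribute Y and its population N, with β > 1 for socioeconomic rates (agglomeration effects) and β < 1 for material infrastructure (economies of scale)5. The sketch below estimates β by a least-squares fit in log-log space; the city data are invented for illustration.

```python
import numpy as np

# Hypothetical cross-section of cities: populations and a socioeconomic output.
population = np.array([1e5, 5e5, 1e6, 5e6, 1e7])
gdp_usd = np.array([3.1e9, 1.9e10, 4.3e10, 2.8e11, 6.5e11])  # made-up values

# Fit log Y = beta * log N + log Y0 by ordinary least squares.
beta, log_y0 = np.polyfit(np.log(population), np.log(gdp_usd), deg=1)
print(f"beta = {beta:.2f}")  # > 1 here: superlinear (agglomeration) scaling
```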

I see no fundamental reason why these general processes, and others still being discovered, cannot be represented in detailed digital models of cities.

One aspect of the problem is the creation of appropriate multi-scale algorithms to (dis)aggregate evidence towards understanding and predicting the decisions of many different stakeholders. A key requirement is the faster and more complete construction of heterogeneous networks of human interaction from observed or modeled mobility, visitation patterns, and social behavior. Such networks are graphs encoding (time-averaged) relational structures that mediate detailed individual behavior and the outcomes of health, social, economic, and political processes at larger scales5. They summarize conserved (or slowly changing) patterns of social and infrastructural organization and contain statistical predictive power for many important phenomena, such as epidemic spread, mobility patterns, mental health, innovation, and various forms of economic activity. They are also a means to track relevant changes in behavior and culture by comparing graphs over time and under different circumstances. The encoding and decoding of aggregate behavior in terms of graphs also brings many well-developed and efficient algorithms to the study of complex physical and social phenomena. This convergence of theory and methods should eventually lead to improved agent-based models following developments, for instance, in reinforcement learning and intertemporal choice optimization, which can inform more strategic change.
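
A minimal sketch of one such construction, assuming simple visitation records of the form (person, place, time window): people co-present at the same place and time are linked, and repeated co-visits accumulate as edge weights, yielding a time-averaged interaction graph. The data and names are invented; the graph library used is networkx.

```python
from collections import defaultdict
from itertools import combinations
import networkx as nx

# (person, place, time window) visitation records -- illustrative only.
visits = [("ana", "cafe", 9), ("ben", "cafe", 9), ("carla", "cafe", 9),
          ("ana", "gym", 18), ("carla", "gym", 18)]

# Group people by co-presence at the same (place, time window).
copresence = defaultdict(set)
for person, place, t in visits:
    copresence[(place, t)].add(person)

# Accumulate co-visits as weighted edges of an interaction graph.
G = nx.Graph()
for people in copresence.values():
    for u, v in combinations(sorted(people), 2):
        weight = G.get_edge_data(u, v, default={"weight": 0})["weight"]
        G.add_edge(u, v, weight=weight + 1)

print(sorted(G.edges(data=True)))  # ('ana', 'carla') carries weight 2
```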

Another dimension of this problem is the characterization of built environments from a human perspective at scale. Some of the computational infrastructure for achieving this human-centric view of each city has developed quickly in the context of games set in urban environments and via street-view imaging and classification. This synthesis will require a variety of new methods, including the fusion of GIS data and official records, but also machine-learning classification of images supervised by human experiences and uses. These developments will become particularly important in thousands of developing cities worldwide, which lack legacy official statistics and GIS infrastructure.

Another important aspect of this problem deals with methods for verification, validation, and uncertainty quantification (VVUQ)8. This is especially important because streaming data is now a crucial input to non-linear predictive models and decision support systems. VVUQ is challenging in cities because most computational models are designed for (some sort of) average prediction. By contrast, great cities are predicated on diversity and variation along many dimensions, including individual people, businesses, and physical places. This diversity is well understood to be functional — a necessary source of innovation, creativity, and resilience — but it also has a disruptive ambient presence that is often perceived as inefficient, or even dysfunctional. This tension has statistical implications and, specifically, blurs the line between anomaly detection (and elimination) and functional surprise. This raises a warning against the use of urban digital twins for homogenizing urban environments in terms of human behavior or physical spaces, which would, in effect, kill a city. In this sense, the importance of dispersed, tacit information and its amplification and aggregation for socioeconomic coordination and innovation must become more pervasive features of computational models of cities. Meeting these challenges is not a purely technical issue: it requires a cosmopolitan research stance (pun intended) informed by many different disciplines, and an openness to the natural diversity, dynamism, and messiness of cities.
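
One statistical face of this tension can be shown in a few lines. Many urban quantities are heavy-tailed, so a naive Gaussian anomaly detector flags a far larger share of observations than its own theory predicts, and most of those ‘anomalies’ are functional diversity rather than malfunction. A sketch on synthetic, lognormally distributed data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 'business revenues': heavy-tailed, as many urban quantities are.
revenues = rng.lognormal(mean=10.0, sigma=1.0, size=100_000)

# Naive detector: flag anything more than 3 standard deviations from the mean.
z_scores = (revenues - revenues.mean()) / revenues.std()
flagged = (np.abs(z_scores) > 3).mean()
print(f"{flagged:.1%} flagged")  # several times the ~0.3% a Gaussian would give
```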

Such integration defines, in my view, the essence of the future challenges to large-scale models of cities, bridging the divide between scientific research and practice in ways that can more quickly advance both. This interface, with its latent potential for scientific discovery and transformative practice, defines a new frontier for computational science, where urban digital twins are required to go beyond faithful representation to increasingly test and embody the generative processes that create and sustain cities, and that will define the future of human societies.