Introduction

Protected areas (PAs) are a core management response to the increasingly pervasive and multiple impacts of global change on species1,2 and human well-being3,4, although weak regulations4, financial limitations5 and conflicts6 can hinder their conservation benefits. To face the rapid expansion of human activities, even towards the last wilderness areas7, and the unprecedented rate of biodiversity loss8,9, the global coverage of PAs has markedly increased over the last decades, with more than 17% of the world’s terrestrial surface and 8% of the ocean formally designated as protected10. Recently, the 15th Conference of Parties to the UN Convention on Biological Diversity (COP 15) adopted the “Kunming-Montreal Global Biodiversity Framework” (GBF) committed to protect at least 30% of both land and sea before 2030. Yet, the current protection coverage is highly heterogeneous on Earth10, and some advocate that more ambitious and relevant targets are urgently needed to safeguard biodiversity and nature’s contribution to people1,2.

This race to protect more land and sea does not necessarily benefit biodiversity and people equally everywhere11,12, nor are the costs arising from PAs shared equitably around the world13,14 with identified barriers and levers to increase protection coverage15,16. Global assessments of PA coverage10 may thus overshadow socioeconomic, habitat and environmental gaps, which can be detrimental to poverty alleviation17, climate mitigation18 and species conservation19. Indeed, PA designations tend to be biased toward higher-income countries20 and places that are remote or of low value to agriculture21 and fisheries22. Yet, while most research has focused on evaluating the potential benefits of existing and future PA networks for nature and people2,23, less attention has been given to a better understanding of the combination of socioeconomic and environmental enabling conditions to PA establishment. Indeed, whilst the influence of some factors like geopolitical conflicts24, human development20, land use25, cultural context26 and elevation27 on PA establishment have been studied, other potential barriers and levers to increase protection coverage have been overlooked, like the level of dependency on natural resources and the number of environmental Non-Governmental Organisations (NGOs). In other words, identifying the main socioeconomic and environmental preconditions of PA establishment worldwide should provide critical insights into the potential viability of efforts for new PA designations to achieve the ambitious target of at least 30% coverage in the short term.

Here, we quantitatively assess the ‘global niche’ of terrestrial and marine PAs by representing the distribution of all IUCN-recognised PAs in a multidimensional social-environmental space encompassing local and national conditions15. By analogy to species niches that define their suitable habitats, so determine their extinction and colonisation patterns28, PA niches are expected to predict their new establishment patterns but also downgrading, downsizing, and degazettement events29. We then build machine learning models predicting the relative likelihood of PA occurrence and identifying PA-enabling conditions on both land and sea. Finally, model predictions are used to pinpoint non-protected areas that are ‘potential gains’ and ‘unrealistic gains’ for biodiversity conservation globally, i.e., areas that would benefit most to vertebrates and that are respectively favourable and unfavourable to PA establishment. Ultimately, a better understanding of local opportunities, levers and barriers would guide strategic interventions towards realistic and operational long-term positive conservation outcomes.

In this work, we show that PA distribution is highly heterogeneous on Earth, even within a single country, but highly predictable from a set of social-environmental factors. We highlight a few clustered areas meeting favourable PA enabling conditions while being critical for biodiversity conservation, i.e. ‘low-hanging fruit’ areas, but widespread unrealistic high conservation gains, so ‘high-hanging fruit’ areas.

Results and discussion

Underrepresented socioeconomic and environmental conditions

To conduct our assessment, we first defined a global 10 × 10 km grid on both land and sea. We then selected every grid cell that overlapped, even partially, with a PA of the IUCN category from I to VI (see ‘Methods’). We also randomly selected the same number of grid cells outside PAs to obtain a representative set of non-protected areas globally and a balanced design in our statistical analyses (see ‘Methods’). All marine cells were constrained to be included within countries’ exclusive economic zones (EEZs), where 98.8% of marine PAs are located according to the Marine Protection Atlas, and to account for socioeconomic factors at the local or country level. We thus ended up with a total of 517,450 cells on land and 421,746 cells at sea, which are representative of all biomes on Earth.

For each cell, we extracted the value of 14 factors for both land and sea (see ‘Methods’ and Supplementary Data 1) to provide a spatially explicit analytical framework of local social–environmental conditions15. These factors were carefully selected in relation to their potential association with the presence of protected areas (see rationales and hypotheses in Tables 1 and 2) and their relative independence (Pearson correlation between −0.7 and 0.7). Seven factors were related to the socioeconomic context, which can be local (Gross Domestic Product or GDP, Accessibility in terms of travel time to the nearest city and Human footprint) or national (human development index or HDI, number of conflicts, number of Non-Governmental Organisations or NGOs and Dependency on natural resources). Seven other factors were related to the local environmental context (island, atmospheric or sea surface temperature, primary productivity, elevation or depth, distance to the coast, along with distance to seamounts and salinity for marine cells and precipitation and freshwater coverage for terrestrial cells). In the rare cases where values for these factors were not available (i.e., 0.7% of cells on land and 1.2% at sea), we inferred the missing values using a machine learning algorithm (see ‘Methods’).

Table 1 Main rationale and hypotheses explaining the expected relationships between the seven environmental and habitat factors used in our study and the presence of protected areas on land and sea
Table 2 Main rationale and hypotheses explaining the expected relationships between the seven socioeconomic factors used in our study and the presence of protected areas on land and sea

To visualise the global grid cell distribution according to the level of protection in a socioeconomic and environmental multidimensional space, we first performed an ecological niche factor analysis (ENFA), a linear ordination technique30. Considering all IUCN PA categories, we show that while the social-environmental space of unprotected areas broadly overlaps with that of protected areas for both land and sea, there are consistent gaps (Fig. 1). In other words, the global network of PAs fails to cover all socioeconomic and environmental conditions on Earth. Moreover, the density of PAs along the first ENFA axis is markedly different from that of unprotected areas on both land (Kolmogorov–Smirnov distance = 0.32 vs. 0.07 along the second axis) and sea (Kolmogorov–Smirnov distance = 0.52 vs. 0.24 along the second axis). Terrestrial and marine PAs tend to aggregate on positive values along the first ENFA axis, so where the level of HDI and the number of NGOs are high but where the number of conflicts, human footprint and dependency on natural resources are low (Fig. 1) like the Cap Corse and Agriate marine protected area (France, IUCN category V) located in Corsica, a Mediterranean island (Fig. 2b). Yet, more restrictive protected areas are located in under-represented social-environmental contexts like the Viñales valley (Cuba) which is of IUCN Category II (Fig. 2a).

Fig. 1: Global distribution of protected areas (PAs) and unprotected areas in a social-environmental space.
figure 1

This 2-dimensional space was built with 14 local and national social-environmental factors (Tables 1 and 2) on land and sea using an ecological niche factor analysis (ENFA) (see ‘Methods’). Most restrictive PAs (IUCN category I only) are represented in darker colours, while PAs from all IUCN categories (IUCN categories I–VI) are represented in lighter colours. Thin lines in the central panels denote the extent of the niche for each cell category. The influence of social-environmental factors on the construction of the niche spaces is represented by bar plots in the lower panels (orange bars for socioeconomic factors and green bars for environmental factors). Marginal distributions of each grid cell category along the two first axes are represented with density plots. Source data and codes are provided at https://doi.org/10.5281/zenodo.11183846.

Fig. 2: Examples of protected areas and vertebrate species included in our study.
figure 2

The tropical Viñales Valley (Cuba) is an IUCN protected area Category II (a). The valley is encircled by rocky mountains with the presence of traditional agricultural production, particularly of tobacco (photo credit Simon Berger). Alpine ibex (Capra ibex) in the Ristolas—Mont-Viso (France) an IUCN protected area category IV (b) (photo credit Wilfried Thuiller). The Cap Corse and Agriate marine protected area (France) is an IUCN category V (c). The Corsica Island is relatively remote in the overcrowded Mediterranean Sea (photo credit David Mouillot). Yellowtail kingfish (Seriola lalandi) and Galapagos sharks (Carcharhinus galapagensis) in the Rapa Nui marine park (Chile), which is an IUCN Category IV protected area (d) (Photo Credit Manu San Felix).

IUCN category I PAs (i.e., the most restrictive, or most strictly protected) only cover a reduced fraction of global social-environmental conditions, particularly at sea (Fig. 1). These restrictive PAs tend to disproportionately occur on positive coordinates along the first ENFA axis compared to non-protected areas (Kolmogorov–Smirnov distance = 0.50 on land and 0.45 at sea vs. 0.22 on land and 0.29 at sea along the second axis). This clustering of PAs and gaps of unprotected conditions are even more pronounced in the socioeconomic multidimensional space (Supplementary Fig. 1) than in the environmental space (Supplementary Fig. 2).

Our findings echo previous studies suggesting that PA distribution is highly heterogeneous on Earth10, even within a single country31. Indeed, PA design has historically rarely been systematic or strategic, but rather opportunistic or political, leaving many habitats, environments, species and evolutionary lineages underrepresented19,32. For instance, a positive bias towards high elevation areas27, unfertile lands21 and islands33 have been shown for terrestrial PAs. The lack of national or international coordination has also induced a contrast in protection coverage between areas with different economic levels or population densities33,34. Here, we reveal that the global heterogeneous distribution in PAs is even more pronounced across social and economic contexts than across the range of environmental conditions, with many gaps for the most restrictive PAs (IUCN category I). These gaps are highly detrimental for both nature, since most biodiversity hotspots occur in the tropics35,36, and people, given the well-being3,4 and socioeconomic17,37 outcomes provided by PAs which are currently underrepresented in most African countries10 where such management tools and potential benefits are the most needed38.

Prediction of areas under protection

Since we randomly chose unprotected areas, i.e., cells outside the global network of IUCN-recorded PAs, and some PAs are not reported39, we only have presence-background data40. Therefore, we do not predict probabilities of PA occurrences but instead, relative likelihoods of PA occurrences in each grid cell using the same set of 14 social-environmental factors and a machine learning algorithm (random forest or RF) accounting for covariation across factors, potential nonlinearities, and spatial hierarchy of factors (e.g., cells within countries). To avoid underestimating the model predictive error related to the spatial autocorrelation in data41, particularly due to the inherent contiguity of protected cells in large PAs, we performed a 10-fold spatial cross-validation procedure by creating spatially independent training and testing folds based on a minimum distance of 1000 km42 (see ‘Methods’). We also performed sensitivity analyses where we removed the largest PAs but also those belonging to the lowest and highest HDI countries to test the robustness of our findings (see ‘Methods’). We did not detect any significant spatial autocorrelation in the residuals of the models.

The relative likelihood of PA occurrence, irrespective of the IUCN category, is accurately predicted on land (spatially cross-validated accuracy = 0.74 ± SD = 0.02 and true skill statistic TSS = 0.47 ± SD = 0.04), with socioeconomic factors having on average more relative standardised importance (0.63) than environmental factors (0.59) (Fig. 3a, Supplementary Fig. 3). The relative likelihood of marine PA occurrence is even more accurately predicted (spatially cross-validated accuracy = 0.88 ± SD = 0.05 and TSS = 0.76 ± SD = 0.09) with socioeconomic factors being on average even more important (0.66) than environmental factors (0.43) (Fig. 3b, Supplementary Fig. 4). For both land and sea, HDI, accessibility, the number of NGOs, the number of conflicts, dependency on natural resources, primary productivity and annual temperature are the most important factors correlated to relative PA occurrences. On average, the relative standardised importance of national socioeconomic factors is higher than that of local socioeconomic factors on both land (0.81 vs. 0.47) and sea (0.69 vs. 0.56), explaining the high heterogeneity of current protection coverage among countries10.

Fig. 3: Importance of socioeconomic and environmental factors predicting the relative likelihood of protected area occurrence on land and at sea.
figure 3

Circular histograms show the relative importance (standardised to 0–1 based on permutations, see Methods) of each factor in explaining the relative likelihood of protected area occurrence (all IUCN categories vs. IUCN I only) on land (a, c) and sea (b, d) using random forest models. Socioeconomic factors are in orange, while environmental factors are in green. Their average relative importance is represented by a bar plot in the middle of each circular plot. See Supplementary Figs. 3 and 4 for error bars across the 10-fold spatial cross-validation procedure and Supplementary Data 1 for factor abbreviations and descriptions. Source data and codes are provided at https://doi.org/10.5281/zenodo.11183846.

When applying the same analysis to the most restrictive PAs (IUCN category I), RF models reach an even higher predictive performance both on land (spatially cross-validated accuracy = 0.91 ± SD = 0.02 and TSS = 0.46 ± SD = 0.06) and at sea (spatially cross-validated accuracy = 0.94 ± SD = 0.03 and TSS = 0.64 ± SD = 0.21), with again a dominance of socioeconomic factors over environmental ones and of national factors over local ones (Fig. 3c, d, Supplementary Figs. 3 and 4). These models show that, even in the complete absence of biodiversity or land use information, the presence of highly restrictive protected areas both on land and at sea is highly predictable from a small set of well-selected social-environmental factors. Among these factors, the predominant ones (importance > 0.8) for land are HDI and temperature and at sea are accessibility in terms of travel time and temperature. The sensitivity analyses show that the models are very robust to the removal of the largest PAs and those belonging to the lowest and highest HDI countries (Supplementary Fig. 5), suggesting that the inherent omission of PAs in IUCN-reported lists and the overrepresentation of protected grid cells belonging to the largest PAs have little impact on our results and conclusions.

Partial response plots reveal remarkable consistencies in the factors associated with the relative likelihood of PA occurrence both on land and at sea (Fig. 4, Supplementary Fig. 6). Indeed, the relative likelihood of having protection coverage (whatever the IUCN category) in a given area is positively linked to the level of HDI and the number of NGOs, but negatively to the number of conflicts, the distance to the coast, the dependency on natural resources, human footprint and annual temperature. The positive link with HDI was already demonstrated in coastal areas20 but also holds for terrestrial areas. Yet, this relationship is non-linear, with a positive contribution of HDI on the relative PA occurrence only apparent above a minimum threshold of 0.7, corresponding to the developed countries43. It suggests that, given the small increase in HDI for the poorest countries in recent decades44, development alone cannot offer a short-term solution to increase PA coverage. In contrast, for medium-HDI countries, we can expect a rapid rise in protection coverage even for a small increase in HDI, especially at sea. This projection is only valid under a space-for-time assumption since our models do not include any temporal trend but only spatial information. Still, these results echo the call to engage developing countries in strategies skipping the intermediate stage of overexploitation during the growing phase to rapidly reach the developed state (HDI > 0.7) when the Living Planet Index, reflecting the state of vertebrate species, begins to recover with development43. Below this threshold of 0.7, any increase in HDI may be detrimental to biodiversity since it may increase the overexploitation of natural resources43.

Fig. 4: Partial response plots between the relative likelihood of PA occurrence on land and sea and the nine most important socioeconomic and environmental factors.
figure 4

We used modelled partial relationships between the relative likelihood of protected area presence on land (dark and light orange lines) and sea (dark and light blue lines) and the most important factors detected by the random forest models (ai), while controlling for the other factors, e.g., kept constant at their means. The plain lines are for all IUCN PA categories, while the dashed lines are for the most restrictive IUCN I PA category only. Other factors are presented in Supplementary Fig. 6. Source data and codes are provided at https://doi.org/10.5281/zenodo.11183846.

We show that beyond countries’ HDI, the socioeconomic context is a key precondition of PA establishment, with the presence of conflicts and high human footprint both acting as potential barriers to achieving conservation targets. This socioeconomic context is not only relevant at the country level to explain PA establishment, as shown by other studies33, but also at the local scale (10 km resolution) with accessibility being the most important factor in models predicting the relative likelihood of marine PA occurrence (Fig. 3b, d). By contrast, the link with the number of NGOs is positive for both land and sea, suggesting a potential lever to PA establishment, owing to enhanced financial and management capacity15, with a marked threshold of 50 NGOs at the country scale for boosting marine PA establishment. However, NGOs may also be attracted to areas with a high-density of PAs to facilitate fundraising and help reach positive outcomes, so we rather highlight correlation instead of causation here.

There are some marked differences between land and sea when looking at the secondary factors explaining the relative likelihood of PA occurrences. Elevation was positively related to terrestrial PA relative occurrence (e.g., lots of PAs in Europe are located in the mountains)27, while water depth was negatively correlated to the relative occurrence of marine PAs, highlighting a global positive bias of protection coverage on the coast and a pelagic deficit within countries’ EEZs2. Yet, remote areas, characterised by their low accessibility, are more likely to be protected than accessible coastal areas, a known bias towards residual marine PAs45, while accessibility has no visible effect on land.

When focusing on the most restrictive IUCN category I PAs, partial responses show similar relationships (Fig. 4, Supplementary Fig. 6). The number of conflicts, the number of NGOs, and dependence on natural resources are the strongest correlates of PA relative occurrence, albeit to a lower extent than when all IUCN categories are considered. However, HDI is no longer important on sea, but is still a strong factor on land, suggesting an overrepresentation of restrictive terrestrial PAs in developed countries. At sea, accessibility and dependency on natural resources have the strongest negative relation with relative IUCN PA occurrence, illustrating strong barriers to PA establishment along coastlines where human density is high and marine resources are essential for livelihood or food security. In contrast, accessibility and human footprint have no negative relationship with IUCN I PA occurrence on land, suggesting fewer or weaker barriers due to human pressure in restrictive terrestrial PA establishment. As an alternative explanation, old restrictive terrestrial PAs, like US national parks, were initially created in remote areas that have experienced substantial increases in housing density since 194046. Therefore, even in populated areas (e.g., agricultural plains), IUCN category I PAs appear to be viable management options if the socioeconomic context is favourable (high number of NGOs and HDI but also low dependency on natural resources and number of conflicts), suggesting that highly restrictive terrestrial PAs are not fatally constrained to ‘high and far’ areas21. Yet, this expansion may be compromised by projected land use and parochialism25, since large-scale conservation actions can be locally constrained by political and stakeholder units.

Potential and unrealistic conservation gains

Taking advantage of our accurate predictive models, we can assess the relative likelihood that unprotected areas worldwide could become protected based on their social-environmental context. In doing so, we aim to document whether and where the enabling social-environmental conditions pre-exist for PA implementation outside of the existing global PA network. Since national socioeconomic factors explain the relative likelihood that unprotected areas are turned into IUCN I PAs (e.g. HDI), we estimated and mapped this averaged likelihood per country on both land and sea (Fig. 5). We show that only some European countries and New Zealand reach a mean probability of protecting current terrestrial unprotected areas higher than 50% while this value is lower than 10% for most countries except in North America and some other countries like Australia, Botswana, Japan or Mongolia. For marine unprotected areas, this mean probability of establishing IUCN I protection is very low (<10%) in most countries except Chile, Mexico and some others, suggesting that reaching more than 10% coverage by the most restrictive marine protected areas, beyond the current effort, is unrealistic for most countries so the high sea may represent the next big challenge to protect more than 30% of marine areas before 2030 including 10% under full protection2.

Fig. 5: Mean probability of protecting currently unprotected areas per country.
figure 5

Random Forest models estimate the average relative likelihood per country that unprotected areas are turned into IUCN-I protected areas given the social-environmental context on both land (a) and sea (b). Source data and codes are provided at https://doi.org/10.5281/zenodo.11183846.

To determine whether areas meeting favourable enabling conditions, i.e., ‘low-hanging fruit’ areas, match the most critical areas for biodiversity conservation, we used the global distributions of terrestrial mammals and birds but also marine fishes (see ‘Methods’) to rank the non-protected areas according to their conservation benefits based on the coverage of species range size, if turned into protected areas. We collected, for instance, the global distribution of the Alpine ibex (Capra ibex) and the Galapagos shark (Carcharhinus galapagensis) (Fig. 2c, d). We considered here only PAs of IUCN category I since these terrestrial and marine vertebrate taxa tend to benefit only from highly restrictive protection measures47,48,49. More precisely, we implemented a complementarity-, balanced-, and priority-based ranking of areas with the Zonation software for spatial conservation planning (see Methods)2,50. It selects areas that are complementary and jointly achieve a well-balanced representation across all species. The goal is not to propose a global prioritisation of new PAs but rather to identify where conservation benefits may be ‘easier’ to reach given the social-environmental context (hereafter called ‘potential gains’), when compared to areas where such benefits are likely to be ‘harder’ to be achieved as enabling conditions are not met (hereafter called ‘unrealistic gains’). More precisely, we define as ‘potential gains’ the unprotected areas with both a high relative likelihood of being protected (>90%) and a high conservation prioritisation ranking (top 10%) corresponding to ‘low-hanging fruits’ in conservation51. On the opposite, ‘unrealistic gains’ are the unprotected areas with high conservation priority (top 10%) but with a low relative likelihood of being protected (<10%) given their social-environmental context (Fig. 6a, b), so ‘high-hanging fruits’. Here, the social-environmental factors are considered more like proxies of a wider set of conditions enabling or preventing PA establishment than causal determinants, independent of the biodiversity attributes.

Fig. 6: Potential and unrealistic conservation gains for terrestrial and marine vertebrates.
figure 6

We define as low vs. high conservation gains (x-axis) the bottom vs. top-ranked (10%) unprotected areas for the conservation of vertebrates according to the maximisation of species range size coverage on land (a birds and mammals) and sea (b fish). We define as potential vs. unrealistic conservation gains (y-axis), the unprotected areas being the most likely (‘low-hanging fruits’) vs. unlikely (‘high-hanging fruits’) to be protected according to their social-environmental context. On global maps, established protected areas are in green, while we only represent the potential vs. unrealistic high conservation gains on land (c) and sea (d). The gradient of colours corresponds to the relative likelihood that these unprotected areas are turned into protected areas according to the 14 social-environmental factors (Tables 1 and 2) and Random Forest models (see ‘Methods’). So, potential high conservation gains are in dark blue, while unrealistic high gains are in dark red. See Fig. 5 for the average relative likelihood of PA establishment per country on land and sea. These patterns could be biased by missing PAs in the Word Database on Protected Areas. Source data and codes are provided at https://doi.org/10.5281/zenodo.11183846.

We observe that potential high conservation gains on land are highly clustered in space (Fig. 6c) and are, for different reasons, almost absent in continental Africa, Europe and South Asia. In European countries, most unprotected areas are not top-ranked in terms of bird and mammal species protection, even if the social-environmental context is favourable to the establishment of PAs. These areas are thus potential low gains for the conservation of terrestrial vertebrates. Conversely, many areas of Africa and South Asia can provide high conservation gains if protected, but the socio-environmental context tends to restrict new PAs establishment, so these areas correspond to unrealistic high conservation gains for terrestrial vertebrate biodiversity. Potential high conservation gains are mainly located along the North Pacific coast of the American continent, in Amazonia, in Southern Australia and in Northern Asia due to relatively fewer local social-environmental constraints but important conservation gains for terrestrial vertebrates. For some countries (US and Australia) the enabling conditions mainly stem from socioeconomic factors like HDI while, for others (some parts of Columbia and Philippines), enabling conditions mostly belong to factors like high elevation and low human pressure (Fig. 4). However, despite an overall favourable national context in the US and Australia, we also observe high within-country heterogeneity (Fig. 6c, d) suggesting the importance of local factors, potentially acting as barriers to PA establishment such as accessibility or distance to the coast.

Unrealistic but high conservation gains on land are mainly located in South Asia, Africa and South America. In these areas, despite the presence of many vertebrate species, the socioeconomic context tends to be unfavourable for the establishment of PAs (Fig. 5). In such areas, other effective area-based conservation measures (OECM) than PAs should be favoured since they can still deliver positive ecological outcomes while being more inclusive with locals52. Yet, care should be taken to avoid the risks associated with OECMs53. Alternatively, privately protected areas (PPAs) could also complement or overtake state-government PAs to achieve unrealistic but high conservation gains39. Indeed, our approach identifies areas where long-term benefits for conservation will certainly require more inclusive alternatives to classical highly restrictive IUCN PAs or regional cooperation to help countries with unfavourable socioeconomic conditions to protect their land54 and contribute to reaching the target of 30% protection coverage.

At sea, we identified very few potential high conservation gains, which are mostly located in the Eastern Pacific and North-Eastern Australia, but many unrealistic high gains (Fig. 6d). Unprotected marine areas with a favourable social-environmental context for the establishment of new PAs would not provide important conservation gains for fish biodiversity in return. Conversely, unprotected marine areas where biodiversity gains are expected to be the highest have a low relative likelihood to become protected given the local socioeconomic context (Fig. 6b). These unrealistic high conservation gains are widespread along the coast of many countries (mainly in Africa, Asia and America), except Australia and, to a lesser extent, Northern Europe where local conditions are more favourable to PAs establishment (Fig. 5). Therefore, the short-term protection of marine fishes seems more challenging (few ‘low-hanging fruits’) than the protection of terrestrial vertebrates given the inherent difficulty to establish strictly restrictive PAs (IUCN category I) in marine areas of high conservation gains. Alternative, easier-to-implement solutions are urgently needed in these EEZs to halt biodiversity decline, such as other effective area-based conservation tools and non-spatial sustainable management of marine resources52,55; yet their protection effectiveness remains to be demonstrated53.

More generally, the global conservation target of 30% coverage by protected areas before 2030, including 10% of fully protected areas, seems hardly achievable without strong efforts to overcome socioeconomic barriers or propose alternative and more inclusive solutions (OECMs and PPAs), especially for marine waters.

Methods

Global grid cells of protected and unprotected areas

The global dataset was integrated into a global 10 × 10 km resolution grid covering both land and sea. We intersected this initial grid with the GADM (global administrative areas-GADM version 3.6) for the terrestrial part and the exclusive economic zone boundaries (Marin Regions ‘maritime boundaries’ version 11) for the marine cells. As a result, a cell that intersects with either of these two previous databases is either marine or terrestrial. For the coastal environment we assigned the ecosystem (terrestrial or marine) that overlapped more than 50% of the cell surface.

These land and sea cells were intersected with the Word Database on Protected Areas (April 2020). A cell was considered as having some protection if a PA partly overlapped the cell. This resulted in 258,725 cells on land and 210,873 cells at sea that have some protection coverage. For each protected cell, we reported the IUCN category between I (highly restrictive or no-take) and VI (least restrictive IUCN category). If, within a protected cell, several PAs with different IUCN categories were present, we associated the more restrictive category with the cell. We randomly selected the same number of cells among the remaining unprotected cells on both land and sea to obtain a balanced design.

Socioeconomic and environmental factors

We associated to each grid cell a set of 14 socioeconomic and environmental factors known for their potential to explain or to correlate with the presence of PAs on both land and sea (Tables 1 and 2). We chose these factors to make our analyses spatially explicit with probabilistic models of conservation action that integrate theoretically supported social-environmental correlates of PA establishment15. These factors are detailed in Supplementary Data 1.

Modelling the relative likelihood of protected area occurrence

Using our presence-background data, we modelled relative likelihoods of PA occurrences on both land and sea40. Prior to model fitting, we evaluated the collinearity between factors by computing Pearson correlation coefficients. All pairs of factors were weakly or moderately correlated on both land and sea (−0.7 > correlation > +0.7). Prior to fitting, we also filled all the gaps (0.7% of cells on land and 1.2% at sea) in the factor data using a Random Forest model designed specifically to impute missing values. For this, we used the missForest function implemented in the missForest R package56. Among the many statistical alternatives to impute values that are missing in large datasets using the other factors as predictors, we choose this machine learning technique since it performs well and makes very few assumptions about structural aspects of the data57.

Then, we built Random Forest (RF) models to relate the presence–absence of PA in a given cell (all IUCN categories and IUCN I only) on land and sea to the 7 socioeconomic and 7 environmental factors using the ranger function from the ranger R package, which provides a fast implementation of RF, particularly suited for high dimensional data58. So, RF was used as a classifier with two classes (presence and absence of PA) for each grid cell. We used 500 trees in each RF that make an ensemble of independent decision trees to improve model accuracy59.

We then calibrated RF predictions to obtain a continuous distribution of relative PA occurrences ranging homogeneously between 0 and 1, thus allowing its interpretation in a probabilistic way60. We applied a procedure based on flexible generalised additive binomial models (GAM) to observed data (here, presence-absence of PA) as a function of response-scale model fits60 using the gam function from the gam R package. This procedure avoids overfitting, which potentially leads to the perfect separation of classes in predictions from algorithms like RF. We ultimately obtained a calibrated relative likelihood of being protected for all unprotected grid cells according to 14 social-environmental factors.

Models’ performance and factors’ importance

Model performances were assessed using a 10-fold spatial cross-validation procedure (using 90% of grid cells for the training and 10% for the testing) and two classical metrics: accuracy, which is the proportion of correct predictions, and the true skill statistic (TSS), which is a simple and intuitive measure considering both sensitivity (i.e., the proportion of PA presences accurately predicted) and specificity (i.e., the proportion of PA absences accurately predicted). When accuracy ranges between 0 and 1, TSS ranges between −1 and 1, with 1 indicating perfect agreement and values lower than zero indicating a performance worse than random61. These metrics were calculated from the confusion matrix using the confusionMatrix function from the caret R package.

Since our grid cells, split between our train and test datasets, are not independent with presences tending to aggregate over space due to large PAs, there is the risk of obtaining invalid overoptimistic results, particularly with machine learning algorithms41,42. We thus created spatially independent training and testing blocks on land and sea based on a pre-specified and conservative distance of 1000 km to perform a spatial cross-validation procedure. For this, we used the spatialBlocks function from the blockCV package v2.1.5. Finally, we tested for the presence of spatial autocorrelation in model residuals as a function of distance between grid cells with Moran’s I statistic.

We also calculated factor importances for each cross-validation procedure, using the varImpRanger function from the varImp R package, based on the area under the curve (AUC) loss when a factor is randomly permuted, using the multiclass.AU1P measure dedicated to binary classifiers while considering the a priori distribution of the classes62. We also estimated impurity values as alternative measures of factor importance. We displayed partial response plots to show the effect of each factor while controlling for the others, so we kept constant at their means, using the partial and plotPartial functions from the dpd R package.

Sensitivity analyses

We performed a first sensitivity analysis to test the robustness of our results to the overweight, in terms of grid cell number, of very large PAs. For this, we removed 27 PAs on land and 2 at sea since they cover more than 10% of the total protected area.

Since we certainly missed PAs from low-HDI countries due to a lack of IUCN declaration, we performed a second sensitivity analysis where we removed PAs from countries with HDI in the bottom 10% values, so 393 PAs on land and 108 at sea. On the other hand, high-HDI countries may be overrepresented in IUCN I PAs, so we performed a third sensitivity analysis where we removed PAs from countries with HDI in the top 10% values, totalling 10,254 PAs on land and 1094 at sea.

We conducted the same sensitivity analyses for the model focusing on restrictive PAs (IUCN I). These sensitivity analyses resulted in the removal of 5155 PAs on land and 791 at sea for high HDI values, 10 PAs on land and 2 at sea for lower HDI values and 9 PAs on land and 2 at sea for large PAs.

All analyses can be reproduced through our GitHub repository.

Global biodiversity maps of vertebrates

Terrestrial mammals and birds

We used the IUCN range maps for 5529 terrestrial mammal species and the BirdLife range maps for 10,709 bird species (breeding ranges only)63. Extinct (EX) and Extinct in the Wild (EW) species were removed. For birds, polygons of the Spheniscidae family, which are mostly marine birds, were also removed. Geographic ranges were converted to a 10 km resolution Mollweide equal-area grid projection (i.e., 100 km2 cells). This resolution was previously validated as the finest justifiable for these data globally without incurring false presences which would impose a geographic and ecological bias on spatial prioritisation solutions64.

Marine fishes

We collated fish species data from the Ocean Biogeographic Information System (OBIS). We inventoried 16,238,200 occurrence records from 34,883 entries. We cleaned the data by identifying the synonyms, misspellings and rare species (only one occurrence) and by restraining the dataset to species present in the marine environment according to FishBase. This resulted in a set of 11,503,257 occurrences for 11,345 marine fish species around the world. We reconstructed distribution maps for each species, defined as the convex polygon surrounding the area where each species was observed. We divided the resulting polygons into four parts across the world to integrate possible range discontinuity between the two hemispheres and the Atlantic and Pacific Oceans (e.g., species with antitropical distribution). We refined each species distribution map by removing areas where maximal depths fell outside the minimum or maximum known depth range of the species. As the OBIS database did not well represent the tropical fishes, we merged the database with 6316 fish species censused on reefs65. We finally obtained a world database containing 14,050 fish species that we aggregated on a 1° resolution grid covering all oceans. We transformed individual species shapefiles into equal-area raster grids at a resolution of 1°.

Spatial conservation prioritisation

We used the Zonation software (version 4) for spatial conservation prioritisation based on vertebrate species distributions on land (mammals and birds) and at sea (fishes). This allowed us to rank unprotected cells in their ability to complete the existing PA network in order to provide an optimised global representative coverage for vertebrate conservation. Zonation uses a reverse stepwise heuristic algorithm that removes cells that contribute to the smallest marginal losses in the representation of biodiversity features50. More precisely, we identified the optimal expansion of the existing PA network using the core-area zonation algorithm, which is based on the ranking of the most important occurrences of a feature in a given cell. Thus, Zonation gives the highest priority to locations with high occurrences of rare species, maximising both the representation of all species and their proportional coverage. Zonation resulted in a prioritisation hierarchy; these priority values were then used to identify locations that contribute most to biodiversity representation, so the unprotected cells of high conservation gain.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.