Science is producing data in amounts so large as to be unfathomable. Advances in artificial intelligence (AI) are increasingly needed to make sense of all this information (see ref. 1 and Nature Rev. Phys. 4, 353; 2022). For example, through training on copious quantities of data, machine-learning (ML) methods get better at finding patterns without being explicitly programmed to do so.
In our field of Earth, space and environmental sciences, technologies ranging from sensors to satellites are providing detailed views of the planet, its life and its history, at all scales. And AI tools are being applied ever more widely — for weather forecasting2 and climate modelling3, for managing energy and water4, and for assessing damage during disasters to speed up aid responses and reconstruction efforts.
The rise of AI in the field is clear from tracking abstracts5 at the annual conference of the American Geophysical Union (AGU) — which typically gathers some 25,000 Earth and space scientists from more than 100 countries. The number of abstracts that mention AI or ML increased more than tenfold between 2015 and 2022, from fewer than 100 to around 1,200 (that is, from 0.4% to more than 6%; see ‘Growing AI use in Earth and space science’)6.
Yet, despite its power, AI also comes with risks. These include misapplication by researchers who are unfamiliar with the details, and the use of poorly trained models or badly designed input data sets, which deliver unreliable results and can even cause unintended harm. For example, if reports of weather events — such as tornadoes — are used to build a predictive tool, the training data are likely to be biased towards heavily populated regions, where more events are observed and reported. In turn, the model is likely to over-predict tornadoes in urban areas and under-predict them in rural areas, leading to unsuitable responses7.
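The mechanism is easy to demonstrate. The minimal sketch below is our own illustration, not taken from the studies cited here; it assumes Python with NumPy and scikit-learn, and all quantities are synthetic. It trains a classifier on reported events rather than true occurrences and recovers a spurious ‘urban’ effect.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000

# True tornado occurrence depends only on a storm-environment index,
# not on where people live.
storm_index = rng.normal(size=n)
urban = rng.random(n) < 0.3                      # 30% of grid cells are urban
p_true = 1 / (1 + np.exp(-(storm_index - 2)))    # rare-event probability
tornado = rng.random(n) < p_true

# But events are *reported* far more often where observers are dense.
report_prob = np.where(urban, 0.9, 0.2)
reported = tornado & (rng.random(n) < report_prob)

# Train on the reports (the biased labels) instead of the true occurrences.
X = np.column_stack([storm_index, urban.astype(float)])
model = LogisticRegression().fit(X, reported)

# The model assigns a spurious positive weight to 'urban': it will
# over-predict tornadoes in cities and under-predict them in rural areas.
print(dict(zip(["storm_index", "urban"], model.coef_[0].round(2))))
```

Because rural tornadoes go unreported more often, the model attributes part of the occurrence rate to urbanization itself, mirroring the over- and under-prediction described above.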
Data sets differ widely, yet the same questions arise in all fields: when, and to what extent, can researchers trust the outcomes of AI and mitigate harm? To explore such questions, the AGU, with the support of NASA, last year convened a community of researchers and ethicists (including us) at a series of workshops. The aim was to develop a set of principles and guidelines around the use of AI and ML tools in the Earth, space and environmental sciences, and to disseminate them (see ‘Six principles to help build trust’)6.
Answers will evolve as AI develops, but the principles and guidelines will remain grounded in the basics of good science — how data are collected, treated and used. To guide the scientific community, here we make practical recommendations for embedding openness, transparency and curation in the research process, and thus helping to build trust in AI-derived findings.
Watch out for gaps and biases
It is crucial for researchers to fully understand the training and input data sets used in an AI-driven model. This includes any inherent biases — especially when the model’s outputs serve as the basis of actions such as disaster responses or preparation, investments or health-care decisions. Data sets that are poorly thought out or insufficiently described increase the risk of ‘garbage in, garbage out’ studies and the propagation of biases, rendering outcomes meaningless or, even worse, dangerous.
For example, many environmental data have better coverage or fidelity in some regions or communities than in others. Areas that are often under cloud cover, such as tropical rainforests, or that have fewer in situ sensors or satellite coverage, such as the polar regions, will be less well represented. Similar disparities across regions and communities exist for health and social-science data.
The abundance and quality of data sets are known to be biased, often unintentionally, towards wealthier areas and populations and against vulnerable or marginalized communities, including those that have historically been discriminated against7,8. In health data, for instance, AI-based dermatology algorithms have been shown to diagnose skin lesions and rashes less accurately in Black people than in white people, because the models are trained on data predominantly collected from white populations8.
Such problems can be exacerbated when data sources are combined — as is often required to provide actionable advice to the public, businesses and policymakers. Assessing the impact of air pollution9 or urban heat10 on the health of communities, for example, relies on environmental data as well as on economic, health or social-science data.
Unintended harmful outcomes can occur when confidential information is revealed, such as the location of protected resources or endangered species. Worryingly, the diversity of data sets now being used increases the risks of adversarial attacks that corrupt or degrade the data without researchers being aware11. AI and ML tools can be used maliciously, fraudulently or in error — all of which can be difficult to detect. Noise or interference can be added, inadvertently or on purpose, to public data sets made up of images or other content. This can alter a model’s outputs and the conclusions that can be drawn. Furthermore, outcomes from one AI or ML model can serve as input for another, which multiplies their value but also multiplies the risks through error propagation.
Our recommendations for data deposition (see ref. 6 and ‘Six principles to help build trust’) can help to reduce or mitigate these risks in individual studies. Institutions should also ensure that researchers are trained to assess data and models for spurious and inaccurate results, and to view their work through a lens of environmental justice, social inequity and implications for sovereign nations12,13. Institutional review boards should include expertise that enables them to oversee both AI models and their use in policy decisions.
Develop ways to explain how AI models work
When studies using classical models are published, researchers are usually expected to provide access to the underlying code, and any relevant specifications. Protocols for reporting limitations and assumptions for AI models are not yet well established, however. AI tools often lack explainability — that is, transparency and interpretability of their programs. It is often impossible to fully understand how a result was obtained, what its uncertainty is or why different models provide varying results14. Moreover, the inherent learning step in ML means that, even when the same algorithms are used with identical training data, different implementations might not replicate results exactly. They should, however, generate results that are analogous.
In publications, researchers should clearly document how they have implemented an AI model to allow others to evaluate results. Running comparisons across models and separating data sources into comparison groups are useful soundness checks. Further standards and guidance are urgently needed for explaining and evaluating how AI models work, so that an assessment comparable to statistical confidence levels can accompany outputs. This could be key to their further use.
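As one possible form of such documentation (a sketch under our own assumptions, using Python and scikit-learn; the field has not yet standardized the exact fields), a study could deposit a machine-readable run record that captures the data fingerprint, software versions, hyperparameters and random seed alongside the results.

```python
import hashlib
import json
import platform

import numpy as np
import sklearn
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))                         # placeholder training data
y = X @ np.array([1.0, 0.5, 0.0, -2.0]) + rng.normal(scale=0.1, size=500)

params = {"n_estimators": 200, "max_depth": 6, "random_state": 42}
model = RandomForestRegressor(**params).fit(X, y)

run_record = {
    "data_sha256": hashlib.sha256(X.tobytes()).hexdigest(),  # fingerprint of inputs
    "python": platform.python_version(),
    "numpy": np.__version__,
    "scikit_learn": sklearn.__version__,
    "model": "RandomForestRegressor",
    "params": params,
    "train_r2": round(model.score(X, y), 3),
}

with open("run_record.json", "w") as f:
    json.dump(run_record, f, indent=2)                # deposit with the data and code
```

A record like this does not make a model explainable by itself, but it lets others rerun the comparison groups described above and check whether analogous results are obtained.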
Researchers and developers are working on such approaches, through techniques known as explainable AI (XAI) that aim to make the behaviour of AI systems more intelligible to users. In short-term weather forecasting, for example, AI tools can analyse huge volumes of remote-sensing observations that become available every few minutes, thus improving the forecasting of severe weather hazards. Clear explanations of how outputs were reached are crucial to enable humans to assess the validity and usefulness of the forecasts, and to decide whether to alert the public or use the output in other AI models to predict the likelihood and extent of fires or floods2.
In Earth sciences, XAI attempts to quantify or visualize (for example, through heat maps) which input data featured more or less prominently in reaching the model’s outputs in any given task. Researchers should examine these explanations and ensure that they are reasonable.
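Permutation importance is one of the simpler attribution techniques of this kind. The sketch below is our illustration, with made-up feature names and synthetic data, assuming Python with scikit-learn; it measures how much a model’s skill on held-out data degrades when each input is shuffled.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
features = ["humidity", "wind_shear", "surface_temp", "pressure"]       # illustrative names
X = rng.normal(size=(2_000, len(features)))
y = 2.0 * X[:, 1] + 0.5 * X[:, 0] + rng.normal(scale=0.3, size=2_000)   # synthetic target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)

# Shuffle each input column in turn on held-out data and record the drop in skill.
result = permutation_importance(model, X_te, y_te, n_repeats=20, random_state=0)
for name, score in sorted(zip(features, result.importances_mean), key=lambda p: -p[1]):
    print(f"{name:>13s}: {score:.3f}")   # wind_shear should dominate, as constructed
```

If the resulting ranking contradicts physical expectation (for example, if a supposedly irrelevant input dominates), that is a signal to revisit the training data or the model before acting on its outputs.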
Forge partnerships and foster transparency
For researchers, transparency is crucial at each step: sharing data and code; considering further testing to enable some forms of replicability and reproducibility; addressing risks and biases in all approaches; and reporting uncertainties. These all necessitate an expanded description of methods, compared with the current way in which AI-enabled studies are reported.
Research teams should include specialists in each type of data used, as well as members of communities who can be involved in providing data or who might be affected by research outcomes. One example is an AI-based project that combined Traditional Knowledge from Indigenous people in Canada with data collected using non-Indigenous approaches to identify areas that were best suited to aquaculture (see go.nature.com/46yqmdr).
Sustain support for data curation and stewardship
There is already a movement across scientific fields for study data, code and software to be reported following FAIR guidelines, meaning that they should be findable, accessible, interoperable and reusable. Increasingly, publishers are requiring that data and code be deposited appropriately and cited in the reference sections of primary research papers, following data-citation principles15,16. This is welcome, as are similar directives from funding bodies, such as the 2022 ‘Nelson memo’ to US government agencies (see go.nature.com/3qkqzes).
Recognized, quality-assured data sets are particularly needed for generating trust in AI and ML, including through the development of standard training and benchmarking data sets17. Errors made by AI or ML tools, along with remedies, should be made public and linked to the data sets and papers. Proper curation helps to make these actions possible.
Leading discipline-specific repositories for research data provide quality checks and the ability to correct or add information about data limitations and bias — including after deposition. Yet we have found that the current data requirements set by funders and journals have inadvertently incentivized researchers to adopt free, quick and easy solutions for preserving their data sets. Generalist repositories that instantly register the data set with a digital object identifier (DOI) and generate a supporting web page (landing page) are increasingly being used. Completely different types of data are too often gathered under the same DOI, which can cause issues in the metadata, make provenance hard to trace and hinder automated access.
This trend is evident from data for papers published in all journals of the AGU5, which implemented deposition policies in 2019 and started enforcing them in 2020. Since then, most publication-related data have been deposited in two generalist repositories: Zenodo and figshare (see ‘Rise in data archiving’). (Figshare is owned by Digital Science, which is part of Holtzbrinck, the majority shareholder in Nature’s publisher, Springer Nature.) Many institutions maintain their own generalist repositories, again often without discipline-specific, community-vetted curation practices.
This means that many of the deposited research data and metadata meet only two of the FAIR criteria: they are findable and accessible. Interoperability and reusability require sufficient information about data provenance, calibration, standardization, uncertainties and biases to allow data sets to be combined reliably — which is especially important for AI-based studies.
Disciplinary repositories, as well as a few generalist ones, provide this service — but it takes trained staff and time, usually several weeks at least. Data deposition must therefore be planned well before the potential acceptance of a paper by a journal.
More than 3,000 research repositories exist18, although many are not actively accepting new data. The most valuable repositories are those that have long-term funding for storage and curation, and accept data globally, such as GenBank, the Protein Data Bank and the EarthScope Consortium (for seismological and geodetic data). Each is part of an international collaboration network. Some repositories are funded, but are restricted to data derived from the funder’s (or country’s) grants; others have short-term funding or require a deposition fee. This complex landscape, the various restrictions on deposition and the fact that not all disciplines have an appropriate, curated, field-specific repository all contribute to driving users towards generalist repositories, which compounds the risks associated with AI models.
Scholarly organizations such as professional societies, funding agencies, publishers and universities have the necessary leverage to promote progress. Publishers, for example, should implement checks and processes to ensure that AI and ML ethics principles are supported through the peer-review process and in publications. Ideally, common standards and expectations for authors, editors and reviewers should be adopted across publishers and be codified in existing ethical guidance (such as through the Council of Science Editors).
We also urge funders to require that researchers use suitable repositories as part of their data sharing and management plan. Institutions should support and partner with such repositories, instead of expanding their own generalist ones.
Sustained financial investments from funders, governments and institutions — that do not detract from research funds — are needed to keep suitable repositories running, and even just to comply with new mandates16.
Look at long-term impact
The broader impacts of the use of AI and ML in science need to be tracked. Research that assesses workforce development, entrepreneurial innovation, real community engagement and the alignment of all the scholarly organizations involved is needed. Ethical aspects must remain at the forefront of these endeavours: AI and ML methods must reduce social disparities rather than exacerbate them; enhance trust in science rather than undercut it; and intentionally include key stakeholder voices, not leave them out.
AI tools, methods and data generation are advancing faster than institutional processes for ensuring quality science and accurate results. The scientific community must take urgent action, or risk wasting research funds and eroding trust in science as AI continues to develop.