Making sense of chemical space network shows signs of criticality

Abstract

Chemical space modelling is of great importance for unveiling and visualising latent information, which is critical in predictive toxicology for the drug discovery process. While traditional molecular descriptors and fingerprints may suffer from the so-called curse of dimensionality, complex networks are free of the typical drawbacks of coordinate-based representations. Herein, we use chemical space networks (CSNs) to analyse the case of developmental toxicity (Dev Tox), which remains a challenging endpoint because of the difficulty of gathering enough reliable data, despite being very important for the protection of maternal and child health. Our study shows that the Dev Tox CSN has a complex non-random organisation and can thus provide a wealth of meaningful information, also for predictive purposes. At a phase transition, chemical similarities highlight well-established toxicophores, such as aryl derivatives (mostly neurotoxic hydantoins, barbiturates and amino alcohols), steroids, and ether-like volatile organic compounds, which are strongly suspected of causing Dev Tox onset and can thus be employed as effective alerts for prioritising chemicals before testing.

Introduction

The canonical representation of chemical spaces based on a coordinate system with multiple dimensions suffers from several issues. It is not invariant to the chosen representation: changing the adopted features can dramatically affect the boundaries of the chemical space and its properties. It cannot deal naturally with features that are both discrete and continuous^1,2. In this regard, metric spaces can make things even harder while complex networks, which are intrinsically non-metric, can promptly offer a solution.

In recent years, the opportunities offered by complex networks for modelling chemical spaces, the so-called chemical space networks (CSNs), have been widely investigated. Several fields have been studied, such as medicinal chemistry, physicochemical properties, and de novo drug design, just to mention a few^3,4,5,6,7,8. An additional advantage of CSNs is the well-established mathematical framework behind them, namely graph theory. Centrality metrics such as degree, betweenness and eigenvector centrality can suitably characterise the behaviour of the chemicals within a network, while their distributions can deepen our understanding of the network organisation and, therefore, of the resulting chemical space⁹. Topological properties allow for the characterization of a network organisation, for example the presence of hubs or communities. Features such as scale-freeness or small-worldness can signal the presence of patterns and dynamics within a network, as extensively reported elsewhere^10,11; by contrast, random graph models, such as the Erdős–Rényi (ER) model, can be used for benchmarking or to assess the meaningfulness of specific structures and architectures¹².

Previous studies have investigated CSNs as threshold networks, i.e., networks whose structure depends on specific cut-off values set on the network connections. The constituent elements of these networks, usually called nodes, are chemicals, while connections are pairwise molecular similarities; these studies were aimed at comparing different datasets^13,14,15. This approach has shown how different similarity metrics generate different CSNs and how different choices of the similarity cut-off affect nodal properties such as degree or assortativity, among others, and the presence of molecular communities^16,17,18. Defining an optimal cut-off is far from a simple matter of fine-tuning: varying the similarity threshold adopted to construct a network deeply shapes the network topology and its meaningfulness^14,19,20. Moreover, the task is complicated by the huge heterogeneity of the chemical space. To mitigate this issue, this work focuses on a reduced yet extremely interesting class of chemicals, consisting of small molecules experimentally labelled as toxic with respect to developmental toxicity (Dev Tox).

Herein, we investigate the possibility of choosing an optimal threshold based on statistical mechanics properties. We identify a first-order phase transition, a signal of emergent behaviours within a complex system, as a flag that an optimal cut-off has been reached. Although this perspective has been thoroughly reported in several case studies and has demonstrated its effectiveness by providing fundamental advances in our understanding of collective phenomena^{21,22,23,24,25,26}, an application to the CSNs is still missing. More broadly, our goal is to provide interpretable insights on CSNs.

The data investigated concern Dev Tox, i.e., the abnormal development of offspring due to exposure to harmful agents or hazardous conditions²⁷. It is a complex human health endpoint, of utmost importance especially for maternal and child health care. Predicting the Dev Tox onset remains extremely challenging and far from reaching satisfactory levels of accuracy^28,29. In the last decade, several predictive approaches, especially based on machine learning, have been proposed^30,31,32,33.

The knowledge gap in understanding Dev Tox depends on two intimately related aspects: on one side the limited amount of Dev Tox measured data as well as their uncertainty and on the other the structural complexity of the chemicals reflecting the space heterogeneity. Hence, we propose to employ the CSN perspective to model such heterogeneity and gain fundamental insights about which toxic chemicals share common molecular patterns and, eventually, deepen our rational understanding of the latent toxicological mechanisms behind Dev Tox. In this respect, our study also falls in the broad cutting-edge domain of eXplainable Artificial Intelligence (XAI)^34,35,36,37.

Results

Criticality signals optimal thresholding

To study the CSN of Dev Tox, a database of small molecules whose toxic effects are well established was collected. To the best of our knowledge, the data used here represent the largest publicly available knowledge base for Dev Tox. Two main sources were taken into account for modelling Dev Tox: the CAESAR³⁰ and the Procter & Gamble (P&G)³⁸ datasets, including 201 and 621 experimentally toxic chemicals, respectively. The Food and Drug Administration (FDA) classifies as toxic the chemicals belonging to one of the following three categories^39,40: class C, which reports chemicals that tested positive for Dev Tox in animal studies; class D, which reports chemicals that tested positive for Dev Tox only in human studies; and class X, which reports chemicals that tested positive in both animal and human studies and/or had evidence of foetal risk based on human experience.

Preliminary examinations were carried out to select the most reliable chemicals for Dev Tox modelling and, after removing duplicates, N = 684 toxic chemicals were included in this study. A list of all chemicals with their structures and Dev Tox annotations is available in Table S1 of the Supporting Information. Table S1 also includes a list of 135 non-toxic chemicals, available from the previously mentioned data repositories, which were used for classification purposes.

The N(N-1)/2 pairwise Small Molecule Subgraph Detector (SMSD)⁴¹ Tanimoto⁴² similarity measures were calculated, and connections were established between nodes whose similarity was greater than a cut-off value of 0.3, on the assumption that a very low similarity does not carry any meaningful information. This choice remarkably reduced the computational burden. The resulting similarity distribution, along with the CSN, is presented in Fig. 1.
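The thresholding step can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `threshold_edges`, the molecule names and the similarity values are hypothetical, whereas in the study the similarities come from the SMSD Tanimoto measure.

```python
from itertools import combinations

def threshold_edges(similarities, cutoff):
    """Return the pairs whose similarity is strictly greater than the cut-off."""
    return sorted(pair for pair, s in similarities.items() if s > cutoff)

# Hypothetical toy data: 4 molecules, hence N(N-1)/2 = 6 pairwise similarities.
molecules = ["A", "B", "C", "D"]
values = [0.85, 0.40, 0.10, 0.72, 0.25, 0.31]
similarities = dict(zip(combinations(molecules, 2), values))

edges_03 = threshold_edges(similarities, 0.3)   # loose cut-off used to build the network
edges_07 = threshold_edges(similarities, 0.7)   # stricter cut-off keeps fewer edges
```

Raising the cut-off can only remove edges, which is why the number of connections decreases monotonically with the Tanimoto threshold.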

This study aims at analysing the CSN topology, thus network weights are neglected as topological features are generally weight-independent^43,44; nevertheless, weights become crucial when considering different cut-off values and this can dramatically affect topology. Hence, by thresholding Tanimoto similarities at different cut-off values, we investigated the variations occurring within the CSN in terms of three main centrality metrics: degree (d), betweenness (b) and eigenvector centrality (e). Moreover, a paired analysis on an ensemble of ER graphs comparable with the CSN was carried out.

For each threshold value, the number of CSN edges (E) was computed with the maximum possible value being:

$${E}_{max}=N\left(N-1\right)/2$$

with N, the number of chemicals, being the CSN order.

Thus, the connection probability

$$p=E/{E}_{max}$$

was calculated and an ER model \(G\left(N,p\right)\) was simulated (for statistical robustness, 20 different simulations were performed). The results are shown in Fig. 2.
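As a worked example of these two quantities, the following Python sketch computes the connection probability from the figures reported in the text (N = 684, and 41,807 edges at a Tanimoto threshold of 0.3) and samples an ensemble of 20 ER graphs. The function names and the random seed are illustrative assumptions, and the sampler is a plain standard-library implementation rather than the one used in the study.

```python
import random
from itertools import combinations

def connection_probability(n_nodes, n_edges):
    """p = E / E_max with E_max = N(N-1)/2."""
    e_max = n_nodes * (n_nodes - 1) // 2
    return n_edges / e_max

def sample_er_graph(n_nodes, p, rng):
    """Sample G(N, p): each of the N(N-1)/2 possible edges is kept with probability p."""
    return [pair for pair in combinations(range(n_nodes), 2) if rng.random() < p]

# Values reported in the text: N = 684 chemicals, 41,807 edges at threshold 0.3.
N, E = 684, 41807
p = connection_probability(N, E)          # about 0.179

# Ensemble of 20 ER realisations; the seed is an arbitrary choice.
rng = random.Random(0)
ensemble = [sample_er_graph(N, p, rng) for _ in range(20)]
mean_edges = sum(len(g) for g in ensemble) / len(ensemble)
```

By construction, the average number of edges in the ensemble stays close to the number of edges of the thresholded CSN, so the two models share the same mean degree.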

The number of CSN connections occurring in the model ranges from 0 (when the Tanimoto threshold is 1) to 41,807 (when the Tanimoto threshold is 0.3), while the resulting connection probability p ranges approximately between 0 and \(10^{-1}\), an interval that includes the ER critical probability \(p_{crit} \sim 1/N = 10^{-3}\). As expected, centrality metrics intrinsically related to edge counting, such as degree and eigenvector centrality, showed monotonically decreasing trends. This can easily be explained as a direct consequence of the edge removal. Apart from statistical fluctuations, the degree is equal for both models. In fact, by construction, the ER model is simulated with the connection probability retrieved by the CSN. Eigenvector centrality has higher values in the ER ensemble, while for higher probability values the two trends are substantially similar. An analogous consideration holds for eigenvector centrality and betweenness (see Fig. S1 of the Supporting Information for a detailed comparison). For both CSN and ER models, betweenness increases with connection probability until a maximum is reached, then betweenness abruptly decreases. Interestingly, while this phase transition occurs as expected with \(p \sim p_{crit}\) in the ER model, for the CSN model this phase transition occurs at a slightly higher value \(p_{crit}^{CSN} \sim 5 \cdot 10^{-3}\), which corresponds to a Tanimoto similarity of ~ 0.7. Overall, these results highlight the presence of an optimal cut-off signalled by the behaviour of betweenness and a problematic similarity between the constructed CSN and a random graph.

The CSN is not random

The CSN being a random graph would be not only far from intuition, as chemicals showing similar behaviours should be close in the chemical space, but it would also pose fundamental issues as a random network by definition does not include meaningful structures.

It can be easily shown that the phase transition signalled by the peak in betweenness corresponds to the first-order phase transition of the giant component in a random graph, although it does not occur at \(p \sim 1/N\), see Fig. 3. The fraction of nodes within the giant component becomes non-null at the same critical probability \(p_{crit}\) at which betweenness abruptly increases (as previously shown in Fig. 2). However, the studied CSN cannot be a random graph and the definitive proof is provided by assortativity. In fact, Fig. 3 shows that assortativity increases with the connection probability until it reaches a maximum at \(p_{crit}\).
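The order parameter of this transition, the fraction of nodes in the giant component, can be computed as in the following sketch. It is a standard-library illustration based on a union-find structure, with a hypothetical function name and a toy graph; it is not the implementation used in the study.

```python
def giant_component_fraction(n_nodes, edges):
    """Fraction of nodes in the largest connected component (union-find)."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving keeps the trees shallow
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)

    sizes = {}
    for node in range(n_nodes):
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n_nodes

# Toy graph with 6 nodes: a triangle, one extra pair and an isolated node.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4)]
fraction = giant_component_fraction(6, toy_edges)
```

Evaluating this fraction on the edge list obtained at each similarity threshold gives a curve of the kind described for Fig. 3.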

This behaviour is a consequence of the transitivity of Tanimoto similarity. Low probabilities correspond to high similarity; therefore, if two chemicals are connected to a third one, they will probably be connected to each other. This is a typical assortative behaviour, and it is not consistent with a random graph, whose assortativity should be close to zero. In fact, the CSN assortativity drops dramatically as more and more edges are added, tending to the behaviour of a random graph. Thus, although thresholding the CSN at criticality yields the emergence of a giant component, the meaningfulness of the CSN inner structures, based on molecular similarity, is preserved.

The Dev Tox archetypes

At criticality, the CSN experiences the giant component formation while satellite groups of few chemicals are also present. A community detection analysis was performed along with a modularity analysis to highlight the inner CSN organisation; moreover, the community cardinality was examined, see Fig. 4.
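For reference, the modularity of a given partition can be computed as in the sketch below, which uses the standard Newman definition for unweighted graphs. The function name and the toy graph are illustrative assumptions; the sketch only scores a partition and does not perform the community detection itself.

```python
def modularity(edges, communities):
    """Newman modularity Q = sum_c [ L_c / m - (d_c / (2 m))^2 ] for an unweighted graph."""
    m = len(edges)
    label = {node: k for k, members in enumerate(communities) for node in members}
    internal = [0] * len(communities)     # L_c: edges with both ends in community c
    degree_sum = [0] * len(communities)   # d_c: total degree of the nodes in community c
    for u, v in edges:
        degree_sum[label[u]] += 1
        degree_sum[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(l / m - (d / (2 * m)) ** 2 for l, d in zip(internal, degree_sum))

# Two triangles joined by a single bridge edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
q_split = modularity(toy_edges, [{0, 1, 2}, {3, 4, 5}])
q_single = modularity(toy_edges, [{0, 1, 2, 3, 4, 5}])
```

Splitting the toy graph into its two triangles gives a positive modularity, whereas placing all nodes in a single community gives zero.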

At criticality, almost 25% of network nodes are grouped within the top three populated communities, consisting of 46, 45 and 42 elements. The overall modularity and the percentage of connected nodes exceeded 80%. These findings suggest the existence of specific communities, based on molecular patterns, that we will call Dev Tox “archetypes”. It should be noted that these archetypes should not be strictly intended as toxicological classes; in fact, by construction, they are based only on structural similarities of toxicophores evaluated by means of the Tanimoto metric. To gain further insights into the chemical meaning of these communities, we computed hundreds of molecular descriptors (from physicochemical to auto-correlation properties) for each toxicophore and investigated their statistical association within the archetypes. After Bonferroni correction, we found 145 descriptors whose distributions can be significantly distinguished, at 1% significance, within the three communities.

This analysis highlighted the presence of descriptors capturing basic and easy-to-interpret features such as molecular weight, number of valence electrons and molecular refractivity. Interestingly, such descriptors are of particular interest when evaluating properties of fundamental importance such as drug-likeness^45,46. Other descriptors significantly related to communities were well-known topochemical indices such as BCUT descriptors, BertzCT and molecular connectivity chi indexes⁴⁷. Finally, an extremely relevant role was played by Moreau-Broto autocorrelation descriptors⁴⁸. Topological autocorrelation is frequently used in Quantitative Structure–Activity Relationship (QSAR) models⁴⁹ to assess how specific physicochemical properties are spatially distributed along molecules. Here, almost 50% of significant descriptors consisted of autocorrelation patterns, specifically involved with (i) atomic properties (number of valence or sigma electrons), mass, atomic numbers; (ii) electronegativity (Sanderson, Pauling and Allred-Rochow)⁵⁰.

Besides statistical significance, the median value for each descriptor and its interquartile range were evaluated to highlight the different behaviour within each community along with its variability. For example, the molecular refractivity is shown in Fig. 5.

A comprehensive list of significant descriptors and their characteristic ranges for Dev Tox is provided in Table S2 of the Supporting Information. These results confirm that the communities reflect structural similarities and therefore group chemicals whose properties differ from one community to another.

In particular, the first community includes toxicophores that are heterogeneous in terms of structural moieties, probably owing to its large size. In general, it includes aryl derivatives, mainly comprising barbiturates, hydantoins and amino alcohols, commonly used as anticonvulsant drugs, GABA modulators, excitatory amino acid antagonists, and hypnotic and sedative drugs, see Fig. 6a. The second community covers toxicophores with the well-known cyclopentanoperhydrophenanthrene core typical of steroids, which are responsible for essential biological functions such as the regulation of fluidity and permeability and are also known for fertility impairment, see Fig. 6b. The third community is made up of small ether-like Volatile Organic Compounds (VOCs), which act as pollutants and food toxins, see Fig. 6c.

Finally, we explored whether the toxicophores within a community shared specific drug targets, generally consisting of receptor and enzyme proteins. To this aim, we used the PLATO platform⁵¹, which is specifically aimed at pairing biological targets to small molecules on the basis of the calculated similarity with respect to known ligands annotated with experimental biological measures retrieved from the ChEMBL database. Based on highly occurring targets, we found that the first community mostly engages targets relevant to the central nervous system, normally engaged by hypnotic, sedative and anticonvulsant drugs, while the community of steroids mainly interacts with the hormonal system, including mineralocorticoid and glucocorticoid receptors, progesterone receptors and androgen-binding protein receptors. The third community, mostly composed of ether-like VOCs, did not pair with any verified target, possibly because of their low similarity to known drugs.

The CSN predictive power

To evaluate the potential of CSN as a support for predictive investigations, a further analysis was carried out. We included in the CSN the nodes representing the non-toxic chemicals listed in Table S1 and designed a simple classification framework to assess to what extent the CSN is able to distinguish toxic from non-toxic chemicals. For each node/chemical to be classified, the connected nodes/chemicals were inspected along with their pairwise Tanimoto similarities; then, the classification score was computed with a weighted average. The adopted weights were the computed similarities so that the most similar chemicals were the most influential in classification. Classification results in terms of accuracy, sensitivity, specificity and f1 metrics are presented in Fig. 7.
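The similarity-weighted score described above can be sketched as follows. This is an illustration under stated assumptions: the function name, the toy neighbour list and the 0.5 decision rule are hypothetical, since the text does not specify how the score is converted into a class label.

```python
def weighted_toxicity_score(neighbours, cutoff):
    """Similarity-weighted average of neighbour labels (1 = toxic, 0 = non-toxic).

    `neighbours` is a list of (similarity, label) pairs; only neighbours whose
    similarity exceeds `cutoff` are connected to the query node and contribute.
    Returns None when the query has no connected neighbour.
    """
    connected = [(s, y) for s, y in neighbours if s > cutoff]
    total = sum(s for s, _ in connected)
    if total == 0:
        return None
    return sum(s * y for s, y in connected) / total

# Hypothetical query chemical with four candidate neighbours.
neighbours = [(0.9, 1), (0.8, 1), (0.75, 0), (0.4, 0)]
score = weighted_toxicity_score(neighbours, cutoff=0.7)
predicted_toxic = score is not None and score >= 0.5   # 0.5 is an assumed decision rule
```

Because the weights are the similarities themselves, the two toxic neighbours at 0.9 and 0.8 outweigh the single non-toxic neighbour at 0.75, while the neighbour at 0.4 is not connected at this cut-off and is ignored.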

For the sake of completeness, the classification results were explored over the whole range of possible Tanimoto similarity thresholds. At the critical threshold (~ 0.7), the model achieved a good overall accuracy (~ 80%) and f1 (~ 90%) and an extremely high sensitivity (> 90%), while specificity was significantly lower (~ 25%). Performance uncertainties were obtained by means of an 80% hold-out cross-validation, iterated 100 times.
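The coexistence of high accuracy and f1 with low specificity follows from the class imbalance, as the following arithmetic sketch suggests. The class sizes (684 toxic, 135 non-toxic) are taken from the text, whereas the sensitivity and specificity values plugged in are illustrative assumptions consistent with the reported ranges, and the counts refer to the whole dataset rather than to the hold-out subsets actually used.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity and F1 from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}

# Class sizes from the text: 684 toxic and 135 non-toxic chemicals.
# The rates below are illustrative assumptions, not reported values.
n_toxic, n_non_toxic = 684, 135
tp = round(0.92 * n_toxic)        # assumed sensitivity of 92%
tn = round(0.25 * n_non_toxic)    # assumed specificity of 25%
metrics = classification_metrics(tp, n_toxic - tp, tn, n_non_toxic - tn)
```

Under these assumptions the accuracy is about 0.81 and the f1 about 0.89, because both metrics are dominated by the much larger toxic class.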

Discussion

The study of the chemical space is fundamentally based on one basic assumption: the structure of chemicals yields information about their physicochemical and biological properties, including their pharmacological or toxicological behaviour^{52,53,54,55,56}. Although this paradigm is generally accepted and has been verified for several applications, some important issues are still not completely solved^{56,57,58,59,60,61}. A fundamental limitation of this perspective concerns the extreme variability of physicochemical properties even when only a few atomic variations, or just one, occur within a molecular structure. Hence, a potentially disruptive consideration arises: if even a subtle molecular change can turn a toxic chemical into a non-toxic one, then any conclusion drawn from molecular similarity is uninformative, and the chemical space network and its inner communities are meaningless. A corollary question, which would prevent any possible further consideration, concerns the possibility of uniquely defining a scale at which to investigate the chemical space, i.e., of identifying a suitable threshold for molecular similarity measurements.

Here, we provide an answer to both questions and demonstrate how they are inextricably related. In fact, our findings showed an outstanding overlap between the Dev Tox CSN and an ER model; this was observed at all the threshold values, thus dangerously suggesting the possibility that the CSN was actually random. Also, the Dev Tox CSN showed a giant component phase transition, as expected from an ER model. However, by definition, a random network should not show any kind of assortative behaviour: the assortativity of the Dev Tox CSN incontrovertibly demonstrates that it is not a random network, at least in the connection probability range explored. Moreover, its behaviour is far from that of a random network as the cut-off value approaches criticality. Thus, thresholding becomes intrinsically related to meaningfulness.

Having established the meaningfulness of the Dev Tox CSN, we characterised its inner communities using both molecular descriptors, as similar chemicals in a physical, biological or toxicological sense tend to exhibit similar properties^62,63,64,65, and biological functions. The proposed CSN is thus easily interpretable by domain experts^34,66 and could be profitably employed for drug repurposing and rational de novo design⁶⁷, strategic assets to mitigate the well-known issues of drug discovery, such as huge costs and extremely time-consuming procedures^68,69,70. In fact, thanks to PLATO target profiling, the Dev Tox archetypes outlined within the CSN could be related (with different reliability) to multiple biological activities.

This work also investigates the predictive power of the proposed CSN. The model was able to reach a reliable accuracy in Dev Tox prediction, with extremely high sensitivity. By contrast, specificity remained substantially low. Two aspects deserve to be considered: (i) the informative content provided by structural similarity cannot reasonably provide a comprehensive description of toxicological patterns. It is easy to find examples of chemicals characterised by high structural similarity which show opposite toxicological behaviours, e.g., Dydrogesterone and Progesterone, non-toxic and toxic, respectively, differ by only a double bond⁵⁶. (ii) While toxicants are expected to have common characteristics, the wide spectrum of chemicals which are non-toxic with respect to this specific endpoint includes very heterogeneous chemotypes, which in principle can share few or even no structural similarities. Thus, for a classification model, the correct detection of non-toxic chemicals is extremely challenging, not to mention the fact that this class is poorly represented in the available databases. It is worth mentioning that this behaviour is not unexpected: it has already been observed in the literature, although by studies based on different descriptions^{31,32,33,71,72,73}.

As a final remark, it is worth noting how complex network software suites and methodologies can manage systems with millions of nodes and therefore, thanks to its generality, the proposed approach can be straightforwardly adopted for broader chemical spaces, not necessarily limited to a single endpoint, and devoted to several applications. Our findings suggest that the characterization of the CSN could support in silico assessment of chemicals, specifically the so-called New Approach Methodologies (NAMs). The combined use of features derived from the CSN along with physicochemical descriptors and fingerprints could in principle enhance existing models.

Methods

Data curation

The chemicals were downloaded in SMILES format, with the associated binary Dev Tox label, from the freely accessible CAESAR and P&G databases. All SMILES data were cleaned of stereoisomeric assignments and canonicalised, and then the two databases were cross-merged. Eight matches with opposite labels, derived mainly from the P&G dataset, were discarded from the analysis; finally, 684 toxic molecules were collected. The selected chemicals were described by 2D molecular descriptors obtained from RDKit and autocorrelators obtained from Mordred. Descriptors with a variability lower than 10% were removed from the analysis, thus resulting in a total of 774 descriptors; see Table S3 of the Supporting Information for a comprehensive list.
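The merging step with removal of conflicting labels can be sketched as follows. The sketch assumes that the SMILES strings have already been stripped of stereochemistry and canonicalised (for instance with RDKit), so that identical molecules share the same key; the function name and the toy records are hypothetical and do not come from the real datasets.

```python
def merge_labelled_sets(*datasets):
    """Merge {smiles: label} dictionaries, dropping duplicates and label conflicts.

    Assumes the SMILES strings are already canonical, so identical molecules
    share the same key. Returns (merged, conflicts).
    """
    merged, conflicts = {}, set()
    for dataset in datasets:
        for smiles, label in dataset.items():
            if smiles in merged and merged[smiles] != label:
                conflicts.add(smiles)
            merged.setdefault(smiles, label)
    for smiles in conflicts:
        del merged[smiles]
    return merged, conflicts

# Hypothetical toy records (1 = toxic, 0 = non-toxic); labels are made up.
set_a = {"CCO": 1, "c1ccccc1": 1, "CC(=O)O": 0}
set_b = {"CCO": 1, "CC(=O)O": 1, "CCN": 0}
merged, conflicts = merge_labelled_sets(set_a, set_b)
```

In this toy case the duplicated record is kept once, and the single molecule with opposite labels in the two sets is discarded, mirroring the treatment of the eight conflicting matches described above.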

Network analyses

The SMSD computes the largest common subgraph between two chemicals, where the molecular graph is a natural representation of a molecule based on its bonds and its atoms, except for hydrogen atoms, which are treated as implicit. The SMSD Tanimoto measure is therefore the ratio between the size of the common subgraph of two molecules and the size of the union of their common and non-common subgraphs. We used the implementation based on current CDK available at https://mvnrepository.com/artifact/gov.nih.ncats/smsd-core.
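In terms of sizes, the ratio described above can be written as c / (a + b - c), where a and b are the sizes of the two molecular graphs and c is the size of their largest common subgraph. The Python sketch below illustrates this arithmetic only; it does not call SMSD, and whether size is counted in atoms, bonds or both is an implementation detail not specified here, so the heavy-atom counts used in the example are an assumption.

```python
def subgraph_tanimoto(size_a, size_b, size_common):
    """Tanimoto coefficient from subgraph sizes: c / (a + b - c)."""
    union = size_a + size_b - size_common
    if union == 0:
        return 0.0   # convention chosen here for two empty graphs
    return size_common / union

# Hypothetical example: molecules with 12 and 10 heavy atoms sharing a 9-atom subgraph.
similarity = subgraph_tanimoto(12, 10, 9)
```

With these hypothetical sizes the similarity is 9/13, roughly 0.69, i.e., close to the critical threshold discussed in the Results.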

The CSN was compared with a uniform ER model, specifically an ER model G(N,E) whose E edges are uniformly sampled among the N(N-1)/2 possible connections. The advantage of such model is to provide a graph with the same number of connections as the thresholded CSN.

Comparisons were carried out by considering three centrality metrics, accounting for three different perspectives: a local, a global and a dynamic one.

Degree \(d_i\) of node i (local metric). \(d_i = \sum_{j=1}^{N} a_{ij}\), with \(a_{ij}\) representing the elements of the adjacency matrix of the considered network having N nodes. The degree takes into account only the connections of a node; in this sense it is a local centrality metric.
Eigenvector centrality \(e_i\) of node i (global metric). \(e_i = \left(\frac{1}{\lambda}\right)\sum_{j=1}^{N} a_{ij} e_j\), so that \(e_i\) satisfies an eigenvalue equation.
Betweenness \(b_i\) of node i (dynamic metric). \(b_i = \sum_{s \ne i \ne t}^{N} \frac{p(i)_{st}}{p_{st}}\), which measures the ratio between the number of shortest paths p connecting a generic pair of nodes (s,t) that pass through the node i and all the shortest paths connecting them. Thus, this centrality metric evaluates the dynamical information flow within the network.
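The first two definitions can be checked numerically with the short sketch below, which computes the degree from the adjacency matrix and the eigenvector centrality by power iteration. It is a standard-library illustration with hypothetical function names, not the NetworkX routines used in the study; the iteration is applied to A + I, a common shift that leaves the eigenvectors unchanged and avoids oscillations, and betweenness is omitted for brevity.

```python
def degree_centrality(adjacency):
    """d_i = sum_j a_ij for a symmetric 0/1 adjacency matrix."""
    return [sum(row) for row in adjacency]

def eigenvector_centrality(adjacency, iterations=1000, tol=1e-12):
    """Power iteration on A + I; assumes a connected, undirected graph."""
    n = len(adjacency)
    e = [1.0 / n] * n
    for _ in range(iterations):
        new = [e[i] + sum(adjacency[i][j] * e[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in new) ** 0.5
        new = [x / norm for x in new]
        if max(abs(a - b) for a, b in zip(new, e)) < tol:
            return new
        e = new
    return e

# Toy graph: triangle 0-1-2 with a pendant node 3 attached to node 2.
A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
degrees = degree_centrality(A)
centrality = eigenvector_centrality(A)
```

In the toy graph, node 2 has both the highest degree and the highest eigenvector centrality, while the pendant node 3 scores lowest on both.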

Moreover, to emphasise the difference between the defined CSN and a random network, assortativity was used.

Assortativity measures the preference of nodes to be connected to other nodes according to a similarity criterion, e.g., degree. For example, in assortative networks, high-degree nodes tend to connect to nodes with high degree. On the other hand, if they prefer low-degree nodes, the network is disassortative. Random networks tend to have null assortativity as a consequence of the randomness of their connections.
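Degree assortativity is commonly quantified as the Pearson correlation between the degrees at the two ends of each edge. The sketch below follows this standard definition with a hypothetical function name and toy graphs; it is meant as an illustration and not as the routine used in the study.

```python
def degree_assortativity(edges):
    """Pearson correlation between the degrees found at the two ends of each edge."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Each undirected edge is counted in both directions so the measure is symmetric.
    x = [degree[u] for u, v in edges] + [degree[v] for u, v in edges]
    y = [degree[v] for u, v in edges] + [degree[u] for u, v in edges]
    n = len(x)
    mean = sum(x) / n                      # x and y share the same mean and variance
    cov = sum((a - mean) * (b - mean) for a, b in zip(x, y)) / n
    var = sum((a - mean) ** 2 for a in x) / n
    return cov / var                       # undefined (division by zero) for regular graphs

star = [(0, 1), (0, 2), (0, 3), (0, 4)]    # one hub linked to four leaves
path = [(0, 1), (1, 2), (2, 3)]            # chain of four nodes
r_star = degree_assortativity(star)
r_path = degree_assortativity(path)
```

The star graph, where the hub connects only to low-degree leaves, is the extreme disassortative case, with a coefficient of -1.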

Studying these metrics allowed us to reveal the giant component phase transition. At criticality, community detection was performed by means of the Louvain algorithm⁷⁴. All network analyses were carried out with a Python 3.8.13 distribution and the NetworkX 3.1 package.

Statistical significance

We evaluated the differences between the distributions of several molecular descriptors within the communities that arose at criticality. To this aim, to discard any a priori hypothesis about the descriptor distributions, the non-parametric Mood’s test for medians was performed. Three tests were carried out for each descriptor to ensure that a significant difference had been found among all the three communities. The chosen significance level was 0.01 but a Bonferroni correction was adopted to avoid the multiple comparison bias.
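A pairwise Mood's median test followed by a Bonferroni-corrected decision can be sketched as follows. This is a simplified standard-library illustration: it omits the continuity correction that some statistical packages apply by default, the samples are made up, and the number of comparisons used for the correction (here one per descriptor, 774) is an assumption, since the exact count is not stated in the text.

```python
from math import erfc, sqrt
from statistics import median

def mood_median_test(sample_a, sample_b):
    """Two-sample Mood's median test without continuity correction.

    Counts the values above the grand median in each sample and applies a
    chi-square test (1 degree of freedom) to the resulting 2 x 2 table.
    Assumes that some values lie above and some at or below the grand median.
    """
    grand_median = median(list(sample_a) + list(sample_b))
    above = [sum(v > grand_median for v in s) for s in (sample_a, sample_b)]
    below = [len(s) - a for s, a in zip((sample_a, sample_b), above)]
    total = len(sample_a) + len(sample_b)
    chi2 = 0.0
    for row in (above, below):
        for observed, size in zip(row, (len(sample_a), len(sample_b))):
            expected = sum(row) * size / total
            chi2 += (observed - expected) ** 2 / expected
    p_value = erfc(sqrt(chi2 / 2))     # survival function of chi-square with 1 dof
    return chi2, p_value

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that controls the family-wise error at alpha."""
    return alpha / n_tests

# Made-up descriptor values for two communities with clearly separated medians.
community_1 = list(range(1, 11))
community_2 = list(range(11, 21))
chi2, p_value = mood_median_test(community_1, community_2)
threshold = bonferroni_threshold(0.01, 774)   # assumed number of comparisons
significant = p_value < threshold
```

Repeating the test for each pair of communities gives the three tests per descriptor mentioned above.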

The PLATO platform for target profiling

The predictive web platform PLATO was used to predict relevant therapeutic drug targets of small molecules. PLATO matches query molecules with the most similar molecules in its database of experimental activity values based on Tanimoto similarity calculated on 13 different fingerprints. The predicted targets with their referenced organism are experimentally linked to the similar molecules identified by the algorithm. For each prediction, a score is calculated by summing the Tanimoto coefficients of each fingerprint. This implies that an exact match corresponds to a score equal to 13. Query reports can be easily provided in json format upon programmatic POST requests. PLATO is freely available at https://prometheus.farmacia.uniba.it/plato/.

Data availability

Data used in this work are publicly available. The entire list of Dev Tox chemicals (in SMILES format) herein analysed is provided in the Table S1 of the Supporting Information.

References

1. Maggiora, G. M. & Bajorath, J. Chemical space networks: A powerful new paradigm for the description of chemical space. J. Comput. Aided Mol. Des. 28, 795–802 (2014).
2. Vogt, M., Stumpfe, D., Maggiora, G. M. & Bajorath, J. Lessons learned from the design of chemical space networks and opportunities for new applications. J. Comput. Aided Mol. Des. 30, 191–208 (2016).
3. Scalfani, V. F., Patel, V. D. & Fernandez, A. M. Visualizing chemical space networks with RDKit and NetworkX. J. Cheminform. 14, 87 (2022).
4. Recanatini, M. & Cabrelle, C. Drug research meets network science: Where are we? J. Med. Chem. 63, 8653–8666 (2020).
5. Kunimoto, R. & Bajorath, J. Combining similarity searching and network analysis for the identification of active compounds. ACS Omega 3, 3768–3777 (2018).
6. Kunimoto, R. & Bajorath, J. Exploring sets of molecules from patents and relationships to other active compounds in chemical space networks. J. Comput. Aided Mol. Des. 31, 779–788 (2017).
7. Medina-Franco, J. L., Sánchez-Cruz, N., López-López, E. & Díaz-Eufracio, B. I. Progress on open chemoinformatic tools for expanding and exploring the chemical space. J. Comput. Aided Mol. Des. 36, 341–354 (2022).
8. Villoutreix, B. O., Labbé, C. M., Lagorce, D., Laconde, G. & Sperandio, O. A leap into the chemical space of protein-protein interaction inhibitors. Curr. Pharm. Des. 18, 4648–4667 (2012).
9. Kothiyal, M., Kumar, S. & Sukumar, N. Investigation of chemical space networks using graph measures and random matrix theory. J. Math. Chem. 60, 891–914 (2022).
10. Barabási, A.-L. & Bonabeau, E. Scale-free networks. Sci. Am. 288, 60–69 (2003).
11. Watts, D. J. & Strogatz, S. H. Collective dynamics of ‘small-world’ networks. Nature 393, 440–442 (1998).
12. Janson, S., Luczak, T. & Rucinski, A. Random Graphs (Wiley, 2011).
13. Zwierzyna, M., Vogt, M., Maggiora, G. M. & Bajorath, J. Design and characterization of chemical space networks for different compound data sets. J. Comput. Aided Mol. Des. 29, 113–125 (2015).
14. Zhang, B., Vogt, M., Maggiora, G. M. & Bajorath, J. Design of chemical space networks using a Tanimoto similarity variant based upon maximum common substructures. J. Comput. Aided Mol. Des. 29, 937–950 (2015).
15. Gurunathan, S., Yogalakshmi, T. & Balasubramanian, K. Topological characterization of statistically clustered networks for molecular similarity analysis. J. Math. Chem. 61, 859–876 (2023).
16. Dunn, T. B. et al. Diversity and chemical library networks of large data sets. J. Chem. Inf. Model. 62, 2186–2201 (2022).
17. Saldívar-González, F. I. & Medina-Franco, J. L. Approaches for enhancing the analysis of chemical space for drug discovery. Expert Opin. Drug Discov. 17, 789–798 (2022).
18. Boezio, B., Audouze, K., Ducrot, P. & Taboureau, O. Network-based approaches in pharmacology. Mol. Inform. https://doi.org/10.1002/minf.201700048 (2017).
19. Liu, H. Comparison between Merrifield-Simmons index and some vertex-degree-based topological indices. Comp. Appl. Math. 42, 89 (2023).
20. Wu, M., Vogt, M., Maggiora, G. M. & Bajorath, J. Design of chemical space networks on the basis of Tversky similarity. J. Comput. Aided Mol. Des. 30, 1–12 (2016).
21. Dorogovtsev, S. N., Goltsev, A. V. & Mendes, J. F. F. Critical phenomena in complex networks. Rev. Mod. Phys. 80, 1275–1335 (2008).
22. Radin, C. & Sadun, L. Phase transitions in a complex network. J. Phys. A: Math. Theor. 46, 305002 (2013).
23. Bollobás, B., Janson, S. & Riordan, O. The phase transition in inhomogeneous random graphs. Random Struct. Algorithms 31, 3–122 (2007).
24. Newman, M. E. J. & Watts, D. J. Renormalization group analysis of the small-world network model. Phys. Lett. A 263, 341–346 (1999).
25. Fox Keller, E. Revisiting “scale-free” networks. BioEssays 27, 1060–1068 (2005).
26. Amoroso, N. et al. Potential energy of complex networks: A quantum mechanical perspective. Sci. Rep. 10, 18387 (2020).
27. Gupta, P. K. Developmental toxicology. In Problem Solving Questions in Toxicology: A Study Guide for the Board and Other Examinations (ed. Gupta, P. K.) 137–143 (Springer International Publishing, 2020). https://doi.org/10.1007/978-3-030-50409-0_10.
28. Merlot, C. Computational toxicology—A tool for early safety evaluation. Drug Discov. Today 15, 16–22 (2010).
29. Guengerich, F. P. & MacDonald, J. S. Applying mechanisms of chemical toxicity to predict drug safety. Chem. Res. Toxicol. 20, 344–369 (2007).
30. Cassano, A. et al. CAESAR models for developmental toxicity. Chem. Central J. 4, S4 (2010).
31. Gunturi, S. B. & Ramamurthi, N. A novel approach to generate robust classification models to predict developmental toxicity from imbalanced datasets. SAR QSAR Environ. Res. 25, 711–727 (2014).
32. Zhang, H., Mao, J., Qi, H.-Z. & Ding, L. In silico prediction of drug-induced developmental toxicity by using machine learning approaches. Mol. Divers. 24, 1281–1290 (2020).
33. Togo, M. V. et al. TIRESIA: An explainable artificial intelligence platform for predicting developmental toxicity. J. Chem. Inform. Model. https://doi.org/10.1021/acs.jcim.2c01126 (2022).
34. Jiménez-Luna, J., Grisoni, F. & Schneider, G. Drug discovery with explainable artificial intelligence. Nat. Mach. Intell. 2, 573–584 (2020).
35. Iswarya, B. & Manimekalai, K. Drug discovery with XAI using deep learning. In Principles and Methods of Explainable Artificial Intelligence in Healthcare (eds de Albuquerque, V. H. C. et al.) 131–149 (IGI Global, 2022). https://doi.org/10.4018/978-1-6684-3791-9.ch006.

Chapter

Google Scholar
Kikutsuji, T. et al. Explaining reaction coordinates of alanine dipeptide isomerization obtained from deep neural networks using Explainable Artificial Intelligence (XAI). J. Chem. Phys. 156, 154108 (2022).

Article
ADS
PubMed

Google Scholar
Cilli, R. et al. Explainable artificial intelligence (XAI) detects wildfire occurrence in the Mediterranean countries of Southern Europe. Sci. Rep. 12, 16349 (2022).

Article
ADS
PubMed
PubMed Central

Google Scholar
Wu, S. et al. Framework for identifying chemicals with structural features associated with the potential to act as developmental or reproductive toxicants. Chem. Res. Toxicol. 26, 1840–1861 (2013).

Article
PubMed

Google Scholar
Hewitt, M., Ellison, C. M., Enoch, S. J., Madden, J. C. & Cronin, M. T. D. Integrating (Q)SAR models, expert systems and read-across approaches for the prediction of developmental toxicity. Reprod. Toxicol. 30, 147–160 (2010).

Article
PubMed

Google Scholar
Freyer, A. M. Drugs in pregnancy and lactation 8th edition: A reference guide to fetal and neonatal risk. Obstet. Med. 2, 89 (2009).

Article
PubMed Central

Google Scholar
Rahman, S. A., Bashton, M., Holliday, G. L., Schrader, R. & Thornton, J. M. Small Molecule Subgraph Detector (SMSD) toolkit. J. Cheminform. 1, 12 (2009).

Article
PubMed
PubMed Central

Google Scholar
Bajusz, D., Rácz, A. & Héberger, K. Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations?. J. Cheminform. 7, 20 (2015).

Article
PubMed
PubMed Central

Google Scholar
Boccaletti, S., Latora, V., Moreno, Y., Chavez, M. & Hwang, D.-U. Complex networks: Structure and dynamics. Phys. Rep. 424, 175–308 (2006).

Article
ADS
MathSciNet
MATH

Google Scholar
Combining complex networks and data mining: Why and how – ScienceDirect | https://www.sciencedirect.com/.
Prediction of Physicochemical Parameters by Atomic Contributions | Journal of Chemical Information and Modeling. https://pubs.acs.org/.
Bickerton, G. R., Paolini, G. V., Besnard, J., Muresan, S. & Hopkins, A. L. Quantifying the chemical beauty of drugs. Nat. Chem. 4, 90–98 (2012).

Article
PubMed
PubMed Central

Google Scholar
Consonni, V. & Todeschini, R. Molecular descriptors. In Challenges and Advances in Computational Chemistry and Physics (eds Puzyn, T. et al.) 29–102 (Springer, 2009).

Google Scholar
Moreau, G. & Broto, P. Autocorrelation of molecular structures, application to SAR studies. Nouv. J. Chim. 4, 757–764 (1980).

Google Scholar
Nicolotti, O. & Carotti, A. QSAR and QSPR studies of a highly structured physicochemical domain. J. Chem. Inf. Model. 46, 264–276 (2006).

Article
PubMed

Google Scholar
Sliwoski, G., Mendenhall, J. & Meiler, J. Autocorrelation descriptor improvements for QSAR: 2DA_Sign and 3DA_Sign. J. Comput. Aided Mol. Des. 30, 209–217 (2016).

Article
ADS
PubMed

Google Scholar
Ciriaco, F., Gambacorta, N., Trisciuzzi, D. & Nicolotti, O. PLATO: A predictive drug discovery web platform for efficient target fishing and bioactivity profiling of small molecules. Int. J. Mol. Sci. 23, 5245 (2022).

Article
PubMed
PubMed Central

Google Scholar
The Chemical Space Project | Accounts of Chemical Research. https://doi.org/10.1021/ar500432k.
Reymond, J.-L., van Deursen, R., Blum, L. C. & Ruddigkeit, L. Chemical space as a source for new drugs. Med. Chem. Commun. 1, 30–38 (2010).

Article

Google Scholar
Kirkpatrick, P. & Ellis, C. Chemical space. Nature 432, 823–823 (2004).

Article
ADS

Google Scholar
Lipinski, C. & Hopkins, A. Navigating chemical space for biology and medicine. Nature 432, 855–861 (2004).

Article
ADS
PubMed

Google Scholar
Cruz-Monteagudo, M. et al. Activity cliffs in drug discovery: Dr Jekyll or Mr Hyde?. Drug Discov. Today 19, 1069–1080 (2014).

Article
PubMed

Google Scholar
Gromski, P. S., Henson, A. B., Granda, J. M. & Cronin, L. How to explore chemical space using algorithms and automation. Nat. Rev. Chem. 3, 119–128 (2019).

Article

Google Scholar
Coley, C. W. Defining and exploring chemical spaces. TRECHEM 3, 133–145 (2021).

Google Scholar
Eckert, H. & Bajorath, J. Molecular similarity analysis in virtual screening: Foundations, limitations and novel approaches. Drug Discov. Today 12, 225–233 (2007).

Article
PubMed

Google Scholar
Scior, T. et al. How to recognize and workaround pitfalls in QSAR studies: A critical review. Curr. Med. Chem. 16, 4297–4313 (2009).

Article
PubMed

Google Scholar
Sahigara, F. et al. Comparison of different approaches to define the applicability domain of QSAR models. Molecules 17, 4791–4810 (2012).

Article
PubMed
PubMed Central

Google Scholar
Hartenfeller, M. & Schneider, G. De novo drug design. In Chemoinformatics and Computational Chemical Biology (ed. Bajorath, J.) 299–323 (Humana Press, 2011). https://doi.org/10.1007/978-1-60761-839-3_12.

Chapter

Google Scholar
Öztürk, H., Özgür, A., Schwaller, P., Laino, T. & Ozkirimli, E. Exploring chemical space using natural language processing methodologies for drug discovery. Drug Discov. Today 25, 689–705 (2020).

Article
PubMed

Google Scholar
Medina-Franco, J. L., Martinez-Mayorga, K., Giulianotti, M. A., Houghten, R. A. & Pinilla, C. Visualization of the chemical space in drug discovery. Curr. Comput. Aided Drug Des. 4, 322–333 (2008).

Article

Google Scholar
Dobson, C. M. Chemical space and biology. Nature 432, 824–828 (2004).

Article
ADS
PubMed

Google Scholar
Askr, H. et al. Deep learning in drug discovery: An integrative review and future challenges. Artif. Intell. Rev. 56, 5975–6037 (2023).

Article
PubMed

Google Scholar
Alberga, D. et al. De Novo drug design of targeted chemical libraries based on artificial intelligence and pair-based multiobjective optimization. J. Chem. Inf. Model. 60, 4582–4593 (2020).

Article

Google Scholar
Choudhury, C., Arul Murugan, N. & Priyakumar, U. D. Structure-based drug repurposing: Traditional and advanced AI/ML-aided methods. Drug Discov. Today 27, 1847–1861 (2022).

Article
PubMed
PubMed Central

Google Scholar
Medina-Franco, J. L., Martinez-Mayorga, K. & Meurice, N. Balancing novelty with confined chemical space in modern drug discovery. Expert Opin. Drug Discov. 9, 151–165 (2014).

Article
PubMed

Google Scholar
Strittmatter, S. M. Overcoming drug development bottlenecks with repurposing: Old drugs learn new tricks. Nat. Med. 20, 590–591 (2014).

Article
PubMed
PubMed Central

Google Scholar
Marzo, M. et al. Integrating in silico models to enhance predictivity for developmental toxicity. Toxicology 370, 127–137 (2016).

Article
PubMed

Google Scholar
Marzo, M., Roncaglioni, A., Kulkarni, S., Barton-Maclaren, T. S. & Benfenati, E. In silico models for developmental toxicity. In In Silico Methods for Predicting Drug Toxicity (ed. Benfenati, E.) 217–240 (Springer, 2022). https://doi.org/10.1007/978-1-0716-1960-5_10.

Chapter

Google Scholar
Zhang, H. et al. Development of novel in silico model for developmental toxicity assessment by using naïve Bayes classifier method. Reprod. Toxicol. 71, 8–15 (2017).

Article
PubMed

Google Scholar
Blondel, V. D., Guillaume, J.-L., Lambiotte, R. & Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. 2008, P10008 (2008).

Article
MATH

Google Scholar

Download references

Acknowledgements

The authors are thankful to the following: Programma Operativo Nazionale Ricerca e Innovazione 2014-2020 (CCI 2014IT16M2OP005), Fondo Sociale Europeo, Azione I.1 “Dottorati Innovativi con caratterizzazione Industriale”, the Comune di Bovino and PIANO STRALCIO <> 2015-2017, Comune di Bovino (Foggia, Italy) and Dott. Renato Lombardi, Director of “Struttura Complessa di Farmacia, IRCCS, Casa Sollievo della Sofferenza” (Foggia, Italy) (Code: DOT19C9KX4); Horizon Europe Seeds “L’intelligenza artificiale a tutela della salute in età pediatrica. Implementazione di una piattaforma digitale per il design di farmaci pediatrici sicuri”, Università degli Studi di Bari (Bari, Italy) (CUP: H99J21017390006).

Author information

Authors and Affiliations

Dipartimento di Farmacia – Scienze del Farmaco, Università degli studi di Bari Aldo Moro, via E. Orabona, 4, 70125, Bari, Italy

Nicola Amoroso, Nicola Gambacorta, Fabrizio Mastrolorito, Maria Vittoria Togo, Daniela Trisciuzzi, Cosimo Damiano Altomare & Orazio Nicolotti

Istituto Nazionale di Fisica Nucleare, Sezione di Bari, via E. Orabona, 4, 70125, Bari, Italy

Nicola Amoroso, Alfonso Monaco & Ester Pantaleo

Division of Medical Genetics, Fondazione IRCCS-Casa Sollievo della Sofferenza, San Giovanni Rotondo (Foggia), Italy

Nicola Gambacorta

Dipartimento Interateneo di Fisica “M. Merlin”, Università degli studi di Bari Aldo Moro, Via Giovanni Amendola, 173, 70125, Bari, Italy

Alfonso Monaco & Ester Pantaleo

Dipartimento di Chimica, Università degli studi di Bari Aldo Moro, via E. Orabona, 4, 70125, Bari, Italy

Fulvio Ciriaco

Contributions

N.A. conceived the study, carried out the analyses and wrote the main manuscript. F.M. and M.V.T. carried out the analyses and wrote the main manuscript. O.N. supervised the analysis and wrote the main manuscript. F.C. carried out the analyses. All the authors reviewed and approved the manuscript.

Corresponding authors

Correspondence to Nicola Amoroso or Fulvio Ciriaco.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Figure S1.

Supplementary Table S1.

Supplementary Table S2.

Supplementary Table S3.

Supplementary Legends.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Amoroso, N., Gambacorta, N., Mastrolorito, F. et al. Making sense of chemical space network shows signs of criticality. Sci Rep 13, 21335 (2023). https://doi.org/10.1038/s41598-023-48107-3

Received: 13 September 2023
Accepted: 22 November 2023
Published: 04 December 2023
DOI: https://doi.org/10.1038/s41598-023-48107-3
