Introduction

The canonical human proteome represents a universally recognized and comprehensive set of proteins encoded by the human genome, and its establishment has been critical to understanding fundamental cellular processes1. However, traditional annotations of this canonical proteome often overlook the polycistronic nature of genes and the factors encoded by non-canonical open reading frames (ORFs) that exist within and outside of the primarily recognized protein-coding genome2,3. Indeed, many non-ORF genomic features, including microproteins encoded within and adjacent to larger canonical protein sequences, non-coding RNA and DNA, and products of alternative splicing, all contribute to the emerging concept of the ‘hidden genome’ (Fig. 1). For example, in human cell lines, many unique proteins with differing functions can be encoded within a single gene sequence2,4. This growing appreciation of the nested and entangled nature of ORFs in the translatome has demonstrated that the functional potential within cells has historically been underestimated5. Our understanding of cellular function is further complicated by the classification of features such as long non-coding RNAs, which can function as RNAs but in other cases do encode proteins critical to cell function6,7. Non-coding DNA has become of particular interest since genome-wide association studies suggest that over 90% of human disease-associated DNA variants are found in the non-coding genome8. Certainly, defining the canonical human proteome appears to have been just the beginning of characterizing the many intricate genetic products that contribute to cellular fitness in the human cell.

Fig. 1: Functional products of DNA expression, including components of the hidden genome.

Sources of function from DNA include: production of mRNA and the resulting canonical proteins that are the focus of most genomic research; alternatively spliced RNA and protein products, including introns that are capable of becoming fixed in a cell and regulating gene expression; ubiquitous transcription of diverse non-coding RNAs that play important roles via their interaction with DNA, RNA, and proteins; the expression of pseudogenized DNA and dubious ORFs into functional proteins; the existence of non-coding regulatory elements (NCREs) that regulate gene expression; transposable elements that may move within the genome or between cells within or outside of the originating species, along with other repetitive DNA that may facilitate structural rearrangements in the genome. Figure 1, created with BioRender.com, released under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International license.

This phenomenon of the hidden genome extends across the tree of life and includes fungal species, where non-ORF genes that contribute to fitness have traditionally been neglected and are only recently beginning to be identified and characterized9,10. Indeed, the genomes of many fungal species important to human health and disease, agriculture, and biotechnology, remain incompletely characterized11. There is substantial emerging evidence for a critical role of the hidden genome, including microproteins and alternatively spliced proteins, microRNAs (miRNAs), long non-coding RNAs (lncRNAs), and circular RNAs (circRNAs), as well as other regions of non-coding and repetitive DNA, in numerous key facets of fungal biology (Table 1). Therefore, the ability to fully characterize these cryptic components of the fungal genome will be a critical step towards a comprehensive understanding of the genomics and biology of the fungal kingdom. The fact that genomic tools generally take longer to adapt and utilize in fungi has meant that many components of the hidden genome have gone relatively underexplored compared to other model species, though advances in other model organisms do offer a glimpse into how new tools may be employed in fungi. As the vast diversity of the fungal kingdom plays significant roles in all aspects of human life, it is imperative to fully dissect the functional potential of these fungal genomes.

Table 1 Selected ‘hidden genome’ elements and their corresponding studies that are referenced by this Review, by species

Here, we describe emerging findings on the hidden genome of diverse fungal species. We outline key components of the genome outside the canonical proteome, using both fungal and other species as relevant context, describe the role of these factors in mediating important facets of fungal biology, and discuss new technologies that may be adapted to study the hidden genome in fungal species.

Non-canonical ORFs

Microproteins

Microproteins are proteins made up of a polypeptide chain shorter than 100 amino acids in length, encoded by short open reading frames (sORFs)12. These proteins were originally omitted from most functional analysis research due to the expectation that they would only rarely impact fitness, and for practical reasons due to their massive abundance in the genome12. However, the use of newer, more sensitive technologies has allowed for the confident detection of thousands of microproteins translated from sORFs in human cells, many of which have been characterized as playing important roles, including their involvement in stress response pathways13. Foundational studies in characterizing microprotein function in human cell lines have revealed the potentially profound impact of the microproteome on the development of disease14. In bacteria, many microproteins have been implicated in drug resistance, as well as in toxin-antitoxin systems and oxidative stress15.

Microproteins encoded by sORFs have also been recognized as having putative regulatory functions in fungi for decades, though, by their nature, it is difficult to distinguish between sORFs that are transcribed and translated and those randomly occurring incidental small ORFs that are not16. Regardless, bioinformatic approaches for annotating genomic sORFs have improved substantially over the past few years, and have been used to predict thousands of sORFs in 31 different fungal genomes17. One study in Saccharomyces cerevisiae leveraged ribosome profiling datasets to determine that translation of non-canonical ORFs may occur within the DNA sequences of at least 15% of canonical ORF-encoding genes18. Hypothetical annotations such as these can serve as the foundation for experimental approaches that attempt to identify microproteins. One combinatorial method using ribosome profiling and proteomics, for example, was used in the fission yeast Schizosaccharomyces pombe to verify the existence of peptides corresponding to hypothesized sORFs19. However, in this case, only 9 of the 373 presumed sORFs (about 2.4%) had a detectable peptide, implying that new technologies will be required for the reliable detection of microproteins in fungi19. Despite the technological limitations faced by fungal researchers, several studies have showcased the function of microproteins across different growth conditions. In one investigation, researchers performing a genome-wide overexpression screen in S. cerevisiae identified the microprotein Nrs1, whose upregulation rescued an otherwise inviable double gene-deletion mutant20. Nrs1 itself allows cells to overcome nitrogen-starved conditions and plays an important role in the regulatory circuitry involved in yeast budding20. Several other singular instances of microproteins serving as key players in regulatory pathways have been described, mostly from research on S. cerevisiae16. In another study in S. cerevisiae, researchers were able to identify 225 microproteins and plot how they are differentially, and sometimes exclusively, expressed in response to UV stress, heat shock, and nutrient limitations, suggesting critical functions in different cellular adaptation contexts21.

There is therefore a strong implication that improved tools to investigate the fungal microproteome would result in numerous insights into stress response pathways and diverse aspects of fungal biology. While ribosome profiling and mass spectrometry-based proteomics may be adapted in other fungal species, these strategies could also be augmented by combining CRISPR screening and single-cell RNA sequencing to allow for the function of microproteins in fungi to be determined at scale, including the genome-wide effects of their perturbance on gene expression22. Similarly, improved models for the in silico prediction of microproteins that are not limited to one coding sequence per transcript, and that can predict sORFs and ORFs translated from non-AUG start codons, could now be applied to fungal species23. In addition, the optimization of new RNA-targeting CRISPR-based tools for fungi may allow for the discriminate targeting of sORF mRNA and characterization of microprotein function24.
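
To make the idea of in silico sORF prediction concrete, the following is a minimal sketch of a naive sORF scan, not a reconstruction of any published predictor. It assumes the standard stop codons, scans the forward strand only, and takes the first start codon in each frame; the function name `find_sorfs`, the `min_aa` lower bound, and the optional use of CTG as a non-AUG start are illustrative assumptions rather than details taken from the studies cited above.

```python
from typing import List, Sequence, Tuple

# Standard-code stop codons (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(
    seq: str,
    start_codons: Sequence[str] = ("ATG",),
    max_aa: int = 100,   # microproteins are shorter than 100 amino acids
    min_aa: int = 10,    # hypothetical lower bound to suppress trivial hits
) -> List[Tuple[int, int, int]]:
    """Return (start, end, length_aa) for forward-strand ORFs with
    min_aa <= length_aa < max_aa. Coordinates are 0-based, end-exclusive,
    and the end includes the stop codon."""
    seq = seq.upper()
    hits = []
    for frame in range(3):
        open_start = None  # position of the first start codon in the open ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if open_start is None:
                if codon in start_codons:
                    open_start = i
            elif codon in STOP_CODONS:
                length_aa = (i - open_start) // 3  # codons before the stop
                if min_aa <= length_aa < max_aa:
                    hits.append((open_start, i + 3, length_aa))
                open_start = None
    return sorted(hits)
```

Passing `start_codons=("ATG", "CTG")` lets the same scan consider one non-AUG initiation codon. A naive scan of this kind reports every incidental small ORF, which is exactly why translation evidence such as ribosome profiling or proteomics is needed to separate real microproteins from chance ORFs.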

Alternatively spliced proteins

Similar to microprotein formation, the capacity to include and exclude different sets of exons from a single gene through alternative splicing is also known to massively increase mRNA and protein diversity in eukaryotes25. Alternative splicing events have been demonstrated to play an important role in many facets of cell biology, including the establishment of drug resistance in human cancer cells and the pathogenesis of microbial parasites26,27. While alternative splicing is generally regarded as being less relevant in bacteria due to introns being either absent or very rare in prokaryotic species, splicing events are distinct in eukaryotes, where the number of genes that contain introns in fungi, for example, ranges massively from 4% to 99% across different species28. Despite this, alternative splicing was long seen as inconsequential to the cell in fungi, perceived as resulting in transcripts with either redundant or inoperative functionality29.

Recently, however, numerous mechanistic effects of alternative splicing have been established in fungi with impacts on growth, stress adaptation, infection, and immune recognition30. In the plant-parasitic fungus Shiraia bambusicola and the filamentous fungi Neurospora crassa and Aspergillus nidulans, alternative splicing has been observed to increase proteomic complexity and downstream functionality, particularly in response to different environmental stressors31,32,33. In the mushroom-forming Schizophyllum commune, alternative splicing was found to increase the total number of transcripts in the cell by 20%, and the majority of spliced transcripts were predicted to have alternative functions based on the fact that 70% of them had either lost or gained a functional domain compared to the non-spliced isoform29. In many yeast species, the rate of alternative splicing can be altered in response to stress, and further research into the molecular basis for the augmentation of genomic complexity from alternative splicing has demonstrated the capacity for dual localization of proteins encoded by the same gene to serve distinct functions in metabolism9,34,35. One study created a deletion set of all known introns in S. cerevisiae and identified the important roles introns have during competition and nutrient starvation36. Interestingly, in S. cerevisiae, once spliced out, introns themselves can become stably fixed in a cell and can act on different metabolic pathways, thus acting as regulatory ncRNAs37. In addition, expression via alternative transcriptional start sites and antisense transcription can be triggered in response to environmental cues in many fungi, including Metarhizium robertsii, A. nidulans, and Cryptococcus species, often resulting in changes in protein localization and downstream regulatory effects on gene expression38,39,40.

Alternative splicing may also be broadly associated with fungal pathogenicity, as it appears to be more prevalent in fungal pathogens, especially human fungal pathogens, than non-pathogenic taxa41. In the rice blast pathogen Magnaporthe oryzae, for example, deletion of the MoGrp1 protein involved in different splicing processes leads to a dramatic decrease in the virulence of the pathogen42. It has further been suggested that alternative splicing is potentially linked to drug susceptibility phenotypes in pathogenic fungi, and could even act as a target for the development of new therapeutics43. Indeed, alternative splicing seems to be differentially regulated specifically in response to certain antifungal drugs in N. crassa44. Another example involves the human pathogen Candida albicans with the oxidative stress-generating drug menadione45. In this study, researchers identified a case of differential drug resistance following deletion of the superoxide dismutase gene SOD3, where only overexpression of the spliced isoform of SOD3 rescued the mutant’s increased susceptibility to menadione45. As expected, menadione’s antifungal effect is partly based on its inhibition of the cell’s ability to perform alternative splicing45. While alternative splicing of introns clearly has underexplored and important roles in metabolism, antifungal drug resistance, and pathogenicity, the splicing of inteins — intervening sequences that are spliced out at the protein level — is also emerging as a process important to fungal biology and antifungal drug susceptibility. In the human pathogenic yeast Cryptococcus neoformans, different antifungal agents have been found to prevent the Prp8 intein from performing its essential role in cell viability and virulence via intein splicing, which has opened up new avenues for potential therapeutics against pathogenic fungi46,47.

Coinciding with the advent of long-read RNA sequencing technologies, new computational tools that allow for the sensitive detection of alternatively spliced transcripts have emerged in the past few years48. As these platforms continue to be improved upon, they may be adapted for the detection of mRNA isoforms in fungal species at a genome-wide level48. Further, adapting CRISPR-based platforms that allow for targeted deletion of single exons49, as well as multiplexed repression and activation of exons50, may allow fungal researchers to investigate the functional differences of spliced mRNA isoforms.

Non-coding RNAs

Non-coding RNAs (ncRNAs) include any RNA molecule in the cell that is generally not translated into a protein. While some classes of ncRNAs have long been acknowledged for playing crucial roles in the cell, including ribosomal RNA (rRNA) and transfer RNA (tRNA), the majority of ncRNA molecules were historically overlooked as being inert by-products of transcription51. However, more recent large-scale sequencing efforts have found that the majority of the genome can be transcribed, mostly into ncRNAs, and many divergent families of ncRNAs have been recognized for their unique and critical functions51. Among the many different classes of ncRNAs, regulatory RNAs include lncRNAs, miRNAs, and circRNAs. LncRNAs are ncRNAs longer than 200 nucleotides (nt), miRNAs have a length of around 19–25 nt, and circRNAs differ in being non-linear and having a closed-loop structure. Each of these three types has a different mechanism of action, though in many cases they exert epigenetic, RNA processing, and translational regulatory actions in the cell52,53,54. In human cells, all three of these regulatory ncRNA classes have been well studied for their contributions to disease55,56. Despite their relatively simpler genomes, bacterial pathogens also utilize regulatory ncRNAs in diverse ways, some of which have been proposed as potential drug targets57. Regulatory ncRNAs, primarily lncRNAs, miRNAs, and circRNAs, have all been identified in a myriad of disparate fungal taxa, though the extent to which these ncRNAs have been functionally characterized differs.
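
The size and topology criteria given above can be written down as a small rule-based sketch. It encodes only the thresholds stated in this section (longer than 200 nt for lncRNAs, around 19–25 nt for miRNAs, closed-loop structure for circRNAs); the function name and the "miRNA-sized" and "unclassified" labels are illustrative assumptions, and real annotation would also require evidence of biogenesis and lack of coding potential.

```python
def classify_regulatory_ncrna(length_nt: int, is_circular: bool = False) -> str:
    """Assign a coarse regulatory ncRNA class from length and topology alone."""
    if length_nt <= 0:
        raise ValueError("length_nt must be positive")
    if is_circular:
        # circRNAs are defined by their closed-loop structure, not by length
        return "circRNA"
    if length_nt > 200:
        return "lncRNA"
    if 19 <= length_nt <= 25:
        # size-compatible with a miRNA; not proof of miRNA biogenesis
        return "miRNA-sized"
    return "unclassified"
```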

Long non-coding RNAs

Perhaps at the forefront of these inquiries are lncRNAs. In many fungi, lncRNA-encoding DNA can exist within and be transcribed from intergenic, intronic, sense, and antisense regions (Fig. 2)58. lncRNAs can be especially difficult to identify and characterize, as, unlike protein-coding genes, they seem to lack sequence conservation in mammalian cell lines, for example59. Despite this, lncRNAs have been described in many different fungal cellular processes including gene silencing and regulation, nutrient metabolism, histone modification, drug resistance, and virulence58,60. In fungi, lncRNAs have been most comprehensively studied in model yeasts. In S. cerevisiae, differential lncRNA abundance in distinct cell subpopulations has been suggested to play a regulatory role in cell and colony development61. Efforts have been made to characterize ncRNAs on a large scale in S. cerevisiae, where barcoded ncRNA deletion libraries have been screened to identify several intergenic ncRNAs essential to cell survival62. Another relatively large-scale approach taken in S. cerevisiae to explore lncRNA biology involved knocking out non-essential ORFs in combination with lncRNAs to generate double deletion mutant cells, allowing the characterization of lncRNAs via genetic interaction analysis63. This resulted in the identification of one lncRNA that acts in trans to regulate levels of distant telomeric single-stranded DNA, which is a necessary component of telomeric replication63. The researchers were, however, also able to implicate many other lncRNAs operating in a wide range of biological processes in the cell63. Indeed, there are several other well-defined cases of lncRNAs influencing the cell in yeast, including the specific regulatory roles that lncRNAs have on cell wall-related gene expression64. In the model fission yeast S. pombe, thousands of lncRNAs have been detected and have had their expressions monitored in response to different perturbations65. 
Amongst the lncRNAs in S. pombe, transcription of the well-characterized nc-tgp1 was found to play an important role in sensitivity to the drug thiabendazole as well as to hydroxyurea and caffeine by simply increasing nucleosome density, thereby preventing transcription of the neighboring tgp1 ORF (Fig. 2)66. lncRNA biology is also quickly expanding into other fungal taxa, including the yeast Pichia pastoris and the filamentous fungus N. crassa, where a vast number of lncRNAs have been identified and assigned putative functions67,68.

Fig. 2: The nature of lncRNA transcription and examples of intergenic lncRNAs in fungi.

LncRNAs exist in several different orientations relative to canonical ORFs within fungal genomes, including in between exons (intronic), in between genes (intergenic), as well as within genes in either the same (sense) or opposite (antisense) direction as the larger ORF. Many lncRNAs have been characterized in diverse fungal species, including: the lncRNA DINOR in Candida auris that affects fungal morphology and drug resistance; the putatively cis-acting lncRNA sequence RZE1 in Cryptococcus neoformans that recapitulates virulence phenotypes of the neighboring gene ZNF2 when deleted and appears to affect the proportion of mRNA of ZNF2 that is localized to the nucleus; and the lncRNA nc-tgp1 that alters sensitivity to environmental stressors in Schizosaccharomyces pombe by increasing nucleosome occupancy when actively transcribed, excluding the transcription factor Pho7 from binding and activating expression of the downstream tgp1 gene. Figure 2, created with BioRender.com, released under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International license.

The field of fungal lncRNAs has also had a recent surge of interest in the context of pathogenic fungi. In many pathogenic Candida species, new bioinformatic and transcriptomic approaches continue to unveil more and more lncRNA transcripts in the cell69,70,71,72. Further, certain lncRNAs have been validated to exhibit significance in biological processes. One of the most striking examples involved a showcasing of the lncRNA named DINOR in the rapidly emerging multidrug-resistant human pathogen Candida auris, and its critical role in governing fungal pathogenicity, filamentous growth, and antifungal drug resistance (Fig. 2)73. DINOR was discovered via screening a genome-wide transposon mutagenesis library, therefore also inadvertently presenting a strong argument for constructing and screening more comprehensive mutant libraries that are not limited to canonical proteins73. On a larger scale, the differential expression of hundreds of lncRNAs cataloged across several important pathogenic Candida spp. during infection has also been characterized74.

The relevance of lncRNAs to virulence and metabolism is becoming clear in other crucial human pathogens as well, as a random insertional mutagenesis screen in C. neoformans uncovered RZE1, an lncRNA essential to pathogenicity via its predominantly cis-acting regulation of the yeast-to-hypha transition (Fig. 2)75. In the pathogen Aspergillus sydowii, lncRNAs were found to play a role in tolerance to high NaCl stress, and in the related pathogen Aspergillus flavus, expression profiling of hundreds of lncRNAs suggested they are expressed and differentially localized in response to important environmental triggers, such as changes in temperature, osmotic stress, and CO2 (refs. 76,77). In other fungal pathogens, including those that primarily infect and parasitize plants and insects, there has also been a rapid increase in interest in lncRNA function, where reports on the lncRNA landscape in Fusarium graminearum, Nosema ceranae, M. robertsii, and Cordyceps militaris have all produced valuable preliminary insight into their functions in the cell78,79,80,81. Our understanding of lncRNAs in fungi and the diverse aspects of fungal biology they impact is continuing to expand.

MicroRNAs and micro-like RNAs

Research into other types of regulatory ncRNAs, namely miRNAs and circRNAs, has also shown promise for our improved understanding of fungal biology. RNA interference (RNAi) is a conserved mechanism in eukaryotes that typically involves small non-coding RNAs (sRNAs) ~19–24 nucleotides in length82. sRNAs work in concert with effector proteins to form an RNA-induced silencing complex (RISC), which silences mRNA translation through complementary binding of the sRNA to its target RNA82. One category of sRNA in these systems is miRNA, which differs from other types in its relatively indiscriminate binding patterns82. miRNAs also have many regulatory roles in the cell, rather than being simply a genome defense mechanism against invading viruses and transposons, as is the case for other sRNAs82. miRNAs were first discovered in Caenorhabditis elegans in 1993 and were initially considered to be absent in fungi until they were discovered in N. crassa in 2010 (refs. 83,84). Since then, miRNAs and miRNA-like RNAs (milRNAs), which do not meet certain criteria established in other eukaryotic taxa to be included as miRNAs, have been identified in a considerable number of fungi, including Penicillium marneffei, Aspergillus fumigatus, Trichoderma reesei, Metarhizium anisopliae, Trichophyton rubrum, Sclerotinia sclerotiorum, C. albicans, and more83,85,86,87,88,89,90.

Much of the research into miRNAs and milRNAs in fungal species has involved employing sequencing approaches to validate their presence, and then analyzing their differential expression patterns in response to different growth conditions. Strategies such as these have allowed researchers to putatively classify miRNA and milRNA activity in important fungal cellular processes including thermal dimorphism, defense against mycoviral infection, cellulase production, mycelial growth and conidiogenesis, and sclerotial development, and also to predict their target RNAs in some cases83,85,86,87,88,89. Research in C. neoformans identified and profiled miRNAs, and found that miRNA sequences align to genomically-encoded transposons and pseudogenes, suggesting a role of miRNAs in regulating transposable element activity and the cryptic expression of pseudogenes91. Another phenomenon involves the spontaneous acquisition of drug resistance in the human fungal pathogen Mucor circinelloides via its RNAi-mediated gene silencing during growth in the presence of the antifungal drug tacrolimus92. Other studies have adopted a wider approach, such as in one case which involved employing computational tools to predict milRNAs and their RNA targets across 13 different plant fungal pathogen species93. This work identified several milRNA targets within the genome of their respective host plants, confirming a role for fungal milRNAs in suppressing plant host defense genes during infection93. Other examples of sRNA/milRNA-mediated silencing of host immunity by the plant pathogens Valsa mali and Botrytis cinerea have also been demonstrated and mechanistically characterized94,95. Despite these findings, it remains difficult to identify the activities of milRNAs in many cases, and it has been suggested that they may have other roles in genetic regulation besides mRNA cleaving, including translational repression or even DNA methylation96. 
Potential functions for sRNAs/miRNAs can also be overlooked, such as in one case where functional sRNA discovery in C. albicans was long impeded due to the widely used reference strain being uniquely deficient in a functional RNAi pathway, while the majority of other C. albicans isolates employ RNAi and sRNAs in the repression of telomere-associated genes90. As mi/milRNA annotations and their proposed functions in fungi continue to be assessed and improved upon97, the diverse roles of miRNAs and milRNAs across the fungal kingdom will become increasingly understood.

Circular RNAs

Circular RNAs (circRNAs) are the product of backsplicing, where the acceptor site of an upstream exon is towards its 5’ end, and the donor site of a downstream exon is towards its 3’ end, such that the exonic RNA folds in on itself and circularizes54. CircRNAs encode diverse cellular products, though they also seem to execute specific actions in the cell without being translated, including the inhibition of both miRNAs and the translation and activities of proteins54. It is also apparent that the ratio of circRNAs to linear RNA molecules is tightly regulated, with implications for disease and aging in humans54,98. CircRNAs of this kind were identified in fungi in 2014 in S. cerevisiae and S. pombe, but further research on them has been extremely limited98. In the past few years, sequencing efforts have led to hundreds or thousands of circRNAs being annotated in the genomes of the fungal pathogens M. oryzae, Ascosphaera apis, N. ceranae, and Ganoderma lucidum99,100,101,102. The ability of these circRNAs to act as “sponges” by competitively binding miRNAs was investigated and confirmed in all of these instances99,100,101,102. Interestingly, the expression patterns of different circRNAs often seemed to depend massively on the cell type or developmental stage of the fungus, though there has been a lack of characterization of individual circRNAs in specific cellular processes99,102,103. While intergenic, exonic, and intronic circRNAs exist, the proportion of circRNAs in each of the three groups seems to differ between fungal species101,103. Parallel efforts were also made in the human pathogenic T. rubrum, where researchers highlighted that one of the 4254 circRNAs they identified, Tru_circ07138_001, seemed to be highly conserved in ten other dermatophytic species analyzed, as well as in the distantly related red junglefowl Gallus gallus and C. elegans, indicating a shared role of this circRNA across the tree of life103.
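
The geometry of backsplicing described above can be illustrated with a short sketch. In linear splicing, the donor (5’ splice site) lies upstream of the acceptor (3’ splice site) in the direction of transcription, whereas in a backsplice a downstream donor is joined to an upstream acceptor. The `Junction` class and `is_backsplice` function below are hypothetical names introduced for illustration, and the check assumes both sites lie on the same chromosome and strand; it is not the algorithm of any particular circRNA detection tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Junction:
    donor: int     # genomic coordinate of the 5' splice site (exon being left)
    acceptor: int  # genomic coordinate of the 3' splice site (exon being entered)
    strand: str    # "+" or "-"


def is_backsplice(junction: Junction) -> bool:
    """Return True when the donor lies downstream of the acceptor
    relative to the direction of transcription."""
    if junction.strand == "+":
        return junction.donor > junction.acceptor
    if junction.strand == "-":
        # on the minus strand, transcription runs towards lower coordinates
        return junction.donor < junction.acceptor
    raise ValueError("strand must be '+' or '-'")
```

Junctions flagged in this way are only candidates for circRNAs; in practice, apparent backsplices can also arise from sequencing or alignment artifacts and would need independent support.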

The growing body of research on fungal regulatory ncRNAs serves as a compelling revelation of the importance of the hidden genome in fungi. New computational platforms, some of which leverage machine learning, that can identify regulatory ncRNAs from RNA sequencing data may be applied in fungi104,105. In addition, a diverse set of CRISPR-based screening platforms have already been demonstrated in human cells to study lncRNAs, miRNAs, and circRNAs en masse106,107,108,109. Indeed, harnessing new technologies developed for use in other species, along with continually improving techniques for functional genomic analysis in fungal taxa110,111, may greatly expand our understanding of these ncRNAs and their role in important facets of fungal biology.

Regulatory, repetitive, and canonically non-functional DNA

While many categories of ncRNAs and non-canonical proteins are beginning to be more clearly defined, there still exist parts of the genome that are more cryptic and underexplored. This involves the capacity of DNA to regulate gene expression independently of any transcribed or translated product, namely via non-coding regulatory elements (NCREs) such as promoters, silencers and enhancers, and insulators, that all contribute to cis-regulation of gene expression112. However, it also includes regions of DNA that have historically been assumed to be non-functional, including repetitive regions and transposons, as well as pseudogenes and dubious ORFs, much of which has often been referred to as ‘junk’ DNA. ‘Junk’ DNA has traditionally referred to any DNA sequence that does not play a known role in any cellular process, and has long been assumed to represent the vast majority of the DNA in the genome112,113. However, which non-coding DNA is actually non-functional remains a contested topic, especially when considering the C-value paradox, which involves the discrepancy between the expectation that more complex organisms should tend to have larger genomes, and the reality that they often do not114. While the Encyclopedia of DNA Elements (ENCODE) project proposed that, in fact, 80% of the human genome is linked to biochemically active processes, criticisms of this claim involve the fact that their definition of biochemical activity does not extend to what is actually functional in the human cell, and that much of the DNA in a given genome remains to be assigned a true function114,115. Regardless, there are still many clear and validated examples of regions of ‘junk’ DNA contributing to diverse cellular phenomena113. These include pseudogenes, highly repetitive DNA, and transposable elements113. These groups of DNA have already been shown to be potential regulators of protein-coding DNA in some eukaryotes, and have been demonstrated to be effective therapeutic targets in cancer cells, but they remain relatively poorly studied in fungi116,117.

Non-coding regulatory elements

The identification and precise mapping of NCREs varies widely between different fungal taxa and often focuses on trans-acting regulatory components like transcription factors118,119. However, there have been important discoveries made on the topic of NCREs in fungi, and many efforts have been made, particularly in recent years, to employ methods that would allow for the mapping and functional analysis of these cis-regulatory components in Saccharomyces species120,121,122. Related phenomena have been identified in S. cerevisiae, in particular, where the genomic regions that encode tRNA seem to act as chromatin insulators with strong implications in preventing gene repression and activation123,124. Systematically categorizing fungal genetic circuits and NCREs may help uncover novel regulatory mechanisms of fungal pathogenesis and allow researchers to better harness fungal metabolism for industrial applications and for the production of important metabolites. Attempting to characterize cis-regulatory DNA is difficult, in part due to the complex nature of the interactions between transcription factors and a dynamic 3D DNA landscape125. However, CRISPR screens hold promise for functionally characterizing NCREs at scale126. In addition, the combination of improved machine learning models with new technologies that can produce long synthetic DNA molecules may help us to investigate transcription factor binding in the context of a locus large enough to account for high-level chromatin architecture and DNA-DNA interactions125.

Transposable elements and repetitive DNA

Another underexplored component of the fungal genome involves transposable elements (TEs). TEs encompass an array of different DNA sequences, some of which possess genes that enable them to change position in the genome, causing disruptions in gene function or alterations of local gene expression, and in some cases to transport ‘cargo’ genes that are not required for propagation of the TE and may be of benefit to the host127,128. Additionally, the repetitive nature of non-mobile TE sequences scattered throughout the genome can facilitate structural rearrangements and shape genomic architecture127. The proportion of the genome that is made up of TEs varies from ~0% to upwards of 30% in fungi127. While the extent of their regulatory roles in shaping fungal phenotypes remains contested, TEs are linked to the expression and evolution of genes in fungi129. Indeed, in M. oryzae, the presence of TEs appears to increase genetic diversity in neighboring genes, which in turn drives host specialization of the pathogen130. Studies of the fungal plant pathogen Zymoseptoria tritici have revealed that TE insertions can directly regulate melanin biosynthesis, and therefore fungal virulence, as well as multi-drug resistance via upregulation of fungicide efflux131,132. TE insertions also drive adaptation during host infection and contribute to the acquisition of drug resistance in Cryptococcus (Fig. 3)133,134. Transposon mobility in these cases displays a temperature-dependent pattern, underscoring the influence of TEs on the cell’s ability to respond to environmental changes (Fig. 3)133,135. Bioinformatic findings in other fungal pathogens also suggest that genes under TE influence are often repressed and that there tends to be a correlation between TE prevalence in the genome and symbiotic tendencies127,136.
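As a rough illustration of how a genome-wide TE proportion of this kind might be computed, the following minimal Python sketch merges overlapping TE annotations before dividing by genome length, so that nested or overlapping elements are not counted twice. The function name, the half-open coordinate convention, and the toy annotations are assumptions made for illustration; they are not taken from any of the cited studies or from a specific annotation tool.

```python
def te_fraction(intervals, genome_length):
    """Return the fraction of a genome covered by TE annotations.

    intervals: iterable of (start, end) pairs in half-open coordinates;
    overlapping or nested annotations are allowed (hypothetical input format).
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Disjoint from the running interval: bank it and start a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent: extend the running interval.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / genome_length


# Hypothetical annotations on a 1,000 bp toy genome.
toy_annotations = [(100, 200), (150, 300), (600, 700)]
fraction = te_fraction(toy_annotations, 1000)
```

Merging first matters because TE annotations are frequently nested, and summing raw annotation lengths would overstate coverage.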

Fig. 3: The movement of transposable elements (TEs) within and between fungal genomes.

In the human fungal pathogen Cryptococcus deneoformans, the mobility of several different TEs is stress-responsive, including during infection and at the elevated (host-relevant) temperature of 37 °C. Insertion of mobilized TEs in this pathogen also appears to be biased depending on the type of TE, and has been associated with a decrease in susceptibility to antifungal drugs. The accumulation of the Cnl1 TE at subtelomeric regions, for example, can result in additional copies of Cnl1 being driven towards other sites in the genome, which may influence drug response and virulence phenotypes. The rate of other genomic changes, including single nucleotide polymorphisms (SNPs) and insertion/deletion (indel) mutations, remains unchanged at elevated temperatures, suggesting TE mobility is a primary driver of genomic change during heat stress. In the plant fungal pathogen Pyrenophora tritici-repentis, the ToxhAT TE contains the ToxA gene, which can induce cell death in susceptible wheat strains during infection. The ToxhAT TE has been observed to move between isolates of the same species, as well as between different species entirely. While ToxhAT has been observed to exist within larger Starship TEs, there is also evidence to suggest that it is able to mobilize from the genome independently. Figure 3, created with BioRender.com, released under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International license.

A role for fungal TEs in horizontal gene transfer has also recently emerged based on the identification of very large transposons named Starships, which can be hundreds of kilobases in size and carry genes important to fungal survival and adaptation137. Many different Starships have been identified with high sequence similarity in divergent fungal taxa, suggesting they are able to migrate between species independently of the fungal host’s cellular machinery138,139,140. For example, the Hephaestus Starship locus that confers heavy metal resistance to the emerging infectious mould Paecilomyces variotii appears to have been shared with the related species Paecilomyces lecythidis, as well as Penicillium fuscoglaucum138,141. The nested nature of these giant TEs also implies that they are highly diverse. The ToxA gene that confers virulence to fungal wheat pathogens has been transferred horizontally between several species, and appears to exist within a ~14 kb TE named ToxhAT (Fig. 3)128. However, the ToxhAT locus itself has been identified within two larger Starship elements named Horizon and Sanctuary (Fig. 3)142,143. Considering that TEs and larger Starships may be involved in the regulation and active exchange of genes involved in stress adaptation and pathogenicity within and between fungal taxa, further efforts to mine fungal genomes for their presence will be imperative for functionally characterizing these genetic factors across the fungal kingdom144.

Analyzing large repetitive regions of DNA as a whole, some of which include both non-coding DNA and ORFs, can also be a strategy for broader genome characterization. In an attempt to understand the plastic and rapidly adapting nature of the C. albicans genome, one study found that all segmental aneuploidy events, which are a critical part of C. albicans pathogenesis and drug resistance145, occurred at long repeat sequences (anywhere from 65 to 6,499 bp in length)146. Here, these long repeat regions of DNA seemed to be necessary for many of the adaptive traits that C. albicans can harness to survive in a diverse set of challenging environments146. This is not entirely surprising, as it has long been understood that large repeat regions of this kind seem to be relatively more amenable to rapid evolution in filamentous fungal plant pathogens, and tend to harbor virulence genes147. Indeed, new telomere-to-telomere sequencing platforms that can accurately annotate highly repetitive DNA, and that have already been applied in some fungi and oomycetes, are a promising avenue for accurately identifying transposons and functional repeat elements at a genome-wide level148,149.
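To make the notion of a long repeat sequence concrete, the minimal Python sketch below indexes every substring of a chosen length k and reports those that occur at more than one position. It is a simplified illustration only: it finds exact, same-strand repeats, ignores inverted or imperfect repeats, and is not the method used in the cited study. The function name and toy sequence are hypothetical, and a real analysis at the 65 bp lower bound mentioned above would typically rely on dedicated repeat-finding software.

```python
from collections import defaultdict


def repeated_kmers(sequence, k):
    """Map each k-mer occurring at two or more positions to its start positions."""
    if k <= 0:
        raise ValueError("k must be positive")
    positions = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        positions[sequence[i:i + k]].append(i)
    # Keep only k-mers seen at more than one location.
    return {kmer: starts for kmer, starts in positions.items() if len(starts) > 1}


# Toy sequence with one 8 bp exact repeat separated by a short spacer.
toy_sequence = "ACGTTGCA" + "GGG" + "ACGTTGCA" + "TT"
repeats = repeated_kmers(toy_sequence, 8)
```

In this toy case the only repeated 8-mer is ACGTTGCA, found at positions 0 and 11 (0-based).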

Pseudogenes and dubious ORFs

Another component of ‘junk’ DNA is pseudogenes. Pseudogenes are genes that share close sequence similarity with canonical ORFs but have a disruptive mutation that precludes their transcription or leads to products with either reduced or abolished function150. For over a decade, pseudogenes have been shown to encode important regulatory products and to have divergent impacts on cellular fitness, but this research has predominantly focused on human cells116. However, pseudogenes have been identified in many different fungal species, and have been a valuable means of understanding evolution and loss of pathogenicity150,151,152. A landmark study in S. cerevisiae from over two decades ago found active expression of some pseudogenes and posited that some may have impacts on the fungal stress response153. Despite the advancements being made in identifying functional pseudogenes, there remains much to be characterized about the impacts of pseudogenes in most fungal species153. However, comparative analysis of pseudogene profiles between closely related species may be used to help identify genes responsible for any divergent functional capacities between the species151. Further, methods for applying CRISPR-based tools to study pseudogenes have been outlined and could be applied in fungal species154.
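The kinds of disruptive mutation that define a pseudogene can be illustrated with a minimal Python sketch that flags three simple features in a candidate coding sequence: a missing start codon, a length that is not a multiple of three (a possible frameshift), and a premature in-frame stop codon. This is a deliberately simplified assumption-laden check, not a pseudogene annotation pipeline; the function name is hypothetical, the sequence is assumed to be supplied in the reading frame of its parent gene, and only the standard nuclear stop codons are considered.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def disabling_features(candidate_cds):
    """List simple disruptive features in a candidate coding sequence.

    Assumes the sequence is given 5'->3' in the reading frame of its parent
    gene and that the standard nuclear stop codons apply.
    """
    seq = candidate_cds.upper()
    features = []
    if not seq.startswith("ATG"):
        features.append("missing start codon")
    if len(seq) % 3 != 0:
        features.append("length not a multiple of three (possible frameshift)")
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    # Any stop codon before the final codon would truncate the product.
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            features.append(f"premature stop codon at codon {index + 1}")
            break
    return features


intact = disabling_features("ATGAAATAA")
truncated = disabling_features("ATGTAGAAATAA")
```

An empty list does not show that a sequence is functional; mutations affecting promoters or splice sites, for example, would not be detected by a check of this kind.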

While not ‘junk’ DNA per se, dubious ORFs also represent an interesting area of functional genomic research. Dubious ORFs are ORFs that were originally suspected not to encode a functional product, often based on the criteria that they are not conserved in any related species from the same taxon and that there is no experimental evidence of a resulting gene product155. ORFs labeled as dubious also tend to overlap with microproteins and sORFs, since both have traditionally been considered not to generate anything useful to the cell, even if some are known to still be translated155,156. Despite this categorization, an evaluative study on the accuracy of dubious ORF classifications in S. cerevisiae showed that many of these ORFs did, in fact, produce detectable transcripts and/or products of translation157. The implications of this study have become clear with the emergence of a few notable examples in S. cerevisiae where the specific deletion of dubious ORFs resulted in prominent phenotypes associated with protein burden regulation and mitochondrial DNA maintenance158,159. This notion may also extend to other fungal taxa, as it was recently shown in the human pathogen C. albicans that many ORFs previously labeled as dubious are actively transcribed and translated, and that the rates of these processes seem to be differentially regulated during C. albicans’ morphological transitions156.

It has also been demonstrated that non-coding DNA is positively selected for in certain fungi, and does not always represent genetic relics with simple coincidental influence on gene expression. One study of the fungal plant pathogen genus Colletotrichum identified many instances of positive selection for non-coding DNA during infection, which the authors suggest implies a regulatory role for this non-coding DNA160. It has elsewhere been proposed that the persistence of non-coding DNA in fungi is due to its function in harboring regulatory ncRNAs, or its role in intragenic DNA methylation, as in other eukaryotes161,162.

Conclusions and future perspectives

Many facets of fungal biology remain underexplored. However, the recent discovery of novel components of the fungal hidden genome may help us develop a more holistic understanding of fungal genetics and genomics. New technologies, particularly CRISPR-based techniques, have proven to be powerful tools to characterize these hidden components of the genome in non-fungal cell systems8. Harnessing new platforms developed for functional genomics in fungi may similarly help characterize important processes regulated by the non-coding genome and products of non-canonical expression. Numerous iterations of CRISPR technologies with disparate mechanisms of action have been demonstrated in diverse fungal taxa, including those that may be amenable to specific interrogation of hidden genome components that are nested within larger canonical sequences via differential expression163,164,165, those with a sensitive resolution for editing very small sequences of DNA166, and those that can discriminately target RNA molecules167. Thus, as more of these technologies are applied in fungi, a more robust characterization of non-canonical fungal genomes will emerge.