Background & Summary

Salmonella is a significant pathogen threatening global public health1,2,3. Many Salmonella serovars can infect various hosts. Salmonella enterica serovar Gallinarum (S. Gallinarum), an avian-restricted pathogen, causes pullorum disease (PD) and fowl typhoid (FT), resulting in high mortality rates and disproportionately affecting the poultry industry in low- and middle-income countries4. S. Gallinarum comprises three common biovars: S. Gallinarum biovar Gallinarum (bvSG), S. Gallinarum biovar Pullorum (bvSP), and S. Gallinarum biovar Duisburg (bvSD)5. Specifically, bvSP causes PD through both horizontal and vertical transmission and is prevalent in many countries, affecting chickens and turkeys6,7. Persistent bvSP infection within the spleen macrophages of young birds complicates decontamination efforts. In contrast, bvSG can infect birds of any age, leading to fatal FT disease8. While S. Gallinarum was globally prevalent in the 20th century, eradication programs in most developed countries have significantly reduced its presence9. However, S. Gallinarum remains prevalent in developing countries such as China and Brazil due to the diverse poultry industry and breeding patterns, severely constraining the healthy development of the breeding chicken industry10.

Antimicrobial therapy remains the primary treatment for S. Gallinarum infections11. However, the misuse or overuse of antimicrobials has made antimicrobial resistance (AMR) an increasingly frequent problem12,13,14. Previous studies have shown that S. Gallinarum has developed significant resistance to fluoroquinolones, a class of first-line drugs used for treatment, such as enrofloxacin and ofloxacin15,16. An increasing number of strains exhibit multi-drug resistance (MDR), which jeopardizes the efficacy of single antibiotic treatments17,18. Mobile genomic elements (MGEs) are key drivers in the acquisition of antimicrobial resistance genes (ARGs), which contribute to antimicrobial resistance in S. Gallinarum19. For example, class 1 integrons have been demonstrated to facilitate AMR acquisition in S. Gallinarum in both in vivo and in vitro experiments20. Plasmids such as IncN, IncX1, and IncQ1 have also been observed among antimicrobial-resistant S. Gallinarum21. However, the role of other MGEs, such as prophages and transposons, in acquiring resistance remains uncertain. Additionally, it is unclear whether other, as yet unverified, genes contribute to the elevated levels of resistance observed.

With the advancement of whole-genome sequencing (WGS) technology, researchers can now monitor and trace phylogenetic relationships among S. Gallinarum populations at the single-base level, further elucidating the genetic mechanisms underlying epidemiologically important phenotypes like antimicrobial resistance22,23,24. In this study, we assembled the most comprehensive and well-annotated genomic dataset of 574 S. Gallinarum isolates, covering the period from 1920 to 2023. We found that S. Gallinarum isolates with distinct properties differ in the types of ARGs, MGEs, and potential functional genes they carry. We believe the comprehensive S. Gallinarum genome resources provided in this study will significantly aid in the surveillance and prevention of pullorum disease.

Methods

Ethical considerations

The protocols used in this study were approved by the Committee of the Laboratory Animal Center of Zhejiang University (ZJU20190094; ZJU20220295).

Sample and metadata collection

The S. Gallinarum genomes in this study were obtained from publicly accessible datasets and laboratory sources. First, a PubMed (“https://pubmed.ncbi.nlm.nih.gov”) search using the terms “Salmonella Gallinarum” and “Genome” yielded 42 publications on the genomic epidemiology of S. Gallinarum over the past decade. After identifying and excluding one duplicate report, we included 41 unique studies in our analysis. Of these, 22 contained publicly available S. Gallinarum genomic data (Fig. 1a). Next, we used EnteroBase25 (“https://enterobase.warwick.ac.uk”), a comprehensive online resource for bacterial genomics, to search for genome data of all Salmonella serovars labelled as “Gallinarum”. Finally, we combined and deduplicated the genome data obtained from both sources, resulting in a dataset comprising 532 S. Gallinarum strains. Notably, 325 of these genomes were sourced from public datasets and had been previously studied in our laboratory. The metadata also include the “Isolate continent”, “Isolate country”, “Isolate province or state”, and “Isolate year” for each strain collected. Additionally, 42 recently isolated S. Gallinarum strains from deceased chicken embryos in Zhejiang Province, China, were added to the preliminary dataset, bringing the total number of S. Gallinarum genomes to 574.

Fig. 1

Overall study workflow. (a) Analysis process for the systematic literature review. (b) Processing pipeline for genomic data.

S. Gallinarum isolation and DNA extraction

A total of 734 dead chicken embryo samples were collected from Taishun and Yueqing in Zhejiang Province, China. After thorough autopsies, the liver, intestines, and spleen were extracted and placed individually into 2 mL centrifuge tubes, which contained 1 mL PBS. The organs were then homogenized by grinding. During the initial enrichment phase, we used Buffered Peptone Water (BPW, Haibo Biotechnology Co, Qingdao, China) at a 1:9 dilution ratio (sample in PBS: BPW) and incubated the mixture at 37 °C for 16–18 hours in a rotary incubator set to 180 rpm. For selective enrichment, Tetrathionate Broth Base (TTB, Land Bridge Biotechnology Co, Beijing, China), enhanced with iodine solution and brilliant green solution (both from Land Bridge Biotechnology Co, Beijing, China), was used at a 1:10 ratio (sample in BPW: TTB). The mixture was then incubated at 42 °C for 22–26 hours in a rotary incubator set at 180 rpm. Isolated Salmonella colonies from positive samples were obtained by sub-culturing the selectively enriched samples on Xylose Lysine Deoxycholate (XLD, Land Bridge Technology Co, Beijing, China) agar, followed by an 18–22-hour incubation at 37 °C. Typical and pure colonies were selected after sub-culturing on XLD agar and then transferred into Luria-Bertani (LB) broth. Finally, the transferred bacterial culture was further incubated at 37 °C for 18–22 hours in a rotary incubator set to 180 rpm.

DNA extraction and genomic assembly

DNA was extracted using the Vazyme Fastpure® Bacteria DNA Isolation Mini Kit (Vazyme Biotech Co., Ltd.) and quantified using the NanoDrop1000 system (Thermo Fisher Scientific, USA). DNA libraries were subsequently constructed and sequenced on the Illumina NovaSeq 6000 platform (Novogene, Beijing, China). The quality of the sequencing reads was assessed using FastQC v0.74, and any adapter-containing or low-quality reads were removed with Trimmomatic26 v0.39. Genome sequences were assembled with SPAdes27 v3.12.0 using default parameters. We obtained 45 S. Gallinarum genomic sequences from 734 dead chicken embryo samples (isolation rate = 6.1%), of which three were duplicates. After removing duplicates, we retained 42 unique S. Gallinarum genome sequences.

Genome quality control

All WGS data passed strict quality control according to criteria set by the European Reference Laboratory28. A Python3 script was used to assess genome quality: genomes exceeding 500 contigs and genomes with an N50 of less than 30,000 were excluded. Moreover, we required a GC% within the range of 51% to 53% and a genome length between 4.2 and 5.5 Mb. BUSCO29 v5.5.0 was used to assess genomic completeness with default parameters. Lastly, the bacterial species and S. Gallinarum biovar type were confirmed by KmerFinder v3.2 and SISTR30 v1.1.1. The final assembled dataset includes a total of 574 high-quality sequences.
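The thresholds above can be expressed as a short filter. The following is a minimal sketch, not the authors' script: the function names are hypothetical, "Mb" is assumed to mean 1,000,000 bp, and the boundary values are assumed to be inclusive.

```python
def parse_fasta(text):
    """Return the contig sequences found in FASTA-formatted text."""
    contigs, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header closes the previous contig, if any.
            if current:
                contigs.append("".join(current))
                current = []
        else:
            current.append(line.upper())
    if current:
        contigs.append("".join(current))
    return contigs


def n50(lengths):
    """Shortest contig such that contigs at least that long cover >= 50% of the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def assembly_stats(contigs):
    """Compute contig count, N50, total length and GC% for one assembly."""
    lengths = [len(c) for c in contigs]
    total = sum(lengths)
    gc = sum(c.count("G") + c.count("C") for c in contigs)
    return {
        "contigs": len(contigs),
        "n50": n50(lengths),
        "total_length": total,
        "gc_percent": 100.0 * gc / total if total else 0.0,
    }


def passes_qc(stats, max_contigs=500, min_n50=30_000,
              gc_range=(51.0, 53.0), length_range=(4_200_000, 5_500_000)):
    """Apply the thresholds stated in the text (inclusive bounds are an assumption)."""
    return (
        stats["contigs"] <= max_contigs
        and stats["n50"] >= min_n50
        and gc_range[0] <= stats["gc_percent"] <= gc_range[1]
        and length_range[0] <= stats["total_length"] <= length_range[1]
    )
```

In use, each assembly file would be read, passed through `parse_fasta` and `assembly_stats`, and kept only if `passes_qc` returns `True`.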

Genome annotation and Pan-genome analysis

The assembled genomes were annotated using Bakta31 v1.9.2 with the latest “Full” database. The minimum contig size (--min-contig-length) was set to 200, and “--keep-contig-headers” was set to “True”. The output of Bakta was then used to calculate the pan-genome matrix using Roary32 v3.13.0. The results were visualized using Python3 scripts provided by Roary with default parameters. The 7-gene legacy Multilocus Sequence Typing (MLST) was conducted using the MLST software with the “senterica_achtman_2” scheme.

MGE and ARG detection

We identified four types of MGEs: plasmids, transposons, integrons, and prophages. MOB-suite33 v3.0.3 was used to reconstruct and classify plasmids. Specifically, we extracted plasmid sequences from the assembled FASTA files with MOB-Recon, and the plasmid replicon type and mobility were predicted by MOB-Typer. Integrons and transposons in the genomes were detected with BacAnt34 v3.4.0, retaining only those with a similarity of more than 60% and a coverage greater than 60%. The prophages were identified through the PHASTER pipeline35. The genomic data were split into two temporary datasets based on contig number: one for single-contig files and the other for multi-contig files, which were submitted to the PHASTER pipeline separately with default parameters. The ARGs were detected by ResFinder, with minimal identity and coverage thresholds set to 90%.
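The split into single-contig and multi-contig files described above can be sketched as follows. This is an illustrative assumption about how such a split might be done, not the authors' code. The function names are hypothetical, and a contig is counted as one FASTA header line.

```python
from pathlib import Path


def count_contigs(fasta_path):
    """Count FASTA records by counting header lines that start with '>'."""
    with open(fasta_path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def split_by_contig_count(fasta_paths):
    """Partition assemblies into single-contig and multi-contig groups."""
    single, multi = [], []
    for path in fasta_paths:
        target = single if count_contigs(path) == 1 else multi
        target.append(Path(path))
    return single, multi
```

The two returned lists could then be submitted to the prophage search as separate batches.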

Figure 1b illustrates the overall genomic analysis pipeline of this study.

Data Records

  • Publication information: The list of authors, titles, journals, and Digital Object Identifiers (DOIs) for each paper allows authors or subsequent dataset users to track and validate data sources (Supplementary Table 1). Literature serial numbers have no practical significance and are only used for arbitrary sorting and counting functions.

  • Genome meta information: The names, quality information, biovar type, isolation source, isolation time, and sequence type (ST) of all genomes in this dataset are provided in Supplementary Tables 2, 3. Each row represents an individual record, while each column corresponds to a variable, as detailed below:

    1) GC%: The percentage of nucleobases in a DNA molecule that are either guanine (G) or cytosine (C).

    2) N50: The length of the shortest contig for which contigs of equal or longer length cover at least 50% of the assembled genome data.

    3) Total length (bp): The number of nucleobases in a DNA molecule.

    4) BUSCO: These results correspond to the genomic completeness assessment, which consists of four components: Complete (C) and single-copy (S), Complete and duplicated (D), Fragmented (F), and Missing (M).

    5) Biovar: Biovar of S. Gallinarum.

    6) Continent: The sampling site information at the continent level.

    7) Country: The sampling site information at the country level.

    8) Province or State: The sampling site information at the province or state level.

    9) Year: Sampling year.

    10) ST: The sequence type of S. Gallinarum.

    11) Data source: This indicates whether the strain originated from our laboratory.

  • The ARGs records are provided in Supplementary Table 4, while the MGEs records are included in Supplementary Tables 5, 6. The annotated files produced by “Bakta” and the pan-genome results have been uploaded to Figshare36 at: https://doi.org/10.6084/m9.figshare.26054251.v1.

  • Genome dataset: The dataset for the newly isolated 42 strains of Salmonella Gallinarum is available in the NCBI Sequence Read Archive (SRA) under BioProject PRJNA114371337,38. The “BioSample ID” and “Accession number” for each strain are listed in Supplementary Table 7. The dataset for the 532 Salmonella Gallinarum genomes, including publicly available data from our lab and other sources, is available on Figshare36 at: https://doi.org/10.6084/m9.figshare.26054251.v1.

  • The sequence files for the 1,733 plasmids from the 574 S. Gallinarum genomes are available on Figshare36 at: https://doi.org/10.6084/m9.figshare.26054251.v1.

Technical Validation

Genome quality

The genomic data of the 574 S. Gallinarum isolates in the dataset underwent rigorous quality control. All genomes were confirmed as S. Gallinarum by KmerFinder and SISTR, and all had N50 values exceeding 30,000 (average = 529,910). The genome lengths of the 574 strains ranged from 4,511,649 bp to 5,411,605 bp (average = 4,800,117 bp), and the GC content ranged from 51.25% to 53.15% (average = 52.12%). Genome sizes within the expected range indicate minimal genomic contamination (Fig. 2a–c). Additionally, BUSCO results indicated that the genomic completeness of the dataset exceeded 98.5%, supporting the reliability of the data for future analyses (Fig. 2d).

Fig. 2

Genomic quality control and geo-temporal distribution of Chinese source Salmonella enterica serovar Gallinarum (S. Gallinarum). (a–c) Quality control of the assembled genomes, where a–c represent GC%, N50, and genome length, respectively. (d) BUSCO analysis showed 98.5% gene completeness in genomes of all three strains, with only 1.4% missing gene orthologs. (e) Geographical distribution of S. Gallinarum in China. The colour indicates the number of isolates within the province. (f) Temporal distribution of S. Gallinarum in China. Larger circles represent more S. Gallinarum isolates. (g) Correlation analysis between the number of S. Gallinarum isolates collected from different provinces in China and each province’s total GDP, population size, number of poultry slaughtered, and egg production. Each point represents an individual province. The x-axis indicates the number of S. Gallinarum isolates included in the dataset, while the y-axis displays the values for each province’s total GDP, population size, number of poultry slaughtered, and egg production, respectively.

Geographic distribution

In our dataset, most S. Gallinarum isolates were from Asia (435/574), followed by Europe (61/574), South America (30/574), North America (14/574), and Africa (7/574). Within Asia, China is the primary source of S. Gallinarum, with most isolates concentrated in the eastern region (Fig. 2e). Regarding the time of isolation, S. Gallinarum isolates from China in our dataset were predominantly collected after 2010 (Fig. 2f). To determine whether the number of isolates in China is biased, we conducted a correlation analysis between the number of S. Gallinarum isolates from different provinces and the provinces’ latest GDP and total population, respectively. The results indicate that most points fall within the 95% confidence interval of the regression line. Although some points deviate from this trend, most of them correspond to provinces with a low sample size (n < 15) (Fig. 2g). Consequently, we hypothesize that the geographic distribution of S. Gallinarum in different provinces of China mirrors the overall national trend. To further reduce potential bias, larger-scale sampling and a greater number of available genomes are needed.
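For readers who wish to repeat this kind of check on the metadata tables, a correlation and regression line can be computed as in the sketch below. It assumes a Pearson correlation and an ordinary least-squares fit, which the text does not specify. The function names are hypothetical, and the 95% confidence band is not computed here.

```python
import math


def _centered_sums(xs, ys):
    """Return the means and the centered sums of squares and cross-products."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return mean_x, mean_y, sxy, sxx, syy


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    _, _, sxy, sxx, syy = _centered_sums(xs, ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least-squares line y = slope * x + intercept."""
    mean_x, mean_y, sxy, sxx, _ = _centered_sums(xs, ys)
    if sxx == 0:
        raise ValueError("slope is undefined when all x values are equal")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

Here `xs` would hold the per-province isolate counts and `ys` the matching provincial indicator, such as GDP or population.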

Abundance of ARGs and MGEs

The dataset includes 635 records on ARGs and 5,706 records on MGEs. For ARGs, 617 records from Asia, 12 from the Americas, and 6 from Europe were identified. Interestingly, we observed a significant increase in the proportion of recently isolated S. Gallinarum (after 2005) carrying beta-lactam and sulfonamide ARGs, while the corresponding proportion of aminoglycoside and trimethoprim ARGs decreased. In Asia, the region with the highest number of ARG records, beta-lactam and sulfonamide ARGs were prevalent in most provinces of China. In contrast, trimethoprim ARGs were scarce (Fig. 3a).

Fig. 3

The abundance records of ARGs and MGEs in the dataset. (a) The proportion of abundance records for ARGs and MGEs in each continent, each year, and each province of China. (b) Predominant replicon types of the 1,733 plasmids carried by Salmonella enterica serovar Gallinarum. (c) Predicted mobility of the 1,733 plasmids using MOB-Typer (Conjugative, Mobilizable, Non-mobilizable).

For MGEs, prophages and plasmids accounted for the most records, with 3,667 records for prophages and 1,733 records for plasmids. Prophages were carried by almost all S. Gallinarum isolates, and plasmid carriage was also high at 97% (557/574). Plasmid typing further showed that IncFII and ColpVC were the most common plasmid types in S. Gallinarum (Fig. 3b). Notably, 53.8% (932/1,733) of the plasmids were predicted to be mobilizable (Fig. 3c).
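Proportions such as the share of mobilizable plasmids can be recomputed from a tab-separated typing table. The sketch below assumes a column named `predicted_mobility`, as commonly found in MOB-Typer reports, and a hypothetical function name. Check the column name against the actual file before use.

```python
import csv
from collections import Counter


def mobility_proportions(tsv_lines, column="predicted_mobility"):
    """Map each mobility class to (count, percentage) from tab-separated lines.

    `tsv_lines` may be an open file handle or any iterable of text lines
    whose first line is the header. The column name is an assumption.
    """
    reader = csv.DictReader(tsv_lines, delimiter="\t")
    counts = Counter(row[column] for row in reader)
    total = sum(counts.values())
    return {label: (n, 100.0 * n / total) for label, n in counts.items()}
```

Applied to the figures reported above, 932 mobilizable plasmids out of 1,733 gives 53.8%.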

Pan-genome analysis

Using Bakta and Roary together as a pan-genome workflow, we identified the core, soft-core, shell, and cloud genes carried by S. Gallinarum. From a biological perspective, the core and soft-core genes found in most S. Gallinarum strains encode proteins essential for fundamental biological processes, and they could be used for outbreak detection and genomic surveillance39. Non-core genes, such as shell and cloud genes, comprise MGEs and reflect horizontal gene transfer (HGT) between S. Gallinarum strains40. These genes also endow S. Gallinarum with specialized functions, enabling strains to thrive in unique environments. We observed a total of 12,461 genes in 574 S. Gallinarum strains. These include core genes (n = 3,604), soft-core genes (n = 467), shell genes (n = 893), and cloud genes (n = 7,497) (Fig. 4a), which might indicate a high frequency of HGT among S. Gallinarum. Furthermore, we compared the genes unique to each S. Gallinarum biovar. We found that bvSP has the highest number of unique genes, with 5,941, followed by bvSG (n = 768) and bvSD (n = 120) (Supplementary Table 8).
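The four gene categories follow Roary's default frequency bands: core genes are present in at least 99% of genomes, soft-core genes in 95% to under 99%, shell genes in 15% to under 95%, and cloud genes in under 15%. The sketch below shows how a presence/absence matrix could be binned accordingly. The function names and the input layout (gene name mapped to a list of 0/1 flags) are illustrative assumptions.

```python
from collections import Counter


def classify_gene(n_present, n_genomes):
    """Assign a pan-genome category from the number of genomes carrying a gene.

    Integer arithmetic is used for the percentage cut-offs to avoid
    floating-point edge cases at the band boundaries.
    """
    if n_genomes <= 0:
        raise ValueError("n_genomes must be positive")
    if n_present * 100 >= 99 * n_genomes:
        return "core"
    if n_present * 100 >= 95 * n_genomes:
        return "soft-core"
    if n_present * 100 >= 15 * n_genomes:
        return "shell"
    return "cloud"


def summarize_pangenome(presence):
    """Count genes per category; `presence` maps gene name -> list of 0/1 flags."""
    summary = Counter()
    for flags in presence.values():
        summary[classify_gene(sum(flags), len(flags))] += 1
    return summary
```

With 574 genomes, for example, a gene must be present in at least 569 of them to count as core under these bands.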

Fig. 4

Results of pan-genomic analysis. (a) Presence of different genes in the pan-genome analysis. The total number of genes is 12,461, which contains the Core gene (n = 3,604), Soft-core gene (n = 467), Shell gene (n = 893), and Cloud gene (n = 7,497). (b) The Salmonella enterica serovar Gallinarum phylogenetic tree constructed based on the Pan-genome is shown on the left. The colours in the heat map on the right indicate the presence or absence of genes, with dark blue meaning gene present and blank meaning absent.

Usage Notes

The data are shared under Creative Commons Attribution 4.0 International (CC BY 4.0). The full text of the license is available at: https://creativecommons.org/licenses/by/4.0/.