Background and Summary

Sphingomonas spp. are Gram-negative, oxidase positive and non-fermentative rods1. One of the best known species of the genus is Sphingomonas paucimobilis as it was originally said to be the only species described in human infection1,2. It is a non-spore forming strictly aerobic, yellow-pigmented bacteria that can survive in low nutrient environment1,3. S. paucimobilis is naturally found in diverse environments such as soil and water and also has been shown to have a wide range of xenobiotic-biodegradative abilities4,5,6. Previous studies had shown its’ ability to degrade various types of hydrocarbons and pesticides, specifically chlorpyrifos7,8,9,10,11,12. It is also well recognized for its potential for biofilm formation13. Despite the potential role of this bacterium in bioremediation, there is a lack of complete genome in the public domain which will allow for the identification of genes involved in the biodegradation of chlorpyrifos, a widely used organophosphate.

General features of S. paucimobilis strain AIMST S2 are summarized in Table 1. S. paucimobilis strain AIMST S2 was first isolated in an oil-contaminated soil sample from Kedah, Malaysia. Following enrichment in LB broth, this strain was acclimatized in M9 minimal medium supplemented with diesel (max. 1% v/v) and chlorpyrifos (max. 100 mg/L) in increasing concentrations, as the sole carbon source. Genomic DNA extraction was performed according to the GeneJet Genomic DNA purification kit’s protocol using a log-phase culture grown in Luria broth. The concentration and quality of extracted DNA was determined using Nanodrop, Qubit dsDNA BR assay and a 1% (v/w) agarose gel. The genomic DNA was then subjected to sequencing via Illumina HiSeq. 2500 and Oxford Nanopore. DNA sequencing was performed with both Illumina and Nanopore technologies as they yield short (~150 bases) and long reads (~10,000 bases), respectively, a combination of which has shown to improve hybrid genome assembly quality by providing accurate, complete genomes without gaps14.

Table 1 General features of S. paucimobilis strain AIMST S2 based on MIGS mandatory information.

The complete genome sequence reported in this study will be useful for analysis of protein-coding gene families, identification of genomic islands, repeat regions, prophages, and structural rearrangements. Apart from that, the data from this study can be utilized for comparative genome analysis of strains belonging to the genus Sphingomonas and other xenobiotic-degrading microorganisms, as well as transcriptome studies of chlorpyrifos biodegradation.

An overview of the experimental design of the study is illustrated in Fig. 1 and a detailed account of the workflow is provided in the methodology.

Fig. 1
figure 1

Overview of the experimental design of study.

Methods

Bacterial growth and genomic DNA extraction

S. paucimobilis was cultivated in LB broth and incubated at 37 °C until it attained an absorbance of ~0.7 at 600 nm. The log-phase culture was centrifuged at 10,000 × g for 10 minutes and the cell pellet was subjected to genomic DNA extraction according to the GeneJet Genomic DNA purification kit’s protocol (Thermo Fisher Scientific, Waltham, MA, USA). The concentration and quality of extracted DNA was determined using Nanodrop ™ Lite spectrophotometer (Thermo Scientific, Wilmington, DE, USA), Qubit dsDNA BR assay (Thermo Scientific, Wilmington, DE, USA) and 1% (v/w) agarose gel electrophoresis. The genomic DNA was then subjected to sequencing via Illumina HiSeq. 2500 and MinION.

Illumina Sequencing

DNA was fragmented using Covaris to a targeted size of 350 bp and upon adapter ligation, a library containing fragments of 470 bp was generated. The library size was determined using Bioanalyzer high sensitivity DNA chip (Agilent, CA, USA). Library was prepared using NEBNext Ultra DNA Library Prep Kit for Illumina (NEB, MA, USA) and paired-end sequenced.

Oxford Nanopore MinION Sequencing

Approximately 500 ng genomic DNA was used to build a DNA library using a Rapid Sequencing Kit (SQK-RAD004) (ONT, Oxford, UK) as described by the manufacturer. MinKNOW software version 2.0 (ONT, Oxford, UK) was used to perform a quality check on the flow cell before the DNA library was loaded. Sequencing was performed on MK1B (MIN-101B) MinION platform with a FLO-MIN 106 R9.4 (SpotON) flow cell according to the manufacturers’ instructions. Raw sequence reads were basecalled real time using MinKNOW, producing Fastq format data.

Hybrid genome assembly

The FastQ format data obtained from Illumina and MinION sequencing was subjected to genome assembly using Unicycler version 0.4.3 with default parameters.

Genome annotation

The assembly was annotated with Prokka15. Genome-wide COG functional annotation was performed using eggNOG mapper with DIAMOND mapping mode, which is available in version 4.5.116,17. Following this, the amino acid sequences were subjected to KEGG analysis via KAAS for pathway mapping. Prophages and genomic islands were also identified using PHASTER18 and IslandViewer 419.

Data Records

Sequencing raw reads obtained from Illumina and Nanopore MinION runs have been deposited in the NCBI Sequence Read Archive under SRP185601 (accessible at https://identifiers.org/ncbi/insdc.sra:SRP185601)20. All predicted genes and their functional annotations are provided in GenBank (Accession number: NZ_CP035765)21. The circular genome assembly for S. paucimobilis has been deposited in NCBI Assembly under GCA_003314795.222, and the whole project is at BioProject under PRJNA478628 (https://identifiers.org/bioproject:PRJNA478628).

Technical Validation

FaQCs was used to obtain the sequencing statistics and Q scores of Illumina short-reads, while Pauvr was used to obtain the same for MinION sequencing (Table 2). Illumina sequencing yielded paired-end reads of ~150 bases with more than 98% reads possessing Phred scores (Q scores) above 20 (Fig. 2a), when quality screening was performed with FaQCs. MinION reads were also of high quality, as shown in Fig. 2b.

Table 2 Basic statistics of Illumina and MinION sequencing.
Fig. 2
figure 2

Phred analysis of Illumina and MinION reads for the Sphingomonas paucimobilis AIMST S2 strain genome. (a) Q scores for Illumina reads. (b) Q scores for MinION reads.

The hybrid genome assembly performed with the reads provided a complete, circular genome of S. paucimobilis, containing 4,005,505 bases, with an overall GC content of 65.73%. The sequencing coverage based on raw reads was 446.6×. A total of 3,612 coding sequences (CDS), 56 tRNAs, 1 tmRNA and 1 CRISPR array were identified. Three identical ribosomal operons were identified.

Figure 3 illustrates the circular genome of S. paucimobilis plotted using CGView23.

Fig. 3
figure 3

Circular map of S. paucimobilis. (a) Circular representation of genome with basic features including CDS and tRNA distributions. (b) Circular representation of genome based on COG classification.

Several levels of validation were performed to refine the hybrid assembly and check for completeness and the quality of genes predicted. Pilon refines the assembly using short reads during the final stage of assembly in Unicycler, by detecting and correcting single base differences, small and large indels or block substitution events. The present hybrid assembly was polished twice by Pilon with no changes in the assembly, suggesting an accurate assembly.

The completeness of the genomic data was further assessed according to Watson and Warr (2019)24. A DIAMOND blast against the UniProt TREMBL database showed that 99.1% of the genes predicted in the genome had more than 90% coverage to its top hit, suggesting good quality assembly and annotation was generated.

Among these, approximately 32 genes were shown to be involved in xenobiotic degradation (Table 3).

Table 3 Gene clusters involved in xenobiotic degradation.

Interestingly, one of the key genes responsible for organophosphate biodegradation, glutathione S-transferase, gst was identified in the analysis. gst has previously been said to detoxify xenobiotics by catalyzing the nucleophilic conjugation of reduced tripeptide glutathione (GSH; γ-Glu-Cys-Gly) into hydrophobic and electrophilic substrates25,26.

Apart from genes involved in chlorpyrifos and other xenobiotic biodegradation, several genes related to plant-growth promoting factors were also identified in the genome. This includes several genes in auxin biosynthesis, alkaloid biosynthesis and nitrogen metabolism. Auxin plays a significant role in promoting stem elongation27,28, while alkaloid plays an important role in plants by preventing insects from eating them29. Genes involved in nitrogen metabolism like nitrate reductase, on the other hand, is responsible in reducing nitrate to nitrite for the production of protein in most crop plants, as nitrate is the predominant source of nitrogen in fertilized soils30,31,32.

Characterization of the complete genome of S. paucimobilis, identification of potential chlorpyrifos-degrading gene, gst and an array of genes coding for plant-growth promoting factors opens an avenue to more studies on bioremediation and its’ potential use as an effective microorganism in bioremediation and agriculture.