Two reads to rule them all: Nanopore long read-guided assembly of the iconic Christmas Island red crab, Gecarcoidea natalis (Pocock, 1888), mitochondrial genome and the challenges of AT-rich mitogenomes
Introduction
The Christmas Island red crab, Gecarcoidea natalis, is one of the three currently recognized members of the genus Gecarcoidea in the family Gecarcinidae, which consists of tropical air-breathing land crabs (Hartnoll and Gould, 1988; Lai et al., 2017). The species is endemic to Christmas Island and undergoes a spectacular breeding migration. At the start of the wet season (early November), almost the entire adult population, numbering many millions of individuals, migrates to the rainforest boundary adjacent to the sea (Adamczewska and Morris, 2001). There, mating takes place and females broadcast their fertilized eggs into the sea at the shoreline (Adamczewska and Morris, 2001). Gecarcoidea natalis is a detritivore, consuming mainly leaf litter along with some fruits, seeds and animal material (Linton and Greenaway, 2007). It is a keystone species that controls the floristic composition of the rainforest through the selective consumption of seedlings (Green et al., 2008; O'Dowd and Lake, 1989). The population of G. natalis on Christmas Island is extremely large (0.57–1.3 crabs/m²) and genetically homogeneous (Green, 1997; Weeks et al., 2014; Sherman, 2003). It is, however, potentially under threat from the yellow crazy ant, Anoplolepis gracilipes (O'Dowd and Lake, 1989). The ant has led to the disappearance of G. natalis from substantial swathes of rainforest, changing the botanical composition of the forest as a result (O'Dowd and Lake, 1989). Given that there is only one known population of G. natalis in the world, any molecular resources that can be generated will be essential to the genetic management, preservation and conservation of this iconic crab species.
The currently available mitochondrial resources for the Christmas Island red crab have been used in two major and distinct phylogenetic studies (Lai et al., 2017; Tsang et al., 2014). In the first, the mitochondrial 12S and 16S rRNA gene fragments of G. natalis, along with 8 nuclear protein-coding genes, were sequenced and used to construct the largest crab phylogeny published to date. This study placed G. natalis within the Gecarcinidae, forming a highly supported monophyletic clade with G. lalandii (Tsang et al., 2014). More recently, the systematics of the genus Gecarcoidea was reinvestigated based on morphological features and COX1 gene sequences from G. natalis, G. lalandii and G. humei (Lai et al., 2017). Despite the availability of next-generation sequencing pipelines and bioinformatics software specifically designed to recover, assemble and annotate whole metazoan mitogenomes (Gan et al., 2014; Tan et al., 2015, 2018a; Hahn et al., 2013; Dierckxsens et al., 2016; Bernt et al., 2013), it is surprising and unfortunate that the whole mitogenome of this iconic red crab, and more broadly of the genus Gecarcoidea, is still unavailable. As a result, the Christmas Island red crab was not included in a recent comprehensive whole-mitogenome-based phylogenetic study of crabs that focused on elucidating evolutionary relationships and mitochondrial gene order rearrangement scenarios among anomurans and brachyurans (Tan et al., 2018a). Only very recently was a transcriptomic dataset generated for G. natalis, which was used to identify a number of novel carbohydrate-active enzymes (CAZys) (Gan et al., 2018a). Transcriptome data have also been mined for mitochondrial gene transcripts, since these are generally highly expressed in the cell (Musacchia et al., 2017; Wang et al., 2017).
Traditionally, the long-range PCR approach has been used to sequence metazoan mitogenomes (Miller et al., 2004, 2005). While somewhat costly and time-consuming, it allows routine assembly of high-quality mitogenomes provided that the usual mitochondrial gene content and expected gene organization are present. However, this approach is gradually being superseded by the more cost-effective and efficient low-coverage shotgun sequencing (genome skimming), which does not require prior knowledge of gene organization to reliably reconstruct the mitogenome. This approach, which commonly uses short reads (<300 bp) from Illumina platforms, is fast and inexpensive. However, PCR amplification during Illumina library preparation is biased against genomic regions with high AT content, which are common in the non-coding control region of various mitogenomes (Aird et al., 2011), thus leading to low sequencing depth in those regions. Furthermore, the presence of large tandemly duplicated and highly repetitive regions (Aird et al., 2011) may further complicate mitogenome assembly (Velozo Timbó et al., 2017), thus requiring downstream PCR validation. Nevertheless, given the increasing number of complete mitogenomes assembled from Illumina-only datasets, it appears that the contiguity of most Illumina-based mitogenome assemblies is not significantly affected by uneven sequencing coverage, presumably thanks to the high abundance of mitochondrial-derived reads and/or high sequencing coverage (Tan et al., 2015, 2018a; Harrisson et al., 2016; Gan et al., 2018b; Lopes-Lima et al., 2018; Tan et al., 2017).
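To illustrate how AT-rich stretches that are prone to low Illumina depth might be located in a candidate sequence, the following minimal Python sketch scans a sequence in sliding windows and reports windows whose AT fraction exceeds a cutoff. The function names, window size, step and 0.85 threshold are illustrative assumptions and are not parameters taken from this study.

```python
def at_fraction(seq):
    """Return the fraction of A/T bases among unambiguous (A, C, G, T) bases."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    total = at + gc
    return at / total if total else 0.0


def at_rich_windows(seq, window=100, step=50, threshold=0.85):
    """Yield (start, end, at_fraction) for each window at or above threshold.

    window, step and threshold are hypothetical defaults chosen for
    illustration only; coordinates are 0-based, end-exclusive.
    """
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = seq[start:start + window]
        frac = at_fraction(chunk)
        if frac >= threshold:
            yield start, start + len(chunk), frac
```

Applied to an assembled contig, the reported windows could then be compared against per-base read depth to check whether coverage drops coincide with AT-rich sequence.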
Recent advances in Oxford Nanopore sequencing technology have democratized long-read DNA sequencing, allowing smaller labs to perform long-read sequencing without substantial capital investment (Pennisi, 2014). This coincides with an increase in studies using long-read sequencing data to improve and/or close microbial and metazoan organelle genomes (Gan et al., 2012, 2017, 2018c; Austin et al., 2017; Tan et al., 2018b). Due to the lower read accuracy of Nanopore relative to Illumina, a stand-alone and error-free Nanopore-only assembly is still not feasible: it would require high coverage to generate a high-quality consensus, and even then indel errors may remain (Loman et al., 2015). The strength of Nanopore sequencing lies in its ability to read extremely long strands of DNA without being affected by nucleotide composition. To date, a Nanopore read as long as 2.2 Mb (reported in 11 consecutive reads) has been generated for a human genome (Payne et al., 2018). Long reads preserve structural information (gene order), the amount of which increases with read length, and can be used as anchors for shorter reads such as those from the Illumina platform. The chloroplast genome of Eucalyptus pauciflora, which contains two large inverted repeats, has been successfully assembled using this hybrid approach owing to the recovery of multiple 30–50 kb Nanopore reads that fully span the chloroplast repeats (Schalamun et al., 2018). To our knowledge, the mitogenome of the rodent nematode parasite (Nippostrongylus brasiliensis) is the first reported mitogenome assembled using Nanopore long reads, followed by those of the green-lipped mussel (Perna canaliculus) and, more recently, the bagworm moth (Eumeta variegata) (Ranjard et al., 2018; Arakawa et al., 2018; Chandler et al., 2017). However, the use of low-coverage Nanopore long reads to complement and improve challenging metazoan mitogenome assemblies remains uncommon.
In this study, we attempted to assemble the G. natalis mitogenome using our standard Illumina sequencing pipeline (Gan et al., 2014) but failed to recover the mitogenome as a circular contig. By supplementing this initial data with Nanopore long reads and previously generated transcriptome data (Gan et al., 2018a), we were able to generate the first circularized and near-complete mitogenome of G. natalis, providing evidence that Illumina sequencing bias can result in mitogenome assembly gaps. We then produced an up-to-date Eubrachyura phylogeny representing a 32% increase in taxon sampling at the species level from the most recent study (Tan et al., 2018a) and confirmed the phylogenetic placement of G. natalis within the family Gecarcinidae based on whole mitogenome data.
Approvals and permits
Permits required and issued for this work on Gecarcoidea natalis were as per Gan et al. (2018a).
Nucleic acid extraction and mitogenome sequencing
Christmas Island red crabs, Gecarcoidea natalis, were collected from the rainforest on Christmas Island, Indian Ocean, Australia. Immediately after euthanization, the crab tissues (midgut gland, gill and muscle) were removed and preserved in RNAlater (Thermo Fisher Scientific) as previously described (Linton et al., 2015). Samples were brought back by airfreight to the laboratory at Deakin University,
The complete Christmas Island red crab mitogenome can be sufficiently spanned by two Nanopore reads
The mitogenome of G. natalis (Fig. 1A) was successfully assembled de novo into a circular contig, as evidenced by the presence of a circular assembly graph in Bandage v0.8.1 (Wick et al., 2015) (data not shown). The circularized mitogenome is 15,553 bp long with a GC content of 24.78% (GenBank accession: MH816962). It consists of 13 PCGs, 2 rRNAs and 22 tRNAs and exhibits the typical brachyuran mitochondrial gene arrangement, which deviates slightly from the pancrustacean gene order (Fig. 1B)
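The idea that a small number of long reads can jointly span a circular mitogenome can be checked with a simple interval-coverage test. The sketch below is a hedged illustration only: it assumes each read alignment is summarised as a (start, aligned length) pair on a linearised copy of the circular genome, and the function name and example coordinates are hypothetical rather than taken from the actual alignments in this study.

```python
def covers_circle(genome_len, alignments):
    """Return True if the alignments jointly cover every base of a circular genome.

    alignments: iterable of (start, length) pairs, with 0-based start on the
    linearised genome; an alignment may run past the end and wrap around.
    """
    intervals = []
    for start, length in alignments:
        if length >= genome_len:
            return True  # a single read spans the whole circle
        start %= genome_len
        end = start + length
        if end <= genome_len:
            intervals.append((start, end))
        else:
            # Split a wrapping alignment into its two linear pieces.
            intervals.append((start, genome_len))
            intervals.append((0, end - genome_len))

    # Sweep left to right and look for any uncovered gap.
    intervals.sort()
    reach = 0
    for s, e in intervals:
        if s > reach:
            return False
        reach = max(reach, e)
    return reach >= genome_len


# Made-up example: two overlapping reads on a 15,553 bp circular genome,
# the second of which wraps past the linearisation point.
two_reads = [(0, 9000), (8000, 8000)]
spanned = covers_circle(15553, two_reads)
```

In practice, the start and length values would come from read-to-assembly alignments, and a coverage test of this kind says nothing about base-level accuracy, which still depends on polishing with short reads.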
Discussion
Using reads generated from Illumina short-read and Nanopore long-read technologies, we report the near-complete and circular mitochondrial genome of the iconic Christmas Island red crab, the second for the family Gecarcinidae and the first for the genus Gecarcoidea, which currently consists of three recognized species, G. natalis, G. lalandii and G. humei (Lai et al., 2017). The Christmas Island red crab mitogenome is one of the few problematic decapod crustacean mitogenomes that could not be
Conclusions
We report the circularized and near-complete mitochondrial genome of the iconic Christmas Island red crab and infer its placement within the Eubrachyura phylomitogenomic tree. We identified high AT content as the culprit for mitogenome assembly gaps and recommend that future mitogenome sequencing of the genus Gecarcoidea, notably the diverse and broadly distributed G. lalandii, use an Illumina library preparation protocol optimized for genomes with high AT content. These findings also have
Data availability
Raw Illumina fastq reads from partial genome sequencing are available under the SRA accession number PRJNA492826. Nanopore basecalled fasta reads have been deposited in the Zenodo database (https://doi.org/10.5281/zenodo.1451962).
Declaration of interest
All authors declare that they have no competing interest.
Funding sources
Funding for this study was provided in part by Deakin University and Monash University Malaysia.
References (73)
MITOS: improved de novo metazoan mitochondrial genome annotation. Mol. Phylogenet. Evol. (2013)
More evolution underground: accelerated mitochondrial substitution rate in Australian burrowing freshwater crayfishes (Decapoda: Parastacidae). Mol. Phylogenet. Evol. (2018)
Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics (2018)
A glycosyl hydrolase family 16 gene is responsible for the endogenous production of beta-1,3-glucanases within decapod crustaceans. Gene (2015)
Expansion and systematics redefinition of the most threatened freshwater mussel family, the Margaritiferidae. Mol. Phylogenet. Evol. (2018)
Complete mitochondrial DNA sequence of the Australian freshwater crayfish, Cherax destructor (Crustacea: Decapoda: Parastacidae): a novel gene order revealed. Gene (2004)
MitoPhAST, a new automated mitogenomic phylogeny tool in the post-genomic era with a case study of 89 decapod mitogenomes including eight new freshwater crayfish mitogenomes. Mol. Phylogenet. Evol. (2015)
ORDER within the chaos: insights into phylogenetic relationships within the Anomura (Crustacea: Decapoda) from mitochondrial sequences and gene order rearrangements. Mol. Phylogenet. Evol. (2018)
TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations. Nucleic Acids Res. (2010)
ExaBayes: massively parallel Bayesian tree inference for the whole-genome era. Mol. Biol. Evol. (2014)
Ecology and behavior of Gecarcoidea natalis, the Christmas Island red crab, during the annual breeding migration. Biol. Bull.
Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries. Genome Biol.
The complete mitochondrial genome of Eumeta variegata (Lepidoptera: Psychidae). Mitochondr. DNA Part B
MinION nanopore sequencing identifies the position and structure of a bacterial antibiotic resistance island. Nat. Biotechnol.
The complete mitogenome of the whale shark parasitic copepod Pandarus rhincodonicus Norman, Newbound & Knott (Crustacea; Siphonostomatoida; Pandaridae)–a new gene order for the copepoda. Mitochondr. DNA Part A
De novo genome assembly and annotation of Australia's largest freshwater fish, the Murray cod (Maccullochella peelii), from Illumina and Nanopore sequencing read. Gigascience
The highly rearranged mitochondrial genomes of the crabs Maja crispata and Maja squinado (Majidae) and gene order evolution in Brachyura. Sci. Rep.
trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics
Artemis: an integrated platform for visualization and analysis of high-throughput sequence-based experimental data. Bioinformatics
Annotated mitochondrial genome with Nanopore R9 signal for Nippostrongylus brasiliensis. F1000Research
NOVOPlasty: de novo assembly of organelle genomes from whole genome data. Nucleic Acids Res.
What are the consequences of combining nuclear and mitochondrial data for phylogenetic analysis? Lessons from Plethodon salamanders and 13 other vertebrate clades. BMC Evol. Biol.
Genome sequence of Hydrogenophaga sp. strain PBC, a 4-aminobenzenesulfonate-degrading bacterium. J. Bacteriol.
Integrated shotgun sequencing and bioinformatics pipeline allows ultra-fast mitogenome recovery and confirms substantial gene rearrangements in Australian freshwater crayfishes. BMC Evol. Biol.
The complete mitogenome of the hermit crab Clibanarius infraspinatus (Hilgendorf, 1869), (Crustacea; Decapoda; Diogenidae)–a new gene order for the Decapoda. Mitochondr. DNA Part A
Nanopore long-read guided complete genome assembly of Hydrogenophaga intermedia, and genomic insights into 4-aminobenzenesulfonate, p-aminobenzoic acid and hydrogen metabolism in the genus Hydrogenophaga. Front. Microbiol.
Transcriptome-guided identification of carbohydrate active enzymes (CAZy) from the Christmas Island red crab, Gecarcoidea natalis and a vote for the inclusion of transcriptome-derived crustacean CAZys in comparative studies. Mar. Biotechnol.
High-quality draft genome sequence of the type strain of Allorhizobium vitis, the primary causal agent of grapevine crown gall. Microbiol. Res. Announc.
Picky comprehensively detects high-resolution structural variants in nanopore long reads. Nat. Methods
Red crabs in rain forest on Christmas Island, Indian Ocean: activity patterns, density and biomass. J. Trop. Ecol.
Recruitment dynamics in a rainforest seedling community: context-independent impact of a keystone consumer. Oecologia
Transfer RNA detection by small RNA deep sequencing and disease association with myelodysplastic syndromes. BMC Genomics
De novo transcript sequence reconstruction from RNA-seq using the trinity platform for reference generation and analysis. Nat. Protoc.
Reconstructing mitochondrial genomes directly from genomic next-generation sequencing reads—a baiting and iterative mapping approach. Nucleic Acids Res.
Pleistocene divergence across a mountain range and the influence of selection on mitogenome evolution in threatened Australian freshwater cod species. Heredity
Brachyuran Life History Strategies and the Optimization of Egg Production