Elsevier

Marine Genomics

Volume 45, June 2019, Pages 64-71
Two reads to rule them all: Nanopore long read-guided assembly of the iconic Christmas Island red crab, Gecarcoidea natalis (Pocock, 1888), mitochondrial genome and the challenges of AT-rich mitogenomes

https://doi.org/10.1016/j.margen.2019.02.002

Highlights

  • We report the first circularized and near-complete mitogenome of the iconic Christmas Island Red Crab (Gecarcoidea natalis).

  • Oxford Nanopore long reads greatly facilitated the assembly of the mitogenome.

  • Illumina sequencing bias against high-AT regions in the mitogenome led to gaps in the assembly.

  • Supplementing the hybrid assembly with a previously generated transcriptome dataset further improved mitogenome completeness.

  • An updated mitogenomic phylogeny of Eubrachyura was reported.

Abstract

Despite recent advances in sequencing technology, a complete mitogenome assembly is still unavailable for the gecarcinid land crabs, which include the iconic Christmas Island red crab (Gecarcoidea natalis), a species known for its high population density, annual mass breeding migration and ecological significance in maintaining rainforest structure. Using sequences generated from Nanopore and Illumina platforms, we assembled the complete mitogenome for G. natalis, the first for the genus and only the second for the family Gecarcinidae. Nine Nanopore long reads, representing 0.15% of the sequencing output from an overnight MinION Nanopore run, were aligned to the mitogenome. Two of them were >10 kb and together are sufficient to span the entire G. natalis mitogenome. Using Illumina genome skimming data alone resulted in a fragmented assembly, which can be attributed to low to zero sequencing coverage in multiple high-AT regions, including the mitochondrial protein-coding genes NAD4 and NAD5, the 16S rRNA gene and the non-coding control region. Supplementing the mitogenome assembly with a previously acquired transcriptome dataset containing a high abundance of mitochondrial transcripts improved mitogenome sequence coverage and assembly reliability. We then inferred the phylogeny of the Eubrachyura using Maximum Likelihood and Bayesian approaches, confirming the phylogenetic placement of G. natalis within the family Gecarcinidae based on whole mitogenome alignment. Given the substantial impact of AT content on mitogenome assembly and the value of complete mitogenomes in phylogenetic and comparative studies, we recommend that future mitogenome sequencing projects consider generating a modest amount of Nanopore long reads to facilitate the closing of problematic and fragmented mitogenome assemblies.

Introduction

The Christmas Island red crab, Gecarcoidea natalis, is one of the three currently recognized members of the genus Gecarcoidea in the family Gecarcinidae, which consists of tropical air-breathing land crabs (Hartnoll and Gould, 1988; Lai et al., 2017). The species is endemic to Christmas Island and undergoes a spectacular breeding migration. At the start of the wet season (early November), almost the entire adult population, many millions of individuals, migrates to the rainforest boundary adjacent to the sea (Adamczewska and Morris, 2001). There, mating takes place and females broadcast their fertilized eggs into the sea at the shoreline (Adamczewska and Morris, 2001). Gecarcoidea natalis is a detritivore, consuming mainly leaf litter along with some fruits, seeds and animal material (Linton and Greenaway, 2007). It is a keystone species that controls the floristic composition of the rainforest through the selective consumption of seedlings (Green et al., 2008; O'Dowd and Lake, 1989). The population of G. natalis on Christmas Island is extremely large (0.57–1.3 crabs/m²) and genetically homogeneous (Green, 1997; Weeks et al., 2014; Sherman, 2003). It is, however, potentially under threat from the yellow crazy ant, Anoplolepis gracilipes (O'Dowd and Lake, 1989). The ant has led to the disappearance of G. natalis from substantial swathes of rainforest and has changed the botanical composition of the forest as a result (O'Dowd and Lake, 1989). Given that there is only one known population of G. natalis in the world, any molecular resources that can be generated will be essential to the genetic management, preservation and conservation of this iconic crab species.

The currently available mitochondrial resources for the Christmas Island red crab have been used in two major and distinct phylogenetic studies (Lai et al., 2017; Tsang et al., 2014). In the first, the mitochondrial 12S and 16S rRNA gene fragments of G. natalis, along with 8 nuclear protein-coding genes, were sequenced and used to construct the largest crab phylogeny published to date. This study placed G. natalis within the Gecarcinidae, forming a highly supported monophyletic clade with G. lalandii (Tsang et al., 2014). More recently, the systematics of the genus Gecarcoidea was reinvestigated based on morphological features and COX1 gene sequences from G. natalis, G. lalandii and G. humei (Lai et al., 2017). Despite the availability of next-generation sequencing pipelines and bioinformatics software specifically designed to recover, assemble and annotate whole metazoan mitogenomes (Gan et al., 2014; Tan et al., 2015, 2018a; Hahn et al., 2013; Dierckxsens et al., 2016; Bernt et al., 2013), the whole mitogenome of this iconic red crab, and more broadly of the genus Gecarcoidea, is surprisingly and unfortunately still unavailable. As a result, the Christmas Island red crab was not included in a recent comprehensive whole-mitogenome-based phylogenetic study of crabs that focused on elucidating evolutionary relationships and mitochondrial gene order rearrangement scenarios among anomurans and brachyurans (Tan et al., 2018a). It is only very recently that a transcriptomic dataset for G. natalis was generated to identify a number of novel carbohydrate-active enzymes (CAZys) (Gan et al., 2018a). Transcriptome sequencing data have also recently been mined for mitochondrial gene transcripts, since these are generally highly expressed in the cell (Musacchia et al., 2017; Wang et al., 2017).

Traditionally, the long-range PCR approach has been used to sequence metazoan mitogenomes (Miller et al., 2004, 2005). While somewhat costly and time consuming, it allows routine assembly of high-quality mitogenomes if the usual mitochondrial gene content and expected gene organization are present. However, this approach is gradually being superseded by the more cost-effective and efficient low-coverage shotgun sequencing (genome skimming), which has the advantage that it does not require prior knowledge of gene organization to reliably reconstruct the mitogenome. This approach, commonly using short reads (<300 bp) from the Illumina platforms, is fast and inexpensive. However, Illumina PCR amplification is biased against genomic regions with high AT content, which is common in the non-coding control region of various mitogenomes (Aird et al., 2011), thus leading to low sequencing depth in those regions. Furthermore, the presence of large tandemly duplicated and highly repetitive regions (Aird et al., 2011) may further complicate mitogenome assembly (Velozo Timbó et al., 2017), thus requiring downstream PCR validation. Nevertheless, given the increasing number of complete mitogenomes assembled using Illumina-only datasets, it appears that the contiguity of most Illumina-based mitogenome assemblies is not significantly affected by the uneven sequencing coverage, presumably thanks to the high abundance of mitochondrial-derived reads and/or high sequencing coverage (Tan et al., 2015, 2018a; Harrisson et al., 2016; Gan et al., 2018b; Lopes-Lima et al., 2018; Tan et al., 2017).
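As an illustration of how high-AT regions that are prone to low Illumina depth might be located before or after assembly, the following minimal Python sketch scans a sequence in sliding windows and reports those above an AT threshold. The function names, window size, step and the 80% threshold are illustrative assumptions and are not taken from the study.

```python
def at_fraction(seq):
    """Return the fraction of A/T among unambiguous (A, C, G, T) bases."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    total = at + gc
    return at / total if total else 0.0


def high_at_windows(seq, window=100, step=50, threshold=0.80):
    """Yield (start, end, at_fraction) for windows at or above the threshold.

    Coordinates are 0-based and end-exclusive. The default window, step
    and threshold are hypothetical values chosen for illustration only.
    """
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = seq[start:start + window]
        frac = at_fraction(chunk)
        if frac >= threshold:
            yield start, start + len(chunk), frac


# Toy input: a base-balanced stretch followed by an AT-only stretch.
toy = "ACGT" * 50 + "AT" * 100
for start, end, frac in high_at_windows(toy):
    print(start, end, round(frac, 2))
```

Windows flagged in this way could then be compared against a per-base depth profile to see whether coverage drops coincide with AT-rich stretches.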

Recent advances in Oxford Nanopore sequencing technology have democratized long-read DNA sequencing, allowing smaller labs to perform long-read sequencing without substantial capital investment (Pennisi, 2014). This coincides with an increase in studies using long-read sequencing data to improve and/or close microbial and metazoan organelle genomes (Gan et al., 2012, 2017, 2018c; Austin et al., 2017; Tan et al., 2018b). Due to the lower read accuracy of Nanopore relative to Illumina, a stand-alone and error-free Nanopore-only assembly is still not feasible, as it would require high coverage to generate a high-quality consensus that could still contain indel errors (Loman et al., 2015). The strength of Nanopore sequencing lies in its ability to read extremely long strands of DNA without being affected by nucleotide composition. To date, a Nanopore read as long as 2.2 Mb (reported in 11 consecutive reads) has been generated for a human genome (Payne et al., 2018). Long reads preserve structural information (gene order), increasingly so with read length, and can be used as anchors for shorter reads such as those from the Illumina platform. The chloroplast genome of Eucalyptus pauciflora, which contains two large inverted repeats, has been successfully assembled using this hybrid approach thanks to the recovery of multiple 30–50 kb Nanopore reads that fully span the chloroplast repeats (Schalamun et al., 2018). To our knowledge, the mitogenome of the rodent nematode parasite (Nippostrongylus brasiliensis) is the first reported mitogenome assembled using Nanopore long reads, followed by that of the green-lipped mussel (Perna canaliculus) and, more recently, the bagworm moth (Eumeta variegata) (Chandler et al., 2017; Ranjard et al., 2018; Arakawa et al., 2018). However, the use of low-coverage Nanopore long reads to complement and improve challenging metazoan mitogenome assemblies remains uncommon.
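The idea that a handful of long reads can jointly span a circular genome can be made concrete with a small coverage check. The sketch below is a minimal Python example, assuming read alignments are already available as start/end coordinates on a linearized reference; the function name and the read coordinates in the usage lines are hypothetical, and only the 15,553 bp genome length comes from the article.

```python
def covers_circle(intervals, genome_length):
    """Return True if the intervals jointly cover a circular genome.

    Each interval is (start, end), 0-based and end-exclusive, with
    0 <= start < genome_length. A read that wraps past the origin is
    written with end > genome_length, e.g. (14000, 17000).
    """
    pieces = []
    for start, end in intervals:
        if end - start >= genome_length:
            return True  # a single read spans the whole circle
        if end <= genome_length:
            pieces.append((start, end))
        else:
            # Split a wrapping read into its two linear pieces.
            pieces.append((start, genome_length))
            pieces.append((0, end - genome_length))

    # Sweep left to right and stop at the first uncovered gap.
    covered_to = 0
    for start, end in sorted(pieces):
        if start > covered_to:
            return False
        covered_to = max(covered_to, end)
    return covered_to >= genome_length


# Hypothetical alignment coordinates for two reads longer than 10 kb.
GENOME_LENGTH = 15553
read_a = (500, 11200)
read_b = (9000, 19200)  # wraps past the origin
print(covers_circle([read_a, read_b], GENOME_LENGTH))
```

Splitting wrapping reads at the origin keeps the check a simple left-to-right sweep, at the cost of requiring that coordinates be normalized so that each start lies within the genome.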

In this study, we attempted to assemble the G. natalis mitogenome using our standard Illumina sequencing pipeline (Gan et al., 2014) but failed to recover the mitogenome as a circular contig. By supplementing this initial data with Nanopore long reads and previously generated transcriptome data (Gan et al., 2018a), we were able to generate the first circularized and near-complete mitogenome of G. natalis, providing evidence that Illumina sequencing bias can result in mitogenome assembly gaps. We then produced an up-to-date Eubrachyura phylogeny representing a 32% increase in taxon sampling at the species level from the most recent study (Tan et al., 2018a) and confirmed the phylogenetic placement of G. natalis within the family Gecarcinidae based on whole mitogenome data.

Approvals and permits

Permits required and issued for this work on Gecarcoidea natalis were as per Gan et al. (2018a).

Nucleic acid extraction and mitogenome sequencing

Christmas Island red crabs (Gecarcoidea natalis) were collected from the rainforest on Christmas Island, Indian Ocean, Australia. Immediately after euthanization, the crab tissues (midgut gland, gill and muscle) were removed and preserved in RNAlater (Thermo Fisher Scientific) as previously described (Linton et al., 2015). Samples were brought back by airfreight to the laboratory at Deakin University,

The complete Christmas Island red crab mitogenome can be sufficiently spanned by two Nanopore reads

The mitogenome of G. natalis (Fig. 1A) was successfully assembled de novo into a circular contig, as evidenced by the presence of a circular assembly graph in Bandage v0.8.1 (Wick et al., 2015) (data not shown). The circularized mitogenome is 15,553 bp long with a GC content of 24.78% (GenBank accession: MH816962). It consists of 13 PCGs, 2 rRNAs and 22 tRNAs and exhibits the typical brachyuran mitochondrial gene arrangement, which deviates slightly from the pancrustacean gene order (Fig. 1B)

Discussion

Using reads generated from Illumina short-read and Nanopore long-read technologies, we report the near-complete and circular mitochondrial genome of the iconic Christmas Island red crab, the second for the family Gecarcinidae and the first for the genus Gecarcoidea, which currently consists of three recognized species, G. natalis, G. lalandii and G. humei (Lai et al., 2017). The Christmas Island red crab mitogenome is one of the few problematic decapod crustacean mitogenomes that could not be

Conclusions

We report the circularized and near-complete mitochondrial genome of the iconic Christmas Island red crab and infer its placement within the Eubrachyura phylomitogenomic tree. We identified high AT content as the culprit for mitogenome assembly gaps and recommend that future mitogenome sequencing of the genus Gecarcoidea, notably the diverse and broadly distributed G. lalandii, use an Illumina library preparation protocol optimized for high-AT-content genomes. These findings also have

Data availability

Raw Illumina fastq reads from partial genome sequencing are available under the SRA accession number PRJNA492826. Nanopore basecalled fasta reads have been deposited in the Zenodo database (https://doi.org/10.5281/zenodo.1451962).

Declaration of interest

All authors declare that they have no competing interest.

Funding sources

Funding for this study was provided in part by Deakin University and Monash University Malaysia.

References (73)

  • A.M. Adamczewska et al. (2001). Ecology and behavior of Gecarcoidea natalis, the Christmas Island red crab, during the annual breeding migration. Biol. Bull.

  • D. Aird et al. (2011). Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries. Genome Biol.

  • K. Arakawa et al. (2018). The complete mitochondrial genome of Eumeta variegata (Lepidoptera: Psychidae). Mitochondr. DNA Part B

  • P.M. Ashton et al. (2015). MinION nanopore sequencing identifies the position and structure of a bacterial antibiotic resistance island. Nat. Biotechnol.

  • C.M. Austin et al. (2016). The complete mitogenome of the whale shark parasitic copepod Pandarus rhincodonicus Norman, Newbound & Knott (Crustacea; Siphonostomatoida; Pandaridae)–a new gene order for the copepoda. Mitochondr. DNA Part A

  • C.M. Austin et al. (2017). De novo genome assembly and annotation of Australia's largest freshwater fish, the Murray cod (Maccullochella peelii), from Illumina and Nanopore sequencing read. Gigascience

  • A. Basso et al. (2017). The highly rearranged mitochondrial genomes of the crabs Maja crispata and Maja squinado (Majidae) and gene order evolution in Brachyura. Sci. Rep.

  • S. Capella-Gutiérrez et al. (2009). trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics

  • T. Carver et al. (2011). Artemis: an integrated platform for visualization and analysis of high-throughput sequence-based experimental data. Bioinformatics

  • J. Chandler et al. (2017). Annotated mitochondrial genome with Nanopore R9 signal for Nippostrongylus brasiliensis. F1000Research

  • N. Dierckxsens et al. (2016). NOVOPlasty: de novo assembly of organelle genomes from whole genome data. Nucleic Acids Res.

  • M.C. Fisher-Reid et al. (2011). What are the consequences of combining nuclear and mitochondrial data for phylogenetic analysis? Lessons from Plethodon salamanders and 13 other vertebrate clades. BMC Evol. Biol.

  • H.M. Gan et al. (2012). Genome sequence of Hydrogenophaga sp. strain PBC, a 4-aminobenzenesulfonate-degrading bacterium. J. Bacteriol.

  • H.M. Gan et al. (2014). Integrated shotgun sequencing and bioinformatics pipeline allows ultra-fast mitogenome recovery and confirms substantial gene rearrangements in Australian freshwater crayfishes. BMC Evol. Biol.

  • H.Y. Gan et al. (2016). The complete mitogenome of the hermit crab Clibanarius infraspinatus (Hilgendorf, 1869), (Crustacea; Decapoda; Diogenidae)–a new gene order for the Decapoda. Mitochondr. DNA Part A

  • H.M. Gan et al. (2017). Nanopore long-read guided complete genome assembly of Hydrogenophaga intermedia, and genomic insights into 4-aminobenzenesulfonate, p-aminobenzoic acid and hydrogen metabolism in the genus Hydrogenophaga. Front. Microbiol.

  • H.M. Gan et al. (2018). Transcriptome-guided identification of carbohydrate active enzymes (CAZy) from the Christmas Island red crab, Gecarcoidea natalis and a vote for the inclusion of transcriptome-derived crustacean CAZys in comparative studies. Mar. Biotechnol.

  • H.M. Gan et al. (2018). High-quality draft genome sequence of the type strain of Allorhizobium vitis, the primary causal agent of grapevine crown gall. Microbiol. Res. Announc.

  • L. Gong et al. (2018). Picky comprehensively detects high-resolution structural variants in nanopore long reads. Nat. Methods

  • P.T. Green (1997). Red crabs in rain forest on Christmas Island, Indian Ocean: activity patterns, density and biomass. J. Trop. Ecol.

  • P.T. Green et al. (2008). Recruitment dynamics in a rainforest seedling community: context-independent impact of a keystone consumer. Oecologia

  • Y. Guo et al. (2015). Transfer RNA detection by small RNA deep sequencing and disease association with myelodysplastic syndromes. BMC Genomics

  • B.J. Haas et al. (2013). De novo transcript sequence reconstruction from RNA-seq using the trinity platform for reference generation and analysis. Nat. Protoc.

  • C. Hahn et al. (2013). Reconstructing mitochondrial genomes directly from genomic next-generation sequencing reads—a baiting and iterative mapping approach. Nucleic Acids Res.

  • K. Harrisson et al. (2016). Pleistocene divergence across a mountain range and the influence of selection on mitogenome evolution in threatened Australian freshwater cod species. Heredity

  • R.G. Hartnoll et al. (1988). Brachyuran Life History Strategies and the Optimization of Egg Production.