DOI: 10.1145/3014812.3014816
research-article

TRA: tandem repeat assembler for next generation sequences

Published: 31 January 2017

ABSTRACT

Eukaryotic genomes contain large intronic and intergenic regions in which repetitive sequences are abundant. These repetitive sequences make it difficult to assign short reads generated by next-generation sequencing to genomic locations, so they are often excluded from analysis, which loses valuable genomic information. Here we present a method, TRA (Tandem Repeat Assembler), that assembles repetitive sequences by constructing contigs directly from paired-end reads. Using an experimentally acquired data set for human chromosome 14, tandem repeats >200 bp were assembled. Alignment of the contigs to the human genome reference (GRCh38) revealed that 84.3% of tandem repetitive regions were correctly covered. For tandem repeats, this method outperformed state-of-the-art assemblers, generating contigs with a correct N50 of up to 512 bp.
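For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of the standard N50 calculation over a list of contig lengths. It is not taken from TRA itself; the function name `n50` and the sample lengths are hypothetical, and the abstract's "correct N50" presumably applies the same calculation only to contigs that align correctly to the reference.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    account for at least half of the total assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")

    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down until half the bases are covered.
    for length in lengths:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Hypothetical contig lengths in bp, for illustration only.
    example = [100, 200, 300, 400, 500]
    print(n50(example))
```

With the illustrative lengths above, the total is 1500 bp, and the two longest contigs (500 + 400) already exceed half of it, so the N50 is 400.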


    • Published in

      ACSW '17: Proceedings of the Australasian Computer Science Week Multiconference
      January 2017
      615 pages
      ISBN:9781450347686
      DOI:10.1145/3014812

      Copyright © 2017 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      ACSW '17 Paper Acceptance Rate: 78 of 156 submissions, 50%. Overall Acceptance Rate: 204 of 424 submissions, 48%.